Average Amino Acid Identity (AAI) analysis manually

Hi all,

I need to run an Average Amino Acid Identity (AAI) analysis on 422 genomes on a SLURM cluster that limits each job to 3 days. Tools like CompareM can’t finish within that limit, so I would like to run the analysis manually using parallel, awk or sed.

However, I don’t fully understand how the analysis works. As far as I can tell, the proteins of the query genome are searched with BLAST against the reference genome, keeping hits with at least 30% identity and at least 70% coverage. The top match for each protein is then used in a reverse BLAST search with the same cut-offs.
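One way to read this procedure is as a reciprocal best hit (RBH) filter: run BLAST in both directions, keep each protein's top hit, and retain only the pairs that pick each other. The sketch below assumes that reading. It also assumes both result files use the default `-outfmt 6` columns (identity in column 3, bitscore in column 12), that the coverage cut-off was already applied at search time (for example with `-qcov_hsp_perc 70`), and that AAI is taken as the mean forward identity over the reciprocal pairs. The function name `rbh_aai` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: reciprocal best hits and mean identity from two BLAST tabular files.
# Assumption: default -outfmt 6 layout (col 1 qseqid, col 2 sseqid,
# col 3 pident, col 12 bitscore). rbh_aai is a hypothetical helper name.

# Usage: rbh_aai FORWARD.tsv REVERSE.tsv [MIN_IDENTITY]
# Prints: number_of_reciprocal_pairs <TAB> mean_identity
rbh_aai() {
    awk -v min_id="${3:-30}" '
        $3 + 0 < min_id { next }              # apply the identity cut-off
        FILENAME == ARGV[1] {                 # forward search: genome A vs genome B
            if (!($1 in fscore) || $12 + 0 > fscore[$1]) {
                fscore[$1] = $12 + 0          # best bitscore seen for this query
                fhit[$1]   = $2               # its best subject
                fid[$1]    = $3 + 0           # identity of that best hit
            }
            next
        }
        {                                     # reverse search: genome B vs genome A
            if (!($1 in rscore) || $12 + 0 > rscore[$1]) {
                rscore[$1] = $12 + 0
                rhit[$1]   = $2
            }
        }
        END {
            for (q in fhit) {
                s = fhit[q]
                # keep the pair only if the reverse best hit points back to q
                if ((s in rhit) && rhit[s] == q) { n++; sum += fid[q] }
            }
            if (n > 0) printf "%d\t%.2f\n", n, sum / n
            else       print "0\tNA"
        }
    ' "$1" "$2"
}
```

With hypothetical file names, `rbh_aai A_vs_B.tsv B_vs_A.tsv` would print the number of reciprocal pairs and their mean identity for that genome pair. Other AAI implementations may average the identities of both directions or break bitscore ties differently, so the numbers need not match CompareM exactly.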

I previously ran a similar analysis, percentage of conserved proteins, using a command like the one below:

cat allpairs.txt | parallel --colsep ' ' -j 32  blastp -query {1} -subject {2} -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {1}_{2}.tsv

Here I first save all the genome pairs I want to compare in a file (allpairs.txt) and then run BLAST on each pair with parallel.
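For a reciprocal analysis the pair file needs both directions of every comparison, so for 422 genomes that is 422 × 421 = 177,662 searches. A minimal sketch of how such a file could be generated is shown here. It assumes one protein FASTA file per genome and file names without spaces (the names are split on a single space by `--colsep ' '`). The function name `make_pairs` and the `proteins/*.faa` path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write every ordered pair of distinct files, one "query subject" per line.
# make_pairs is a hypothetical helper; file names must not contain spaces.

# Usage: make_pairs FILE... > allpairs.txt
make_pairs() {
    for q in "$@"; do
        for s in "$@"; do
            # skip self-comparisons, keep both A-vs-B and B-vs-A
            if [ "$q" != "$s" ]; then
                printf '%s %s\n' "$q" "$s"
            fi
        done
    done
}

# Example (hypothetical directory layout):
#   make_pairs proteins/*.faa > allpairs.txt
```

Because each line is an independent search, the resulting file can also be cut into chunks (for example with `split -l`) so that each SLURM job only handles part of the list and stays within the time limit.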

But I don’t understand how to perform the reverse search with the same cut-offs. Is it possible to do this using parallel, awk or sed?

Thank you very much.

Best regards,

Felix


Tags: awk, parallel, sed, AAI, BLAST
