Performing Average Amino Acid Identity (AAI) analysis manually
Hi all,
I need to perform Average Amino Acid Identity (AAI) analysis for 422 genomes on a cluster whose SLURM scheduler only allows jobs to run for up to 3 days. Tools like CompareM can't finish within that limit. Therefore, I would like to run the analysis myself using parallel together with awk or sed.
However, I don't fully understand how this analysis works. As far as I can tell, the proteins of the query genome are searched with BLAST against the reference genome, keeping hits with at least 30% identity and at least 70% coverage. The top match is then searched back against the query genome with BLAST, using the same cut-offs.
I previously ran a similar analysis, percentage of conserved proteins, using a command like the one below:
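To make my understanding of the steps above concrete, here is a minimal awk sketch of how I think the reciprocal filtering could work once both searches have finished. It is not CompareM's actual implementation. The function name rbh_aai is made up, and it assumes both BLAST runs used -outfmt "6 qseqid sseqid pident length qlen bitscore", that coverage means alignment length divided by query length, that the "top match" is the hit with the highest bit score, and that AAI is the mean identity of the forward hits that are confirmed by the reverse search.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CompareM or BLAST).
# Usage: rbh_aai FORWARD.tsv REVERSE.tsv [MIN_IDENTITY] [MIN_COVERAGE]
#   FORWARD.tsv: proteins of genome A searched against genome B
#   REVERSE.tsv: proteins of genome B searched against genome A
# Assumed columns: qseqid sseqid pident length qlen bitscore (tab-separated).
# Prints: number of reciprocal best hits, then their mean identity.
rbh_aai() {
  local fwd=$1 rev=$2 min_id=${3:-30} min_cov=${4:-70}
  awk -F'\t' -v min_id="$min_id" -v min_cov="$min_cov" '
    # A hit passes when identity and query coverage both reach the cut-offs.
    function passes() {
      return ($3 + 0 >= min_id + 0 && 100 * $4 / $5 >= min_cov + 0)
    }
    # First file (reverse search): best-scoring A protein for each B protein.
    NR == FNR {
      if (passes() && (!($1 in rev_score) || $6 + 0 > rev_score[$1])) {
        rev_score[$1] = $6 + 0
        rev_best[$1] = $2
      }
      next
    }
    # Second file (forward search): best-scoring B protein for each A protein.
    passes() && (!($1 in fwd_score) || $6 + 0 > fwd_score[$1]) {
      fwd_score[$1] = $6 + 0
      fwd_best[$1] = $2
      fwd_id[$1] = $3 + 0
    }
    END {
      # Keep a pair only when A picks B and B picks A back.
      for (a in fwd_best) {
        b = fwd_best[a]
        if ((b in rev_best) && rev_best[b] == a) { sum += fwd_id[a]; n++ }
      }
      if (n) printf "%d\t%.2f\n", n, sum / n
      else print "0\tNA"
    }
  ' "$rev" "$fwd"
}
```

Called as rbh_aai A_vs_B.tsv B_vs_A.tsv, this would print one line per genome pair, so the results for all pairs could be collected with a simple loop. Whether the identity should be averaged over the forward hits only or over both directions is something I am not sure about.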
cat allpairs.txt | parallel --colsep ' ' -j 32 blastp -query {1} -subject {2} -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {1}_{2}.tsv
Here I first saved all the genome pairs I wanted to compare in a file (allpairs.txt) and then ran BLAST on each pair using parallel.
But I don't understand how to perform the reverse search using BLAST with the same cut-offs. Is it possible to do this with parallel, awk or sed?
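One idea I had, as a rough sketch only: since the reverse search is just the same BLAST with query and subject swapped, awk could expand each line of allpairs.txt into a forward and a reverse blastp command, and parallel could then run that list of commands. The function name make_rbh_jobs and the output naming scheme are made up for illustration. The sketch assumes file paths without spaces, and it leaves out -max_target_seqs 1 on the assumption that the identity and coverage cut-offs are applied afterwards, so that the best hit passing the cut-offs is not discarded too early.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one forward and one reverse blastp command
# for every "genomeA genomeB" line in the pairs file (whitespace-separated).
# The extra output columns (qlen, bitscore) are there so that coverage and
# best hits can be worked out later with awk.
make_rbh_jobs() {
  awk -v fmt='6 qseqid sseqid pident length qlen bitscore' '
    # Strip any leading directory so output files land in the current directory.
    function base(p) { sub(/.*\//, "", p); return p }
    function job(q, s) {
      printf "blastp -query %s -subject %s -evalue 0.00001 -outfmt \"%s\" -out %s_vs_%s.tsv\n",
             q, s, fmt, base(q), base(s)
    }
    NF >= 2 { job($1, $2); job($2, $1) }
  ' "$1"
}

# Intended use (GNU parallel runs each input line as a command
# when no command is given on its own command line):
#   make_rbh_jobs allpairs.txt | parallel -j 32
```

If allpairs.txt already lists every pair in both orders, the second job() call would create duplicates and should be dropped. Splitting the generated command list into chunks would also let each chunk be submitted as a separate SLURM job that stays under the 3-day limit.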
Thank you very much.
Best regards,
Felix