Fastest way to perform BLAST search using a multi-FASTA file against a remote database
I have a multi-FASTA file containing ~125 protein sequences, and I need to run a BLASTP search against the remote nr database. I tried using NcbiblastpCommandline, but it only accepts a file as input, and because my file contains so many sequences the server rejects it with: ERROR: An error has occurred on the server, [blastsrv4.REAL]:Error: CPU usage limit was exceeded, resulting in SIGXCPU (24). Writing each sequence from the multi-FASTA file to a separate file one at a time works, but then the BLAST search becomes tremendously slow (~10 min/query on average, versus ~1 min/query on the NCBI website).
from Bio.Blast.Applications import NcbiblastpCommandline
from Bio import SeqIO

blastp_results = []
for record in SeqIO.parse("AmpB_DEPs.fasta", "fasta"):
    # Write the current record to a temporary single-sequence FASTA file.
    with open("test.txt", "w") as f1:
        f1.write(">" + record.description + "\n" + str(record.seq))
    # Run blastp remotely against nr (remote=True, not db='nr -remote').
    blastp_cline = NcbiblastpCommandline(
        query="test.txt", db="nr", remote=True, evalue=0.05,
        outfmt="7 sseqid evalue qcovs pident")
    stdout, stderr = blastp_cline()  # returns (stdout, stderr)
    blastp_results.append(stdout)
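One compromise between one giant submission (which trips the server's CPU limit) and one sequence per request (which is slow) is to submit the sequences in batches of, say, 10-20 records per query file. A minimal sketch of the batching step, in plain Python so it does not depend on Biopython (the batch strings could then be written to files and passed as query to NcbiblastpCommandline as above; the batch size of 2 below is only for illustration):

```python
def batch_fasta(fasta_text, batch_size):
    """Split multi-FASTA text into batches of at most batch_size records."""
    records, current = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A '>' header starts a new record; flush the previous one.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    # Group consecutive records into batches.
    return ["\n".join(records[i:i + batch_size])
            for i in range(0, len(records), batch_size)]

fasta = ">a\nMKV\n>b\nGGL\n>c\nAAA"
print(batch_fasta(fasta, 2))  # two batches: records a+b, then c
```

Note that BLAST+'s tabular output still reports results per query, so per-sequence results are preserved even when queries are batched.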
I also tried NCBIWWW.qblast, but its output does not seem to include query coverage information, which is important for my study.
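Query coverage can be recovered from qblast's XML output, since each HSP carries its query start/end coordinates (hsp.query_start and hsp.query_end on Bio.Blast.NCBIXML HSP objects). A sketch of the computation, approximating BLAST's per-subject qcovs by merging a subject's HSP spans over the query (pure Python; the span tuples would come from the parsed XML):

```python
def query_coverage(hsp_spans, query_length):
    """Percent of the query covered by at least one HSP.

    hsp_spans: (query_start, query_end) pairs, 1-based inclusive,
    e.g. (hsp.query_start, hsp.query_end) for all HSPs of one subject.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(hsp_spans):
        if cur_end is None or start > cur_end + 1:
            # Gap before this span: close out the previous merged span.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping/adjacent span: extend the current merged span.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length

print(query_coverage([(1, 50), (40, 100)], 100))  # overlapping HSPs -> 100.0
```

This is an approximation of the qcovs column from tabular output, not NCBI's exact implementation, but it gives a usable coverage figure when only XML results are available.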
Can somebody suggest a way to deal with this issue without compromising on the search space or BLAST's default parameters? Suggestions for implementing the BLAST search in other languages, such as Perl or R, would also be appreciated.