Fastest way to perform BLAST search using a multi-FASTA file against a remote database

I have a multi-FASTA file with ~125 protein sequences and need to run a BLASTP search against the remote nr database. I tried NcbiblastpCommandline, but it only accepts files as input, and submitting the whole file at once fails with: ERROR: An error has occurred on the server, [blastsrv4.REAL]:Error: CPU usage limit was exceeded, resulting in SIGXCPU (24). Writing each sequence from the multi-FASTA file to its own file and searching one sequence at a time works, but then the BLAST search becomes tremendously slow (~10 min/query on average, versus ~1 min/query on the NCBI website).

from Bio.Blast.Applications import NcbiblastpCommandline
from Bio import SeqIO

blastp_results = []
for record in SeqIO.parse("AmpB_DEPs.fasta", "fasta"):
    # Write the current record to a temporary single-sequence FASTA file
    SeqIO.write(record, "test.txt", "fasta")
    blastp_cline = NcbiblastpCommandline(
        query="test.txt", db="nr", remote=True, evalue=0.05,
        outfmt="7 sseqid evalue qcovs pident",
    )
    stdout, stderr = blastp_cline()
    blastp_results.append(stdout)

I also tried using NCBIWWW.qblast, but it doesn't seem to report query coverage in its output, which is important for my study.
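For what it's worth, query coverage can in principle be derived from the XML that qblast returns, since each HSP records its query start and end positions (a sketch, assuming coverage ≈ aligned query span / query length, and using the field names exposed by Bio.Blast.NCBIXML):

```python
from Bio.Blast import NCBIWWW, NCBIXML

# Submit one protein sequence remotely; qblast returns XML, which has
# no qcovs column but contains enough to approximate coverage per HSP.
result_handle = NCBIWWW.qblast("blastp", "nr", "MKTAYIAKQR", expect=0.05)
blast_record = NCBIXML.read(result_handle)
query_len = blast_record.query_length

for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        # Coverage of the query by this HSP, as a percentage
        coverage = 100.0 * (hsp.query_end - hsp.query_start + 1) / query_len
        print(alignment.hit_id, hsp.expect, round(coverage, 1))
```

This only approximates BLAST's qcovs (which merges overlapping HSPs per subject), but it may be close enough for filtering.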

Can somebody suggest a way to deal with this issue without compromising on the search space or the default BLAST parameters? Suggestions on implementing BLAST in other languages such as Perl or R would also be appreciated.


Tags: FASTA, BLAST, python, biopython