Question : Improve genbank feature addition

I am trying to add more than 70,000 new features to a GenBank file using Biopython.

I have this code:

from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

for result in results:
     start = 0
     end = 0

     result = result.split("\t")
     start = int(result[0])
     end = int(result[1])

     for record in SeqIO.parse(fi, "gb"):
         record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feat"))
         SeqIO.write(record, fo, "gb")

results is just a list of lists containing the start and end of each feature I need to add to the original gbk file.

This solution is extremely slow on my computer, and I do not know how to improve its performance. Any good ideas?


Answer – 1 verified

You should parse the GenBank file just once. Leaving aside what results contains (I cannot tell exactly, because some pieces of code are missing from your example), I would guess that something like this modification of your code would improve performance:

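from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

# Parse the GenBank file once and keep the records in memory.
original_records = list(SeqIO.parse(fi, "gb"))

for result in results:
    # Each result is assumed to be a tab-separated "start<TAB>end" string.
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])

    for record in original_records:
        record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feat"))

# Write all records once, after every feature has been added.
SeqIO.write(original_records, fo, "gb")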
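
The same idea (parse once, add every feature in memory, write once) could also be wrapped in two small functions. This is only a sketch: the names parse_coordinates and add_features are hypothetical, and because the question does not show exactly what results holds, the helper assumes each entry is either a tab-separated string or a two-item list.

```python
def parse_coordinates(results):
    """Turn each result into a (start, end) tuple of ints.

    Assumption: each entry is either a tab-separated string
    ("start<TAB>end") or a two-item sequence such as [start, end].
    """
    coords = []
    for result in results:
        if isinstance(result, str):
            result = result.split("\t")
        start, end = int(result[0]), int(result[1])
        coords.append((start, end))
    return coords


def add_features(in_path, out_path, results, feature_type="misc_feat"):
    """Parse the input once, add all features in memory, write once."""
    # Imported here so that parse_coordinates works without Biopython.
    from Bio import SeqIO
    from Bio.SeqFeature import SeqFeature, FeatureLocation

    coords = parse_coordinates(results)
    records = list(SeqIO.parse(in_path, "gb"))

    for record in records:
        # Build fresh feature objects for every record.
        record.features.extend(
            SeqFeature(FeatureLocation(start, end), type=feature_type)
            for start, end in coords
        )

    # SeqIO.write accepts an iterable of records and returns
    # the number of records written.
    return SeqIO.write(records, out_path, "gb")
```

With the file names from the question, a call would look like add_features("myoriginal.gbk", "mynewfile.gbk", results). Two things may be worth checking against your data: FeatureLocation uses Python-style coordinates (0-based start, exclusive end), and the standard GenBank feature key is "misc_feature", whereas the code in the question uses "misc_feat".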

