How can I print and write the strain/isolate/voucher number of a SeqRecord object in Biopython?

The isolate is a qualifier of the source feature that you can access like so:

from Bio import SeqIO
from pprint import pprint

# Read the GenBank file record by record
for rec in SeqIO.parse("genome.gb", "genbank"):
    # The source feature is normally the first feature of a GenBank record
    source = rec.features[0]
    pprint(source.qualifiers)

This will print:

OrderedDict([('organism', ['Amauroderma calcitum']),
             ('mol_type', ['genomic DNA']),
             ('isolate', ['FLOR 50931']),
             ('db_xref', ['taxon:1774182']),
             ('country', ['Brazil']),
             ('collection_date', ['07-Jan-2013']),
             ('collected_by', ['D.H. Costa-Rezende'])])
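
Since each qualifier value is a list, pulling out a single identifier takes one more step. Here is a minimal sketch of that step, not a definitive implementation: `sample_id` is a hypothetical helper name, the key order (strain, then isolate, then specimen_voucher) is just an assumption about which one you prefer, and the dictionary is copied from the output above so the example runs without a GenBank file.

```python
def sample_id(qualifiers, keys=("strain", "isolate", "specimen_voucher")):
    """Return the first strain/isolate/voucher value found, or None."""
    for key in keys:
        values = qualifiers.get(key)
        if values:
            # Qualifier values are lists, so take the first entry
            return values[0]
    return None


# Stand-in for source.qualifiers, copied from the printed output above
qualifiers = {
    "organism": ["Amauroderma calcitum"],
    "mol_type": ["genomic DNA"],
    "isolate": ["FLOR 50931"],
    "db_xref": ["taxon:1774182"],
}

print(sample_id(qualifiers))
```

Inside the parsing loop you would pass `source.qualifiers` instead of the hand-written dictionary, and to write the value out you can send the same string to an open file handle, for example together with `rec.id` on one tab-separated line.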

Alternatively, in my opinion the simplest way to get this information is with the bio command-line tool:

bio fetch KU315207 | bio json > data.json

Now you have a JSON file that can be loaded directly into your program, with no GenBank parsing needed.
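
The exact layout of that JSON is not shown here, so the following is only a sketch that avoids assuming one: it searches nested JSON data for a key wherever it occurs. `find_key` is a hypothetical helper, and the `sample` string is a made-up stand-in for the file contents, not real output from bio.

```python
import json


def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, the key may also appear deeper down
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical stand-in for data.json; the real layout may differ
sample = json.loads(
    '[{"id": "example", "features": [{"type": "source", "isolate": ["FLOR 50931"]}]}]'
)

print(list(find_key(sample, "isolate")))
```

With the real file, you would replace `sample` with the result of `json.load` on an open handle to data.json.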
