biopython – Parsing a GenBank file and outputting specific feature information to a CSV using Biopython

I am trying to parse a GenBank file, extract particular feature information, and write that information to a CSV file. The example GenBank file looks like this:

SBxxxxxx.LargeContigs.gbk


LOCUS       scaffold_31            38809 bp    DNA              UNK 01-JAN-1980
DEFINITION  scaffold_31.
ACCESSION   scaffold_31
VERSION     scaffold_31
KEYWORDS    .
SOURCE      .
      ORGANISM  .
COMMENT     ##antiSMASH-Data-START##
            Version      :: 6.1.1
            Run date     :: 2022-09-21 11:09:55
            ##antiSMASH-Data-END##
FEATURES             Location/Qualifiers
            protocluster    26198..38809
                            /aStool="rule-based-clusters"
                            /category="terpene"
                            /contig_edge="True"
                            /core_location="[36197:37079](-)"
                            /cutoff="20000"
                            /detection_rule="(Terpene_synth or Terpene_synth_C or
                            phytoene_synt or Lycopene_cycl or terpene_cyclase or NapT7
                            or fung_ggpps or fung_ggpps2 or trichodiene_synth or TRI5)"
                            /neighbourhood="10000"
                            /product="terpene"
                            /protocluster_number="1"
                            /tool="antismash"


  

For the output file, I want to create a CSV with three columns. The first column will hold the scaffold name (i.e. scaffold_31), the second the category value of the protocluster feature (i.e. /category="terpene"), and the third the product value of the protocluster feature (i.e. /product="terpene").

This is the code I have so far. I know I can look through feature.qualifiers on the protocluster feature to get the category and product, but I am not sure how to extract the scaffold information. I am completely new to parsing GenBank files, so I have little knowledge in this domain. Thanks in advance for any assistance!


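```python
import os

from Bio import SeqIO

input_path = "/Path to SBxxxxxx.LargeContigs.gbk"
output_path = "output.csv"

# Check the path (not an open file object) so an existing CSV is not overwritten.
if not os.path.exists(output_path):
    with open(output_path, "w") as output:
        for record in SeqIO.parse(input_path, "genbank"):
            for feature in record.features:
                # Each qualifier key has to be tested on its own;
                # '"category" and "product" in ...' only checks "product".
                if (feature.type == "protocluster"
                        and "category" in feature.qualifiers
                        and "product" in feature.qualifiers):
                    # Qualifier values are lists, so take the first entry.
                    line = feature.qualifiers["category"][0] + "," + feature.qualifiers["product"][0] + "\n"
                    output.write(line)
```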
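
One possible way to fill the scaffold column, sketched here under a few assumptions: when Biopython parses a GenBank record, the resulting `SeqRecord` carries the identifier from the VERSION/ACCESSION line as `record.id` (and the LOCUS name as `record.name`), so for the example above `record.id` should be `scaffold_31`. The helper names `protocluster_rows` and `gbk_to_csv`, the header row, and the choice to write an empty string when a qualifier is missing are illustrative and not part of the original code. The sketch also swaps manual string concatenation for the standard-library `csv` module, which handles quoting if a value ever contains a comma.

```python
import csv


def protocluster_rows(records):
    """Yield (scaffold, category, product) for every protocluster feature."""
    for record in records:
        for feature in record.features:
            if feature.type != "protocluster":
                continue
            qualifiers = feature.qualifiers
            # Qualifier values are lists; fall back to "" when a key is absent
            # (an assumption made for this sketch).
            category = qualifiers.get("category", [""])[0]
            product = qualifiers.get("product", [""])[0]
            # record.id is taken from the VERSION/ACCESSION line of the record.
            yield record.id, category, product


def gbk_to_csv(gbk_path, csv_path):
    """Parse a GenBank file and write one CSV row per protocluster."""
    # Imported here so protocluster_rows can be used without Biopython installed.
    from Bio import SeqIO

    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["scaffold", "category", "product"])  # hypothetical header
        writer.writerows(protocluster_rows(SeqIO.parse(gbk_path, "genbank")))
```

Calling `gbk_to_csv` with the path to the `.gbk` file and `"output.csv"` would then write one row per protocluster. Assuming the example record parses as shown, its row should contain `scaffold_31`, `terpene`, and `terpene`.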
