parsing gbk files (antismash result)

Hello, I ran antiSMASH from the command line and got 700 .gbk files (one per analyzed genome).

I used the following script to retrieve the predicted products from the gbk files:

    import glob
    import os

    from Bio import SeqIO

    # The output folder must exist before files can be written into it
    os.makedirs("products", exist_ok=True)

    for gbk_file in glob.glob("*.gbk"):
        out_path = os.path.join("products", gbk_file.replace(".gbk", "_output.tsv"))
        # "with" closes the output file even if parsing fails
        with open(out_path, "w") as cluster_out:
            # Extract protocluster info, write one line per cluster
            for seq_record in SeqIO.parse(gbk_file, "genbank"):
                for seq_feat in seq_record.features:
                    if seq_feat.type == "protocluster":
                        cluster_number = seq_feat.qualifiers["protocluster_number"][0].replace(" ", "_").replace(":", "")
                        cluster_type = seq_feat.qualifiers["product"][0]
                        cluster_out.write("#" + cluster_number + "\tCluster Type:" + cluster_type + "\n")

This way I produced one .tsv file per genome, listing the predicted products found in that genome.

Here is an example:

    $ cat cluster1_bin1.tsv
    1 Cluster Type:TfuA-related
    1 Cluster Type:terpene
    1 Cluster Type:NRPS-like
    1 Cluster Type:terpene
    1 Cluster Type:terpene
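A natural first step is to turn each per-genome file into product counts. The sketch below is one minimal way to do that; `count_products` is a hypothetical helper name. It assumes each line contains the label `Cluster Type:` followed by the product, as in the example above, and it ignores whatever precedes the label, so it works whether the cluster number is written as `#1` plus a tab or `1` plus a space.

```python
from collections import Counter


def count_products(tsv_path):
    """Count product types in one per-genome file.

    Hypothetical helper: assumes lines like "#1<TAB>Cluster Type:terpene".
    """
    counts = Counter()
    with open(tsv_path) as handle:
        for line in handle:
            # Keep only the text after the "Cluster Type:" label
            _, _, product = line.strip().partition("Cluster Type:")
            if product:  # skips blank or unrelated lines
                counts[product] += 1
    return counts
```

For the example file above it returns terpene: 3, TfuA-related: 1 and NRPS-like: 1.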

From those .tsv files I want to generate a table like this:

[Image: table_example]

How can I produce that table? I could build it manually, but there are 700 .tsv files, so I would like to automate it.
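A minimal sketch in Python, assuming the desired table has one row per genome and one column per product type, with the number of protoclusters of that type in each cell (0 when a genome lacks it). It reads every `*_output.tsv` file in `products/` (the folder and suffix used by the extraction script) and writes a tab-separated matrix. The function names `count_products` and `build_table` and the output name `product_table.tsv` are hypothetical choices, not part of antiSMASH or Biopython.

```python
import glob
import os
from collections import Counter


def count_products(tsv_path):
    """Count product types in one file with lines like '#1<TAB>Cluster Type:terpene'."""
    counts = Counter()
    with open(tsv_path) as handle:
        for line in handle:
            _, _, product = line.strip().partition("Cluster Type:")
            if product:
                counts[product] += 1
    return counts


def build_table(tsv_dir="products", out_path="product_table.tsv"):
    """Write a genome x product-type count matrix as TSV; return the data and columns."""
    suffix = "_output.tsv"
    per_genome = {}
    for path in sorted(glob.glob(os.path.join(tsv_dir, "*" + suffix))):
        # Use the file name without the suffix as the genome label
        genome = os.path.basename(path)[: -len(suffix)]
        per_genome[genome] = count_products(path)

    # Union of all product types seen in any genome, sorted for stable columns
    products = sorted({p for counts in per_genome.values() for p in counts})

    with open(out_path, "w") as out:
        out.write("genome\t" + "\t".join(products) + "\n")
        for genome, counts in per_genome.items():
            # A Counter returns 0 for product types absent from this genome
            row = [str(counts[p]) for p in products]
            out.write(genome + "\t" + "\t".join(row) + "\n")
    return per_genome, products
```

Calling `build_table()` from the directory that contains `products/` writes `product_table.tsv`, which opens directly in a spreadsheet. Taking the union of product types first guarantees every genome gets the same columns, even when a type appears in only one of the 700 files.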

Thanks for your time 🙂


Tags: awk, biopython, gbk, antismash, bash
