Parsing gbk files (antiSMASH result)
Hello, I ran antiSMASH from the CLI and got 700 gbk files (one gbk file per analyzed genome).
I used the following script to retrieve the predicted products from the gbk files:
from Bio import SeqIO
import glob

for files in glob.glob("*.gbk"):
    # The "products/" directory must already exist
    out_files = "products/" + files.replace(".gbk", "_output.tsv")
    with open(out_files, "w") as cluster_out:
        # Extract Cluster info, write to file
        for seq_record in SeqIO.parse(files, "genbank"):
            for seq_feat in seq_record.features:
                if seq_feat.type == "protocluster":
                    cluster_number = seq_feat.qualifiers["protocluster_number"][0].replace(" ", "_").replace(":", "")
                    cluster_type = seq_feat.qualifiers["product"][0]
                    cluster_out.write("#" + cluster_number + "\tCluster Type:" + cluster_type + "\n")
In this way, I produced “.tsv” files from those gbk files, each containing the predicted products for one genome.
Here is an example:
cat cluster1_bin1.tsv
1 Cluster Type:TfuA-related
1 Cluster Type:terpene
1 Cluster Type:NRPS-like
1 Cluster Type:terpene
1 Cluster Type:terpene
From those “.tsv” files I want to generate a table like this:
How can I produce that table? I could do this manually, but there are 700 “.tsv” files, so I would like to know whether I can automate it.
Thanks for your time 🙂
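One possible way to automate this, as a rough sketch only: the target table is not shown above, so this assumes it should have one row per genome and one column per product type, with the number of protoclusters of that type in each cell. The function names (count_products, build_table), the output file name product_counts.tsv and the "products/*_output.tsv" pattern are hypothetical choices for illustration, based on the file names used in the script above.

```python
import csv
import glob
import os
from collections import Counter


def count_products(lines):
    """Count product types in lines such as '#1\tCluster Type:terpene'."""
    counts = Counter()
    for line in lines:
        # Everything after the 'Cluster Type:' label is taken as the product name.
        _, sep, product = line.strip().partition("Cluster Type:")
        if sep and product:
            counts[product] += 1
    return counts


def build_table(counts_by_genome):
    """Return a header row followed by one row of counts per genome."""
    products = sorted({p for c in counts_by_genome.values() for p in c})
    rows = [["genome"] + products]
    for genome in sorted(counts_by_genome):
        counts = counts_by_genome[genome]
        # A Counter returns 0 for products that a genome does not have.
        rows.append([genome] + [counts[p] for p in products])
    return rows


def main(pattern="products/*_output.tsv", out_path="product_counts.tsv"):
    # Assumed layout: one per-genome file in "products/", named <genome>_output.tsv.
    counts_by_genome = {}
    for path in glob.glob(pattern):
        genome = os.path.basename(path).replace("_output.tsv", "")
        with open(path) as handle:
            counts_by_genome[genome] = count_products(handle)
    if not counts_by_genome:
        return  # nothing matched the pattern, so write nothing
    with open(out_path, "w", newline="") as out:
        csv.writer(out, delimiter="\t").writerows(build_table(counts_by_genome))


if __name__ == "__main__":
    main()
```

Run from the directory that contains "products/", this should write a single tab-separated file that can be opened in a spreadsheet or read with pandas. If the table you need is laid out differently (for example presence/absence instead of counts), only build_table would need to change.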