Extracting organism and seq from fasta


Hi,

I am trying to extract the sequences for a specific organism from a database FASTA file, using a species keyword from a .txt file that contains the relevant headers. Does anyone know how I can do this in Python? The Biopython guide I've looked at basically said "you're screwed if your files aren't .gb".

The header looks like this:

>VFG000361(gb|WP_000982866) (ybtU) yersiniabactin biosynthetic protein YbtU [Yersiniabactin (VF0136) - Nutritional/Metabolic factor (VFC0272)] [Yersinia pestis CO92] 

My current script only splits each header into words, but the word at a fixed position is a different keyword in each header rather than the organism name, so I can't use it to filter the FASTA file.

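from Bio import SeqIO

all_species = []
for seq_record in SeqIO.parse("InputFile.fas", "fasta"):
    # Takes the 9th whitespace-separated word of the header, which is not always the organism
    all_species.append(seq_record.description.split()[8])
print(all_species)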
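One direction that might work, as a rough sketch only: in the example header the organism is the last square-bracketed field, so a regular expression anchored at the end of the description could pull it out no matter how many words come before it. The names `organism_from_header` and `filter_fasta` and the output path are hypothetical, and the sketch assumes every header in the file follows the same layout as the example above.

```python
import re

# Assumption: the organism is always the LAST [...] field of the header,
# as in the example header shown above.
ORGANISM_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def organism_from_header(description):
    """Return the text inside the last [...] group, or None if there is none."""
    match = ORGANISM_RE.search(description)
    return match.group(1).strip() if match else None


def filter_fasta(in_path, out_path, keyword):
    """Write records whose organism contains `keyword` (case-insensitive).

    Returns the number of records written.
    """
    # Imported here so organism_from_header can be used without Biopython.
    from Bio import SeqIO

    keyword = keyword.lower()
    wanted = (
        rec
        for rec in SeqIO.parse(in_path, "fasta")
        if keyword in (organism_from_header(rec.description) or "").lower()
    )
    return SeqIO.write(wanted, out_path, "fasta")
```

Calling something like `filter_fasta("InputFile.fas", "filtered.fas", "Yersinia pestis")` would then keep only the matching records. Headers without a trailing bracketed field are skipped rather than raising an error.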


mining, python, parsing, fasta, biopython
