Correct Way To Parse A Fasta File In Python

Hi,

I have been wondering about the correct approach in Python, perhaps using Biopython, to parsing a FASTA file without loading it into memory first (e.g. NOT reading it into a list, dictionary, or FASTA class before using it).

The desired result would behave like a generator, as in the pseudo-code example below:

fasta_sequences = fasta_generator(input_file)  # The function I am missing
with open(output_file, "w") as out_file:  # Open for writing, not reading
    for fasta in fasta_sequences:
        name, sequence = fasta
        new_sequence = some_function(sequence)
        write_fasta(out_file, name, new_sequence)  # Function defined elsewhere

Important aspects are:

  • Sequences are read one at a time
  • The full set of sequences is never held in memory
  • The approach is safe and well tested

Thanks for your suggestions!
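
To make the desired behaviour concrete, here is a minimal pure-Python sketch of such a generator. It is only an illustration under stated assumptions, not a tested library routine: the names `iter_fasta` and `write_fasta`, the `width` parameter, and the choice to take an open file handle rather than a file name are all hypothetical. If Biopython is available, `Bio.SeqIO.parse(handle, "fasta")` also iterates over records one at a time and is likely the better-tested route.

```python
import io


def iter_fasta(handle):
    """Yield (name, sequence) tuples from an open FASTA handle.

    Hypothetical helper: only the current record is kept in memory.
    """
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                # A new header closes the previous record
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        # Emit the last record in the file
        yield name, "".join(chunks)


def write_fasta(out_file, name, sequence, width=60):
    """Write one record, wrapping the sequence at `width` characters (assumed layout)."""
    out_file.write(f">{name}\n")
    for start in range(0, len(sequence), width):
        out_file.write(sequence[start:start + width] + "\n")


if __name__ == "__main__":
    # In-memory stand-in for a real file, used only for this demonstration
    demo_input = io.StringIO(">seq1 first\nACGT\nAC\n\n>seq2\nGGG\n")
    demo_output = io.StringIO()
    for name, sequence in iter_fasta(demo_input):
        write_fasta(demo_output, name, sequence[::-1])  # e.g. reverse each sequence
    print(demo_output.getvalue(), end="")
```

With a real file, the same loop would run inside `with open(input_file) as in_handle, open(output_file, "w") as out_file:`, so each record is read, transformed, and written before the next one is parsed.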


Tags: fasta, parsing, python
