Correct Way To Parse A Fasta File In Python
Hi,
I have been wondering about the correct way to parse a FASTA file in Python, perhaps with Biopython, without having to load the whole file into memory (e.g. NOT reading it into a list, dictionary or FASTA class) before using it.
The desired result would behave like a generator, as in the pseudo-code example below:
    fasta_sequences = fasta_generator(input_file)  # The function I miss
    with open(output_file, "w") as out_file:  # open the output for writing
        for fasta in fasta_sequences:
            name, sequence = fasta
            new_sequence = some_function(sequence)
            write_fasta(out_file, name, new_sequence)  # Function defined elsewhere
Important aspects are:
- Reads sequences one at a time
- Does not load all the sequences into memory
- Is safe and well tested
Thanks for your suggestions!
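For reference, a minimal sketch of the kind of generator described above, using only the standard library. It assumes a plain-text FASTA file where each record starts with a `>` header line, and the `write_fasta` signature (handle, name, sequence, line width) is a guess at the helper "defined elsewhere", not a known API. Biopython's `Bio.SeqIO.parse(handle, "fasta")` also returns an iterator that yields one record at a time and is likely the better-tested option; this sketch only illustrates the idea.

```python
import io


def fasta_generator(handle):
    """Yield (name, sequence) tuples from an open FASTA handle, one record at a time.

    Only the record currently being read is held in memory.
    """
    name = None
    chunks = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None and line:
            # Sequence lines may be wrapped; collect them until the next header.
            chunks.append(line)
    # Emit the last record in the file.
    if name is not None:
        yield name, "".join(chunks)


def write_fasta(out_file, name, sequence, width=60):
    """Hypothetical helper: write one record, wrapping the sequence at `width` characters."""
    out_file.write(f">{name}\n")
    for i in range(0, len(sequence), width):
        out_file.write(sequence[i:i + width] + "\n")


if __name__ == "__main__":
    # Small in-memory demo; with real files, pass open(path) and open(path, "w") instead.
    source = io.StringIO(">seq1 example\nACGT\nACGT\n>seq2\nTTGA\n")
    target = io.StringIO()
    for name, sequence in fasta_generator(source):
        write_fasta(target, name, sequence[::-1])  # reversing stands in for some_function
    print(target.getvalue(), end="")
```

Because `fasta_generator` iterates over the handle line by line and yields as soon as a record is complete, memory use is bounded by the longest single record rather than by the size of the file.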