Reading a FASTA file in Python

Answer by Aidan Golden

I think you can just use Biopython.

It is indeed wrong today. I edited the answer since it has been possible to use str(sequence) for a long time now.

Very useful answer from 7 years ago! FYI, in the current version of Biopython (1.69), fasta.seq.tostring() is obsolete; use str(fasta.seq) instead.

Nicely done, thanks for a great example on how to use groupby! I’ll try to include it in my code, although I’ll probably keep the Biopython FASTA parser for now 🙂

The desired result would behave like a generator, as in the pseudo-code example below:

fasta_sequences = fasta_generator(input_file)  # The function I miss
with open(output_file, "w") as out_file:  # open for writing, not reading
    for name, sequence in fasta_sequences:
        new_sequence = some_function(sequence)
        # Function defined elsewhere (arguments assumed here)
        write_fasta(out_file, name, new_sequence)
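
One way to fill in the missing fasta_generator without Biopython is the groupby approach mentioned in the comments above. What follows is only a minimal sketch under a few assumptions: iter_fasta is a hypothetical helper name, input_file is taken to be a path, and a header that has no sequence lines before the next header is silently dropped.

```python
from itertools import groupby


def iter_fasta(lines):
    """Yield (name, sequence) tuples from an iterable of FASTA lines."""
    name = None
    # groupby splits the input into alternating runs of header lines
    # (starting with ">") and the sequence lines that follow them.
    for is_header, group in groupby(lines, key=lambda line: line.startswith(">")):
        if is_header:
            # Keep the last header of the run, minus ">" and the newline.
            name = list(group)[-1][1:].strip()
        elif name is not None:
            # Join wrapped sequence lines into one string.
            yield name, "".join(line.strip() for line in group)
            name = None


def fasta_generator(path):
    """Open a FASTA file and yield its records lazily, one at a time."""
    with open(path) as handle:
        yield from iter_fasta(handle)
```

Because both functions are generators, only one record is held in memory at a time, which matches the generator-like behaviour asked for in the question.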

Source: www.biostars.org/p/710/


Answer by Ahmir Chase

Note that the inclusion of Bio.SeqIO (and Bio.AlignIO) in Biopython does lead to some duplication or choice in how to deal with some file formats. For example, Bio.Nexus will also read sequences from Nexus files, but Bio.Nexus can also do much more, for example reading any phylogenetic trees in a Nexus file.

Python novices might find Peter’s introductory Biopython Workshop useful; it starts with working with sequence files using SeqIO.

For non-sequential file formats like Clustal, however, Bio.SeqIO.write() is forced to turn the iterator into a list automatically (as explained in the output section), so the advantage of passing an iterator is lost.

The design was partly inspired by the simplicity of BioPerl’s SeqIO. In the long term we hope to match BioPerl’s impressive list of supported sequence file formats and multiple alignment formats.

from Bio import SeqIO

for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)

Source: biopython.org/wiki/SeqIO


Answer by Mckinley Owens

line is the string you have already read; there is no next() method on it.

Part of the problem is that you’re trying to mix two different ways of reading the file: you are iterating over the lines using for line in f1 and also calling <handle>.next().

line is simply a str, so you can’t call str.next().

The error is likely coming from the line:

nextline=line.next()
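
A possible fix, sketched under the assumption that the original code wanted the line following each header: ask the file iterator for the next line instead of the string. The function name header_and_first_line is hypothetical and not from the original question.

```python
def header_and_first_line(lines):
    """Yield (header, following line) pairs from an iterable of FASTA lines."""
    iterator = iter(lines)  # one shared iterator for both the loop and next()
    for line in iterator:
        if line.startswith(">"):
            # Ask the iterator, not the str, for the next line.
            # The default "" avoids StopIteration on a trailing header.
            following = next(iterator, "")
            yield line[1:].strip(), following.strip()
```

With a real file this would be called as header_and_first_line(f1) inside a with open(...) as f1: block, since an open text file is itself an iterator over its lines.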

Source: stackoverflow.com/questions/20580657/how-to-read-a-fasta-file-in-python


Answer by Jayceon Porter

Read a sequence in FASTA format and print only the header of the sequence.
Read a sequence in FASTA format from the file SingleSeq.fasta and print only the header of the sequence.
Read a file in FASTA format and write to a new file only the sequence (without the header).
Read a file in FASTA format and write to a new file only the header of the record.

>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN
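
The exercises above could be approached roughly as follows for a file holding a single record such as the one shown. This is a sketch, not the course's reference solution: the function names and the output file names are assumptions, and the header is returned without its leading ">".

```python
def split_single_fasta(text):
    """Return (header, sequence) for FASTA text holding a single record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]          # drop the leading ">"
    sequence = "".join(lines[1:])  # join wrapped sequence lines
    return header, sequence


def split_file(fasta_path, header_path, sequence_path):
    """Print the header, then write header and sequence to separate files."""
    with open(fasta_path) as handle:
        header, sequence = split_single_fasta(handle.read())
    print(header)  # the "print only the header" exercise
    with open(header_path, "w") as out:
        out.write(header + "\n")
    with open(sequence_path, "w") as out:
        out.write(sequence + "\n")
```

For example, split_file("SingleSeq.fasta", "header.txt", "sequence.txt") would cover all four tasks in one call; the two output names are placeholders.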

Source: gtpb.github.io/PPB17/day2/3-Parsing/Parsing-Theory-I.html

