AttributeError: ‘str’ object has no attribute ‘id’ using BioPython, parsing fasta

This script assumes a properly formatted FASTA file. It removes every ".seq" string at the end of any line, and in a proper FASTA file only the ID lines should contain these characters.

Your error occurs because SeqIO.write accepts a SeqRecord or a list/iterator of SeqRecords, but you are feeding it a plain list like [name, sequence]. Instead, I suggest you modify each SeqRecord's .id and .description directly (note: if there is whitespace in the header line, you'll need to handle that too). It is also most efficient (across Biopython versions) to write all the records at once, rather than calling .write on each iteration:

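from Bio import SeqIO

def yield_records():
    with open('lots_of_fasta_in_file.fasta') as f:
        for seq_record in SeqIO.parse(f, 'fasta'):
            # Strip '.seq' from the ID and reuse it as the description,
            # so the written header is just the cleaned ID.
            seq_record.id = seq_record.description = seq_record.id.replace('.seq', '')
            yield seq_record

# Write all records in a single call instead of once per iteration.
SeqIO.write(yield_records(), 'new.fasta', 'fasta')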
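If you only have plain name and sequence strings (the [name, sequence] case described above), a minimal sketch is to wrap each pair in a SeqRecord before calling SeqIO.write. The pairs, the IDs and the in-memory StringIO handle are made-up stand-ins for your own data and output file.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical (name, sequence) pairs, like the [name, sequence] list in the question.
pairs = [("gene1", "MKVLA"), ("gene2", "MSTNP")]

# Wrap each pair in a SeqRecord; description="" keeps the header to just the ID.
records = [SeqRecord(Seq(seq), id=name, description="") for name, seq in pairs]

handle = StringIO()  # stands in for an output file such as 'new.fasta'
count = SeqIO.write(records, handle, "fasta")  # returns the number of records written
print(count)
print(handle.getvalue())
```

SeqIO.write also accepts a filename string in place of the handle, so the same call works for writing straight to disk.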


Note: filtered.fasta will only contain the last s_record in lineageV_paralog_warning_genes.fasta that is found in paralogs_in_all, because filtered.fasta is overwritten on every pass through the loop.

I guess you need to write s_record instead of the parsed sequence, i.e. instead of desired_proteins.

You're passing the wrong object to SeqIO.write: it expects a SeqRecord, and instead it gets a Seq object. Because a Seq is iterable, SeqIO.write loops over it and receives single-character strings, which is why the error complains that a 'str' has no attribute 'id'.
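A short sketch of why the message mentions 'str' at all, assuming Biopython is installed: iterating a Seq yields one-letter strings, none of which has an .id, while wrapping the Seq in a SeqRecord supplies one. The sequence and the ID example_protein are made up for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

seq = Seq("MKVL")

# Iterating a Seq yields one-letter str objects; consistent with the traceback,
# these are what SeqIO.write ends up treating as "records" when handed a bare Seq.
letters = list(seq)
print(letters)
print(hasattr(letters[0], "id"))  # a plain str has no .id

# Wrapping the Seq in a SeqRecord gives SeqIO.write the .id it needs.
record = SeqRecord(seq, id="example_protein", description="")
print(record.id)
```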

I am trying to filter out sequences using SeqIO but I am getting this error.

Traceback (most recent call last):
  File "paralog_warning_filter.py", line 61, in <module>
    .
    .
    .
    SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'

Here is the relevant part of the script I am trying:

fh = open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh, 'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins = seq
            output_file = SeqIO.write(desired_proteins, "filtered.fasta", "fasta")
output_file
fh.close()
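A minimal sketch of the fix suggested in the answers: collect the matching SeqRecord objects (not their .seq) in a list and call SeqIO.write once, after the loop. The inline FASTA text, the IDs and the contents of paralogs_in_all are hypothetical stand-ins for the real file and list.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical inputs: a tiny FASTA string stands in for
# 'lineageV_paralog_warning_genes.fasta', and a made-up suffix list
# stands in for paralogs_in_all.
fasta_text = ">lineageV_geneA\nMKVLA\n>lineageV_geneB\nMSTNP\n>lineageV_geneC\nMAAAA\n"
paralogs_in_all = ["geneA", "geneC"]

desired_proteins = []  # collect whole SeqRecords, not Seq objects
for s_record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    # Keep the record if its ID ends with any of the paralog names.
    if any(s_record.id.endswith(suffix) for suffix in paralogs_in_all):
        desired_proteins.append(s_record)

# Write once, after the loop, so earlier matches are not overwritten.
out = StringIO()  # pass "filtered.fasta" instead to write a real file
count = SeqIO.write(desired_proteins, out, "fasta")
print(count)
print(out.getvalue())
```

Using any() keeps the original endswith test but appends each record at most once, even if it matches several entries in paralogs_in_all.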


I want to estimate protein aromaticity. I have a FASTA file of protein sequences. I am using the following code but get an error (AttributeError: 'str' object has no attribute 'aromaticity'). Could you please suggest a fix? Thanks.

I strongly suggest investing some time in reading the Biopython tutorial.

from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis
import sys
from Bio.SeqUtils import ProtParamData  # Local
from Bio.SeqUtils import IsoelectricPoint  # Local
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from Bio.Data import IUPACData
from Bio.SeqUtils import molecular_weight
from CAI import CAI
import os
out = open('output', 'a')
ss = []

for s in open('protein.fa'):
    if s.startswith('>'): continue
    ss.append(s.strip())
print(ss)
for i in ss: print(i.strip().aromaticity())
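A minimal sketch of the usual fix, assuming a recent Biopython: aromaticity() is a method of ProteinAnalysis, not of str, so parse the file with SeqIO.parse and wrap each sequence string in ProteinAnalysis. It also drops the Bio.Alphabet import, which was removed in Biopython 1.78. The inline FASTA text and the IDs prot1/prot2 are made-up stand-ins for protein.fa.

```python
from io import StringIO

from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis

# Hypothetical stand-in for 'protein.fa'; prot2 spans two lines, which the
# line-by-line loop in the question would split into two separate entries.
fasta_text = ">prot1\nFWYA\n>prot2\nMKVL\nAAAA\n"

results = {}
for record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    # ProteinAnalysis wraps the plain sequence string and provides aromaticity(),
    # the relative frequency of Phe + Trp + Tyr.
    analysis = ProteinAnalysis(str(record.seq))
    results[record.id] = analysis.aromaticity()

for name, value in results.items():
    print(name, round(value, 3))
```

For a real file, replace StringIO(fasta_text) with the filename 'protein.fa'; SeqIO.parse accepts either.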


It's a bug in Biopython 1.78 (the version in use here): for the short sequences, printing the record ends up trying to access the nonexistent _data attribute.

What those three sequences have in common is that they’re less than 61 characters long. When you print the record it implicitly calls __str__ for the record, which calls __repr__ for the Seq. DBSeq gets its method from Seq, which looks like:

    def __repr__(self):
        """Return (truncated) representation of the sequence for debugging."""
        if len(self) > 60:
            # Shows the last three letters as it is often useful to see if
            # there is a stop codon at the end of a sequence.
            # Note total length is 54 + 3 + 3 = 60
            return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
        else:
            return f"{self.__class__.__name__}({self._data!r})"
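To make the mechanism concrete without a BioSQL database, here is a stdlib-only sketch that copies the __repr__ branching above into two hypothetical classes: SketchSeq stores its letters in _data like Seq, while LazySketchSeq, a stand-in for DBSeq, fetches them on demand and never sets _data. These class names are illustrative only, not Biopython API.

```python
from typing import Callable


class SketchSeq:
    """Hypothetical stand-in for Bio.Seq.Seq: letters live in self._data."""

    def __init__(self, data: str) -> None:
        self._data = data

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __repr__(self) -> str:
        # Same branching as the Biopython 1.78 __repr__ quoted above.
        if len(self) > 60:
            # Long branch: built only from slices, never reads _data directly.
            return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
        # Short branch: reads self._data directly.
        return f"{self.__class__.__name__}({self._data!r})"


class LazySketchSeq(SketchSeq):
    """Hypothetical stand-in for DBSeq: fetches letters lazily, never sets _data."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        self._fetch = fetch  # callable returning the full sequence string

    def __len__(self) -> int:
        return len(self._fetch())

    def __getitem__(self, index):
        return self._fetch()[index]


long_seq = LazySketchSeq(lambda: "A" * 61)
short_seq = LazySketchSeq(lambda: "A" * 60)

print(repr(long_seq))  # fine: only the slicing branch runs
try:
    repr(short_seq)
except AttributeError as err:
    print("short sequence failed:", err)
```

Sequences longer than 60 letters only go through slicing, so they print fine; anything of 60 letters or fewer takes the _data branch and fails, matching the "less than 61 characters" pattern described above.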

