This script assumes a proper FASTA file. It removes every ".seq" string at the end of a line, and in a proper FASTA file only the ID lines should contain these characters.
But I have also tried this and get the same error:
Your error occurs because SeqIO.write accepts a SeqRecord or a list/iterator of SeqRecords, but you are feeding it just a list like [name, sequence]. Instead, I suggest you just modify the SeqRecord's .id and .description (note: if there is whitespace in the header line, you'll need to handle this too). Also, it is most efficient (across Biopython versions) to write all the records at once, rather than calling .write on each iteration:
from Bio import SeqIO

def yield_records():
    with open('lots_of_fasta_in_file.fasta') as f:
        for seq_record in SeqIO.parse(f, 'fasta'):
            seq_record.id = seq_record.description = seq_record.id.replace('.seq', '')
            yield seq_record

SeqIO.write(yield_records(), 'new.fasta', 'fasta')
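The whitespace caveat in the answer can be handled on the header text itself. The following is a minimal sketch in plain Python, not part of the original answer: `clean_header` and its `suffix` parameter are hypothetical names, and it assumes the header is passed without the leading ">" and that only a trailing ".seq" on the ID token should be removed.

```python
def clean_header(header, suffix=".seq"):
    """Strip `suffix` from the end of the ID token of a FASTA header.

    `header` is the header text without the leading '>'. The ID is taken to be
    everything before the first run of whitespace; the rest is the description.
    (Hypothetical helper for illustration, not a Biopython function.)
    """
    parts = header.split(None, 1)  # [id] or [id, description]
    if not parts:
        return header  # empty header: nothing to clean
    seq_id = parts[0]
    if suffix and seq_id.endswith(suffix):
        seq_id = seq_id[: -len(suffix)]
    description = parts[1] if len(parts) > 1 else ""
    # Re-join, dropping the separator when there is no description.
    return f"{seq_id} {description}".strip()
```

Unlike a blanket `replace('.seq', '')`, this only touches a suffix of the ID and leaves any description text after the first whitespace in place.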
Note: filtered.fasta will only contain the last s_record in lineageV_paralog_warning_genes.fasta that is found in paralogs_in_all, because filtered.fasta is overwritten on every pass through the loop.
I guess you need to write s_record instead of the parsed sequence, i.e. desired_proteins.
You're passing the wrong object to SeqIO.write: it expects a SeqRecord, and instead it gets a Seq object. See the SeqIO documentation.
I am trying to filter out sequences using SeqIO but I am getting this error.
Traceback (most recent call last):
File "paralog_warning_filter.py", line 61, in <module>
.
.
.
SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'
Here is the relevant part of the script I am trying:
fh = open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh, 'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins = seq
            output_file = SeqIO.write(desired_proteins, "filtered.fasta", "fasta")
            output_file
fh.close()
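Putting the replies together, one way to restructure the script is to select whole records in a generator and write them in a single call, so the output file is opened only once. This is a sketch under assumptions: `filter_by_suffix` and `write_filtered` are hypothetical names, and `suffixes` stands in for the asker's `paralogs_in_all`.

```python
def filter_by_suffix(records, suffixes):
    """Yield each record whose .id ends with any of the given suffixes.

    Works on any objects with an `.id` attribute, e.g. Biopython SeqRecords.
    """
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of candidates
    for record in records:
        if record.id.endswith(suffixes):
            yield record  # the whole record, not just its sequence


def write_filtered(in_path, out_path, suffixes):
    """Write all matching records to out_path in one SeqIO.write call."""
    from Bio import SeqIO  # imported here so filter_by_suffix has no dependency

    with open(in_path) as handle:
        matching = filter_by_suffix(SeqIO.parse(handle, "fasta"), suffixes)
        # SeqIO.write consumes the generator and returns the record count.
        return SeqIO.write(matching, out_path, "fasta")
```

A call such as `write_filtered('lineageV_paralog_warning_genes.fasta', 'filtered.fasta', paralogs_in_all)` would then replace the nested loops. Because the records are written in one call, earlier matches are not overwritten by later ones.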
I want to estimate protein aromaticity. I have a FASTA file of protein sequences. I am using the following code but getting an error: AttributeError: 'str' object has no attribute 'aromaticity'. Could you please provide any suggestion? Thanks.
from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis
import sys
from Bio.SeqUtils import ProtParamData  # Local
from Bio.SeqUtils import IsoelectricPoint  # Local
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from Bio.Data import IUPACData
from Bio.SeqUtils import molecular_weight
from Bio import SeqIO
from CAI import CAI
from Bio.Seq import Seq
import os

out = open('output', 'a')
ss = []
for s in open('protein.fa'):
    if s.startswith('>'):
        continue
    ss.append(s.strip())
print(ss)
for i in ss:
    print(i.strip().aromaticity())
Edited by @MarkusPiotrowski for code formatting.
@rthapa26 Did this solve your problem? Or do you have other questions?
I strongly suggest investing some time in reading the Biopython tutorial.
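The error arises because `aromaticity()` is a method of `ProteinAnalysis`, not of `str`, so each sequence has to be wrapped first. The sketch below is one possible fix, not the thread's accepted answer: `aromaticity_by_record` and `aromaticity_fraction` are hypothetical names. The plain-Python helper assumes the usual definition of aromaticity as the relative frequency of Phe, Trp and Tyr, and is included only as a cross-check.

```python
AROMATIC_RESIDUES = frozenset("FWY")  # Phe, Trp, Tyr


def aromaticity_fraction(sequence):
    """Fraction of residues that are F, W or Y (plain-Python cross-check)."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    aromatic = sum(1 for residue in sequence if residue in AROMATIC_RESIDUES)
    return aromatic / len(sequence)


def aromaticity_by_record(fasta_path):
    """Map each record ID in a protein FASTA file to its aromaticity."""
    # Imported here so the helper above runs without Biopython installed.
    from Bio import SeqIO
    from Bio.SeqUtils.ProtParam import ProteinAnalysis

    return {
        record.id: ProteinAnalysis(str(record.seq)).aromaticity()
        for record in SeqIO.parse(fasta_path, "fasta")
    }
```

Parsing with `SeqIO.parse` also joins sequences that span several lines, which the line-by-line loop in the question would treat as separate sequences.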
It's a bug in Biopython 1.78. What those three sequences have in common is that they're less than 61 characters long. When you print the record, it implicitly calls __str__ for the record, which calls __repr__ for the Seq. DBSeq gets its method from Seq, which looks like:
def __repr__(self):
    """Return (truncated) representation of the sequence for debugging."""
    if len(self) > 60:
        # Shows the last three letters as it is often useful to see if
        # there is a stop codon at the end of a sequence.
        # Note total length is 54 + 3 + 3 = 60
        return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
    else:
        return f"{self.__class__.__name__}({self._data!r})"
…and then it only tries to access the nonexistent _data for the short sequences.
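The behaviour described here can be reproduced with toy classes. This is an illustrative sketch only: `MiniSeq` and `LazyMiniSeq` are hypothetical stand-ins, not Biopython's Seq or DBSeq, and the subclass simply assumes its text is stored under a different attribute so that `_data` does not exist.

```python
class MiniSeq:
    """Toy sequence class using the same __repr__ logic as quoted above."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __repr__(self):
        if len(self) > 60:
            # Long sequences are sliced; _data is never touched on this path.
            return f"{self.__class__.__name__}('{str(self[:54])}...{str(self[-3:])}')"
        else:
            # Short sequences read _data directly.
            return f"{self.__class__.__name__}({self._data!r})"


class LazyMiniSeq(MiniSeq):
    """Hypothetical subclass that inherits __repr__ but has no _data."""

    def __init__(self, text):
        self._text = text  # deliberately not named _data

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        return self._text[index]
```

With these classes, `repr(LazyMiniSeq("A" * 100))` succeeds because only slicing is used, while `repr(LazyMiniSeq("ACGT"))` raises AttributeError on `_data`, which matches the pattern of only the short sequences failing.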