Getting premature stop codons from exonerate output

Hello,

I made alignments with exonerate’s protein2genome model, and I would like to count the number of stop codons in each alignment. Unfortunately, the Vulgar output doesn’t record stop codons. So I thought I could extract the aligned protein sequence from BOTH query and target and then count the ‘*’s and ‘#’s. However, the --ryo option always seems to deliver the DNA sequence for the target (the genomic input). I can’t simply translate this; it’s a mess when I try. (I think it contains frameshifts and split codons.)
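
A possible workaround, sketched under assumptions: although the vulgar line doesn’t label stop codons, its triplets (label, query length, target length) say which target bases exonerate placed in codons (M, plus S for each half of a split codon) and which it set aside (5/3 splice sites, I introns, F frameshifts). Walking those triplets over the genomic sequence should give an in-frame coding sequence that can be cut into codons without the frameshift mess. The function `count_stops_from_vulgar` is hypothetical; it assumes the vulgar layout and operation codes described in the exonerate manual (output requested with `--showvulgar yes`), 0-based "in-between" target coordinates, and the standard genetic code. `genome` must be the forward-strand sequence of the target named in the line, e.g. read from the FASTA you gave exonerate.

```python
"""Hypothetical helper: count in-frame stop codons and frameshifts from one
exonerate vulgar line plus the genomic target sequence.

Assumes protein2genome vulgar operations (M, C, G, N, 5, 3, I, S, F) as
described in the exonerate manual, and the standard genetic code.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Operations whose target bases belong to the translated reading frame.
# The two S (split codon) pieces around an intron are rejoined because
# their bases are appended to the same buffer.
CODING_OPS = {"M", "C", "G", "S"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_stops_from_vulgar(vulgar: str, genome: str) -> tuple[int, int]:
    """Return (stop_codons, frameshifts) for one vulgar line."""
    fields = vulgar.split()
    if fields[0] == "vulgar:":
        fields = fields[1:]
    # query_id q_start q_end q_strand target_id t_start t_end t_strand score ...
    t_start, t_end, t_strand = int(fields[5]), int(fields[6]), fields[7]
    ops = fields[9:]  # (label, query_len, target_len) triplets

    # On the minus strand t_start > t_end: slice the forward strand and
    # reverse-complement it so the walk below always moves 5' -> 3'.
    if t_strand == "-":
        region = reverse_complement(genome[t_end:t_start])
    else:
        region = genome[t_start:t_end]

    pos = 0
    coding = []  # target bases that should form whole codons
    frameshifts = 0
    for i in range(0, len(ops), 3):
        label, t_len = ops[i], int(ops[i + 2])
        if label in CODING_OPS:
            coding.append(region[pos:pos + t_len])
        elif label == "F":
            frameshifts += 1
        # Splice sites, introns, N regions and frameshifted bases are skipped.
        pos += t_len

    cds = "".join(coding).upper()
    codons = (cds[j:j + 3] for j in range(0, len(cds) - 2, 3))
    stops = sum(codon in STOP_CODONS for codon in codons)
    return stops, frameshifts
```

If the alignment reaches the query’s C-terminus, the last codon may be the genuine terminal stop, so it would need to be excluded before calling the remaining stops premature. Any G segment whose target length is not a multiple of three would shift the frame of everything after it; exonerate is expected to report such bases as F, but that is worth checking on real output.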

Next I tried Biopython’s SearchIO package. I see that it inserts ‘X’s where the split codons are, which is fine, but when there are too many frameshifts or split codons it gives up on the sequence and returns all ‘X’s, even though those regions were still alignable with exonerate. I’m trying to pick up pseudogenes, so the alignment is far from perfect, but there’s definitely a lot of information here that SearchIO is discarding.

Does anyone know a good way to get premature stop codons from exonerate’s protein2genome model??

Here’s the bottom of my exonerate alignment, just to show that there is still alignable stuff there:

And here’s what SearchIO extracts. There are 11 exons in the exonerate alignment, but the parser gives up on the last 5:
