I am building a program to predict peptide sequences from DNA data. I am still at an early stage, and while running a test I ran into the following problem:
1.- I downloaded from NCBI a gene sequence along with its corresponding ‘computationally predicted’ protein sequence.
2.- In the very first phase of my program, I translated the DNA FASTA file into its six possible reading frames to explore which one is open. I identified the longest ORF in forward frame 1 and found this longest peptide in the protein sequence I originally downloaded. Good.
3.- The first few amino acids of the downloaded NCBI protein sequence are not present in the translation of forward frame 1 (the frame containing the ORF); instead, they are found in forward frame 3. Therefore, I am facing an NCBI protein sequence that results from merging exons from two different reading frames.
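Steps 2 and 3 can be sketched roughly as follows. This is a minimal, stdlib-only sketch, not the program described above. It assumes the standard genetic code (NCBI translation table 1) and the genomic sequence as a plain string. It defines an ORF as a stretch from an `M` up to the next stop codon, which may differ from the definition used in the actual program. All function names (`six_frames`, `longest_orf`, `frames_containing`) are hypothetical.

```python
"""Six-frame translation, longest ORF, and peptide-to-frame lookup (stdlib only)."""

# Assumption: standard genetic code (NCBI table 1), codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict[str, str]:
    """Translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse strand)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(protein: str) -> str:
    """Longest stretch starting at 'M' and ending just before a stop '*'.

    A stretch still open at the end of the sequence is also considered.
    """
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")  # first M gives the longest ORF in this segment
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


def frames_containing(peptide: str, frames: dict[str, str]) -> list[str]:
    """Names of the frames whose translation contains the peptide."""
    return [name for name, prot in frames.items() if peptide in prot]


if __name__ == "__main__":
    frames = six_frames("ATGGCCTAAGG")  # toy sequence, not real data
    print(frames)
    print(longest_orf(frames["+1"]))
    print(frames_containing("GLR", frames))
```

Calling `frames_containing` with, for example, the first few residues of the NCBI protein and then with the longest ORF peptide shows which frame each fragment comes from. This mirrors the check in step 3. Biopython's `Seq.translate` could replace the hand-written codon table; the table is written out here only to keep the sketch dependency-free.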
As far as I know, this makes no biological sense. So my question (or discussion point) is whether a cell can biologically produce such a peptide sequence, or whether I should simply regard this part of the predicted protein as wrong.
Thank you for your attention, and best regards.