I’m using UCSC gene tables and am having trouble interpreting the exon frames. In some cases, applying the exon frame from the table introduces in-frame stop codons, which should not occur inside a coding region.
As an example, for the hg19 transcript NM_001369291 (CDC45) on chromosome 22, the gene table contains this row:
733 NM_001369291 chr22 + 19466988 19508131 19467079 19506431 19 19466988,19467680,19468475,19470212,19471384,19481849,19483503,19484908,19486623,19492884,19494908,19495288,19496052,19502271,19502487,19504049,19504339,19506366,19508003, 19467094,19467740,19468568,19470350,19471528,19481905,19483552,19484970,19486674,19493004,19495040,19495387,19496214,19502410,19502571,19504168,19504416,19506432,19508131, 0 CDC45 cmpl cmpl 0,0,0,0,0,0,2,0,2,2,2,2,2,2,0,0,2,1,-1,
Here the first comma-separated list of positions holds the exon starts, and the last list of numbers holds the exon frames. The exon starting at 19495288 is listed with a frame of 2, but when I take that exon's sequence from UCSC, only a frame of 1 gives a reading frame with no stop codons:
>hg19_ncbiRefSeqCurated_NM_001369291.1_22 range=chr22:19495289-19495387 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TCTTCCCCTGAAGCAGGTGAAGCAGAAGTTCCAGGCCATGGACATCTCCT
TGAAGGAGAATTTGCGGGAAATGATTGAAGAGTCTGCAAATAAATTTGG
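To make the stop-codon check reproducible, here is a minimal sketch that reads the exon sequence above in all three registers and reports where in-frame stop codons appear. The name `skip` (the number of leading bases dropped before reading codons) and the helper `stop_positions` are my own labels for illustration, not UCSC terms.

```python
# Exon sequence copied from the FASTA record above (chr22:19495289-19495387, + strand).
EXON_SEQ = (
    "TCTTCCCCTGAAGCAGGTGAAGCAGAAGTTCCAGGCCATGGACATCTCCT"
    "TGAAGGAGAATTTGCGGGAAATGATTGAAGAGTCTGCAAATAAATTTGG"
)

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, skip):
    """Return 0-based positions (within seq) of stop codons found when the
    first `skip` bases are dropped and the rest is read in triplets."""
    return [
        i
        for i in range(skip, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# Try every possible register: drop 0, 1, or 2 leading bases.
stops_by_skip = {skip: stop_positions(EXON_SEQ, skip) for skip in range(3)}

for skip, stops in stops_by_skip.items():
    print(f"skip {skip} base(s): stop codons at {stops}")
```

For this exon, only dropping one leading base gives a stop-free read, which is the "frame of 1" described above.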
Am I missing something in how the exon frames in the gene table should be interpreted? Unless I am mistaken, the exon starts in the gene table are 0-based, while the range in the FASTA header for the exon is 1-based.
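As a cross-check on both the coordinates and the frames, the sketch below recomputes a frame for every exon as (number of coding bases before the exon) mod 3 and compares it with the listed frames. It assumes 0-based, half-open coordinates and a + strand transcript. The function name `frames_from_cds_offset` is hypothetical, and the rule it encodes is an assumption being tested against this one row, not a statement of how UCSC computes the column.

```python
# Values copied from the gene table row above.
CDS_START, CDS_END = 19467079, 19506431
EXON_STARTS = [
    19466988, 19467680, 19468475, 19470212, 19471384, 19481849, 19483503,
    19484908, 19486623, 19492884, 19494908, 19495288, 19496052, 19502271,
    19502487, 19504049, 19504339, 19506366, 19508003,
]
EXON_ENDS = [
    19467094, 19467740, 19468568, 19470350, 19471528, 19481905, 19483552,
    19484970, 19486674, 19493004, 19495040, 19495387, 19496214, 19502410,
    19502571, 19504168, 19504416, 19506432, 19508131,
]
LISTED_FRAMES = [0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 1, -1]


def frames_from_cds_offset(starts, ends, cds_start, cds_end):
    """For a + strand transcript, return (coding bases before the exon) % 3
    for each exon, or -1 when the exon contains no coding bases."""
    frames = []
    coding_so_far = 0
    for start, end in zip(starts, ends):
        # Clip the exon to the coding region (half-open intervals).
        c_start = max(start, cds_start)
        c_end = min(end, cds_end)
        if c_start >= c_end:
            frames.append(-1)
            continue
        frames.append(coding_so_far % 3)
        coding_so_far += c_end - c_start
    return frames


computed = frames_from_cds_offset(EXON_STARTS, EXON_ENDS, CDS_START, CDS_END)
print("matches listed frames:", computed == LISTED_FRAMES)

# Exon index 11 starts at 19495288 (0-based), i.e. 19495289 in 1-based terms.
exon_index = 11
bases_to_next_codon = (3 - computed[exon_index]) % 3
print("1-based start:", EXON_STARTS[exon_index] + 1)
print("bases before the next full codon:", bases_to_next_codon)
```

Under this reading, the listed frame gives the position within a codon of the exon's first coding base, so a listed 2 means one base remains before the next full codon. That is consistent with the stop-free read found for this exon, though it is only checked here against this single row.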
Thanks in advance!