Parsing transcript version in Ensembl mouse annotation
I aligned some data to an Ensembl transcriptome with novel transcripts. I am trying to lift over the sites from transcriptome to genome coordinates, which I have previously done using the R package GenomicRanges.
The Ensembl FASTA headers look like this and contain a transcript name (e.g. ENSMUST00000178537.2):
>ENSMUST00000178537.2 cdna chromosome:GRCm39:6:41510135:41510146:1 gene:ENSMUSG00000095668.2 ...
However, in the actual transcriptome GTF from Ensembl, the transcript IDs look like this:
> ... transcript_id "ENSMUST00000178537"; transcript_version "2"; ...
So the transcript name is divided between two fields: the unversioned ID is in "transcript_id", and the version number (the suffix after the dot in the FASTA header) is stored separately in the "transcript_version" attribute.
Is there any tool or command which can append the transcript version to the transcript ID? I guess I could do it in Excel but it would be less reproducible.
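For reference, one scriptable way to do this is a small Python function that rewrites the transcript_id attribute as ID.version on every GTF line that carries both attributes. This is only a sketch, assuming the attributes are formatted exactly as in the excerpt above (key, space, double-quoted value, semicolon). The function names and the file paths in the usage note are made up for illustration and do not come from any existing tool.

```python
import re

# Patterns for the two GTF attributes, assuming the key "value"; layout
# shown in the excerpt above.
_ID_RE = re.compile(r'transcript_id "([^"]+)";')
_VERSION_RE = re.compile(r'transcript_version "([^"]+)";')


def add_transcript_version(line: str) -> str:
    """Return the GTF line with transcript_id rewritten as "ID.version"."""
    if line.startswith("#"):
        return line  # header/comment lines are passed through unchanged
    id_match = _ID_RE.search(line)
    version_match = _VERSION_RE.search(line)
    if id_match is None or version_match is None:
        return line  # e.g. gene rows, which have no transcript attributes
    transcript_id = id_match.group(1)
    version = version_match.group(1)
    if transcript_id.endswith("." + version):
        return line  # already versioned, so do not append a second suffix
    versioned = f"{transcript_id}.{version}"
    # Replace only the quoted value, leaving the rest of the line intact.
    return line[:id_match.start(1)] + versioned + line[id_match.end(1):]


def convert_gtf(in_path: str, out_path: str) -> None:
    """Stream a GTF file line by line and write the versioned copy."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(add_transcript_version(line))
```

It could be called as convert_gtf("annotation.gtf", "annotation.versioned.gtf"), where both file names are placeholders. The transcript_version attribute is left in place, so the original information is still recoverable from the output.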