Parsing transcript version in Ensembl mouse annotation

Hi all,

I aligned some data to an Ensembl transcriptome with novel transcripts. I am trying to lift over the sites from transcriptome to genome coordinates, which I have previously done using the R package GenomicRanges.

The Ensembl FASTA headers look like this and contain a transcript name (e.g. ENSMUST00000178537.2):

>ENSMUST00000178537.2 cdna chromosome:GRCm39:6:41510135:41510146:1 gene:ENSMUSG00000095668.2 ...

However, in the actual GTF from Ensembl, the transcript IDs look like this:

> ... transcript_id "ENSMUST00000178537"; transcript_version "2"; ...

So the transcript name is divided between two attributes: the version number (the suffix after the dot) is stored separately in the “transcript_version” attribute.

Is there any tool or command which can append the transcript version to the transcript ID? I guess I could do it in Excel but it would be less reproducible.
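For what it's worth, one scriptable way to do this might look like the minimal Python sketch below. It assumes a standard 9-column, tab-separated GTF whose attribute field contains `transcript_id "..."` and `transcript_version "..."` pairs as in the example above. The function names (`add_version`, `rewrite_gtf_line`, `rewrite_gtf`) are made up for illustration and are not part of any existing tool.

```python
import re


def add_version(attributes, id_key="transcript_id", version_key="transcript_version"):
    """Return the attribute string with '<id>.<version>' in place of '<id>'.

    Assumes Ensembl-style attributes such as:
        transcript_id "ENSMUST00000178537"; transcript_version "2";
    If the version attribute is absent (e.g. a novel transcript),
    the string is returned unchanged.
    """
    version = re.search(rf'\b{re.escape(version_key)} "([^"]+)"', attributes)
    if version is None:
        return attributes

    def join_id_and_version(match):
        return f'{match.group(1)}{match.group(2)}.{version.group(1)}"'

    # Only the first occurrence of the ID attribute is rewritten.
    id_pattern = rf'(\b{re.escape(id_key)} ")([^"]+)"'
    return re.sub(id_pattern, join_id_and_version, attributes, count=1)


def rewrite_gtf_line(line):
    """Rewrite one GTF line; comment and malformed lines pass through untouched."""
    if line.startswith("#"):
        return line
    body = line.rstrip("\n")
    newline = line[len(body):]
    fields = body.split("\t")
    if len(fields) != 9:
        return line
    fields[8] = add_version(fields[8])
    return "\t".join(fields) + newline


def rewrite_gtf(in_path, out_path):
    """Copy a GTF file, appending the version to each transcript_id."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rewrite_gtf_line(line))


# Demo using only the attributes quoted in the question.
print(add_version('transcript_id "ENSMUST00000178537"; transcript_version "2";'))
# prints: transcript_id "ENSMUST00000178537.2"; transcript_version "2";
```

Calling `rewrite_gtf("in.gtf", "out.gtf")` (placeholder file names) would write a modified copy and leave the original untouched. If the gene IDs also need their suffix, the same helper could presumably be called with `id_key="gene_id"` and `version_key="gene_version"`. Records without a `transcript_version` attribute, which may be the case for the novel transcripts, are left as they are.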


Tags: Transcriptome, GTF, annotation, Genome, Ensembl
