Hi everyone, I am working with a publicly available RNA-Seq dataset for which only the HTSeq-count data is accessible. I have already done differential gene expression (i.e. between-sample analysis), but I would also like to obtain TPM values for within-sample analyses such as single-sample GSEA, and for that I need gene lengths. The HTSeq-count data contains raw counts for ~64 ENSG entries (they are labeled only with ENSG IDs, not ENST, so I am not sure whether they represent genes or transcripts). I downloaded transcript lengths from Ensembl, which include UTRs and CDS. However, as you can imagine, without ENST IDs in the HTSeq-count data I'm not sure how to match the multiple ENST entries to each ENSG in my data. Do you guys have an approach for determining TPM from raw counts? Thanks in advance for any help!
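To make the calculation concrete, here is a minimal sketch of one common way to get TPM from raw counts once each ENSG has a single length. It assumes the longest transcript per gene is an acceptable stand-in for gene length (other choices, such as the merged exon length, are equally defensible), and the function names and example IDs are hypothetical, not part of HTSeq or Ensembl. It also skips the summary rows that htseq-count writes with a leading `__` (e.g. `__no_feature`).

```python
from collections import defaultdict


def gene_lengths_from_transcripts(transcript_rows):
    """Collapse (gene_id, transcript_id, length_bp) rows to one length per gene.

    Assumption: the longest transcript is used as the gene length.
    """
    lengths = defaultdict(int)
    for gene_id, _transcript_id, length_bp in transcript_rows:
        lengths[gene_id] = max(lengths[gene_id], length_bp)
    return dict(lengths)


def counts_to_tpm(counts, gene_lengths):
    """Convert raw counts to TPM for a single sample.

    TPM_i = (count_i / length_kb_i) / sum_j(count_j / length_kb_j) * 1e6
    """
    rates = {}
    for gene_id, count in counts.items():
        if gene_id.startswith("__"):
            continue  # htseq-count summary rows, not genes
        length_bp = gene_lengths.get(gene_id)
        if not length_bp:
            continue  # no length available, so the gene cannot be normalized
        rates[gene_id] = count / (length_bp / 1000.0)

    total = sum(rates.values())
    if total == 0:
        return {gene_id: 0.0 for gene_id in rates}
    return {gene_id: rate / total * 1e6 for gene_id, rate in rates.items()}


# Hypothetical toy data: two genes, one with two transcripts.
transcripts = [
    ("ENSG_A", "ENST_A1", 1000),
    ("ENSG_A", "ENST_A2", 2000),
    ("ENSG_B", "ENST_B1", 1000),
]
sample_counts = {"ENSG_A": 200, "ENSG_B": 100, "__no_feature": 50}

lengths = gene_lengths_from_transcripts(transcripts)
tpm = counts_to_tpm(sample_counts, lengths)
```

TPM is computed per sample, so the same function would be applied to each column of the count matrix separately. Genes without a length are dropped before normalization, which means the remaining values still sum to one million.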