Gene ID to transcript ID annotations


I am trying to generate a data frame with two columns, transcript_id and gene_id, on Linux
(GTF from Ensembl).

grep -P -o 'ENSECAG\d{11}' Equus_caballus.EquCab3.0.104.gtf > ensecag.txt

grep -P -o 'ENSECAT\d{11}' Equus_caballus.EquCab3.0.104.gtf > ensecat.txt

wc -l enseca* # To see if both files have the same length

They are not the same length: the gene ID file has more lines than the transcript ID file. Is it possible that I have genes without an associated transcript? Could I remove them? Does this make sense, or am I completely wrong? What am I doing wrong?

If I am not mistaken, the next step would be this command:

paste -d ',' ensecat.txt ensecag.txt > gene_map.csv
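
A likely reason for the mismatch is that rows with feature type "gene" carry a gene_id but no transcript_id, and that grep -o prints one line per match, so IDs repeated on exon and CDS rows are repeated in the output too. Pasting two files extracted separately would then pair IDs from different rows. The sketch below is one possible alternative, not a definitive method: it reads both IDs from the same "transcript" row so the pairs stay aligned. It assumes Ensembl-style attributes of the form gene_id "..."; transcript_id "..."; in column 9, and the function name gtf_to_tx2gene is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build a transcript,gene table from a GTF file.
# Assumption: column 9 holds Ensembl-style attributes such as
#   gene_id "ENSECAG..."; transcript_id "ENSECAT...";
# gtf_to_tx2gene is a hypothetical helper name, not part of any tool.
gtf_to_tx2gene() {
    # $1: path to the GTF file
    awk -F'\t' '
        # Keep only transcript rows: one row per transcript, so no duplicates
        $3 == "transcript" {
            gene = ""; tx = ""
            # Strip the leading key plus opening quote and the closing quote
            if (match($9, /gene_id "[^"]+"/))
                gene = substr($9, RSTART + 9, RLENGTH - 10)
            if (match($9, /transcript_id "[^"]+"/))
                tx = substr($9, RSTART + 15, RLENGTH - 16)
            # Print transcript first, gene second, as tx2gene expects
            if (gene != "" && tx != "") print tx "," gene
        }
    ' "$1"
}
```

With the function defined, running gtf_to_tx2gene Equus_caballus.EquCab3.0.104.gtf > gene_map.csv would write one line per transcript, which should make the paste step unnecessary.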

Then I would go back to RStudio to generate my tx2gene file and run tximport.

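library(readr)
library(tximport)

# First column: transcript IDs; second column: gene IDs (the file has no header row)
gene_map <- read_csv("gene_map.csv",
                     col_names = c("esentid", "esengid"))

# sample_files: paths to the salmon quantification files, defined elsewhere
count_data <- tximport(files = sample_files,
                       type = "salmon",
                       tx2gene = gene_map,
                       ignoreTxVersion = FALSE)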

Thank you so much.
