Hello,
I am trying to generate a data frame with two columns, transcript_id and gene_id, on Linux
(GTF from Ensembl):
grep -P -o 'ENSECAG\d{11}' Equus_caballus.EquCab3.0.104.gtf > ensecag.txt
grep -P -o 'ENSECAT\d{11}' Equus_caballus.EquCab3.0.104.gtf > ensecat.txt
wc -l enseca* # To see if both files have the same length
They are not the same length: the gene ID file has more lines than the transcript ID file. Is it possible that I have genes without an associated transcript? Could I remove them? Does this make any sense, or am I completely wrong? What am I doing wrong?
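One quick check that might explain the mismatch: in Ensembl GTF files, rows with feature type `gene` carry a gene_id but no transcript_id, so each of those rows adds a line to ensecag.txt only. The sketch below counts, per feature type, the rows that contain a gene ID but no transcript ID. The file `toy_check.gtf` and the IDs in it are made up for illustration; on the real data the awk command would be pointed at the actual GTF.

```shell
# Toy stand-in for the real GTF (hypothetical IDs), tab-separated like Ensembl files.
printf '1\tensembl\tgene\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; gene_version "1";\n' > toy_check.gtf
printf '1\tensembl\ttranscript\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000001";\n' >> toy_check.gtf
printf '1\tensembl\texon\t1\t50\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000001";\n' >> toy_check.gtf

# Count rows that have a gene ID but no transcript ID, grouped by feature type (column 3).
awk -F'\t' '$9 ~ /ENSECAG[0-9]+/ && $9 !~ /ENSECAT[0-9]+/ { n[$3]++ }
            END { for (f in n) print f, n[f] }' toy_check.gtf > gene_only_rows.txt

cat gene_only_rows.txt
```

If the only feature type reported on the real file is `gene`, the extra lines come from the gene rows themselves, not from genes that lack transcripts.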
Next, I would run the following command (if I am not mistaken):
paste -d ',' ensecat.txt ensecag.txt > gene_map.csv
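Because paste joins the two files line by line, the pairs are only correct if line N of both files comes from the same GTF row. A possible alternative, sketched here on a toy file, is to keep only the rows whose third column is `transcript` and take both IDs from that same row, which also gives one line per transcript. The file names `toy.gtf` and `gene_map_toy.csv` and the IDs are hypothetical.

```shell
# Toy stand-in for the real GTF (hypothetical IDs), tab-separated like Ensembl files.
printf '1\tensembl\tgene\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; gene_version "1";\n' > toy.gtf
printf '1\tensembl\ttranscript\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000001";\n' >> toy.gtf
printf '1\tensembl\texon\t1\t50\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000001";\n' >> toy.gtf
printf '1\tensembl\ttranscript\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000002";\n' >> toy.gtf

# Keep only transcript rows and pull both IDs from the same line so they stay paired.
awk -F'\t' '$3 == "transcript" {
    match($9, /ENSECAT[0-9]+/); tx = substr($9, RSTART, RLENGTH)
    match($9, /ENSECAG[0-9]+/); gene = substr($9, RSTART, RLENGTH)
    print tx "," gene
}' toy.gtf > gene_map_toy.csv

cat gene_map_toy.csv
```

The output has the transcript ID first and the gene ID second, which is the column order tximport expects for tx2gene.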
Then I go back to RStudio to generate my tx2gene file and run tximport:
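library(readr)
library(tximport)
# Column order matters for tx2gene: transcript ID first, gene ID second
gene_map <- read_csv("gene_map.csv",
                     col_names = c("esentid", "esengid"))
count_data <- tximport(files = sample_files,
                       type = "salmon",
                       tx2gene = gene_map,
                       ignoreTxVersion = FALSE)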
Thank you so much.