Need help understanding the reference transcriptome and where to download it


Apologies for a pretty elementary question. I tried my best to answer it using resources online but I find many tutorials/explanations out there difficult to understand.

I am trying to quantify human RNA-seq data using salmon, because I would like to perform RNA isoform quantification. I am using salmon’s mapping-based mode (building an index and then quantifying).

I have already built an index and quantified samples, only to find that my quant.sf files are 639 rows long. I checked everything and noticed that the log file said:

[2023-01-23 11:44:52.070] [jointLog] [info] Index contained 639 targets.

Q1: Does this mean that my index contains only 639 transcripts in total?
Q2: Did this occur because I accidentally used GRCh38.p13.genome.fa (from GENCODE) instead of gencode.v42.transcripts.fa, or would this not be the reason?
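
To sanity-check this myself, I put together a minimal sketch that counts the records in a FASTA file and lists any entry longer than 400000 nucleotides (the threshold mentioned in the index warning further down). The function names `fasta_lengths` and `summarize` are my own, not part of salmon, and the sketch assumes an uncompressed, plain-text FASTA.

```python
from typing import Iterable, Iterator, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (name, sequence_length) for each record in FASTA-formatted lines."""
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            # Keep only the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def summarize(lines: Iterable[str], max_len: int = 400_000) -> Tuple[int, List[str]]:
    """Return (number_of_records, names_of_records_longer_than_max_len)."""
    records = list(fasta_lengths(lines))
    too_long = [name for name, n in records if n > max_len]
    return len(records), too_long


if __name__ == "__main__":
    # Tiny in-memory demo; for a real file, pass an open file handle instead,
    # e.g. `with open("gencode.v42.transcripts.fa") as fh: summarize(fh)`.
    demo = [">chr_like some description", "ACGT", "AC", ">tx_like", "A" * 10]
    print(summarize(demo, max_len=8))
```

If the genome file reports a few hundred records, many of them very long, while the transcripts file reports far more, shorter ones, that would at least be consistent with the 639 targets coming from the wrong input file.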

I am currently re-running it with the latter file, but am not confident it will work. I am not sure whether the file I found is actually “the reference transcriptome”.

Q3: If not, could anyone advise where to download this file, and what exactly the file contains?

Q4: Would this file (or the one suggested) be usable for finding transcripts from non-protein-coding genes?

Thank you so much, and I’m sorry for all the questions. I have two more small questions, but please feel free to ignore these if you are busy:

My log file for index building (when using the former .genome.fa file) had many of the following statements:

[2023-01-20 16:32:24.228] [puff::index::jointLog] [warning] Entry with header [GL000256.2] was longer than 400000 nucleotides. This is probably a chromosome instead of a transcript.

The log from the new run (using the transcripts file) currently has many lines that say:
[2023-01-24 14:21:02.920] [puff::index::jointLog] [warning] Entry with header [ENST00000604102.1|ENSG00000282268.1|OTTHUMG00000184595.3|OTTHUMT00000468925.3|IGHD2OR15-2B-201|IGHD2OR15-2B|31|IG_D_gene|], had length less than equal to the k-mer length of 31 (perhaps after poly-A clipping)
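
To make sense of that header, I tried splitting it on the `|` character. This is only a sketch: the field names below are my own guesses based on the single header shown in the warning, not an official specification, and `parse_gencode_header` and `too_short_for_k` are hypothetical helpers rather than anything from salmon.

```python
from typing import Dict, Union

# Assumed field order, inferred from the one header in the warning above.
GENCODE_FIELDS = (
    "transcript_id",
    "gene_id",
    "havana_gene",
    "havana_transcript",
    "transcript_name",
    "gene_name",
    "length",
    "transcript_type",
)


def parse_gencode_header(header: str) -> Dict[str, Union[str, int]]:
    """Split a pipe-delimited header into named fields (assumed layout)."""
    # Drop a leading '>' and the trailing '|' before splitting.
    parts = header.lstrip(">").rstrip("|").split("|")
    record: Dict[str, Union[str, int]] = dict(zip(GENCODE_FIELDS, parts))
    record["length"] = int(record["length"])
    return record


def too_short_for_k(record: Dict[str, Union[str, int]], k: int = 31) -> bool:
    """True if the annotated length is less than or equal to the k-mer length."""
    return int(record["length"]) <= k


if __name__ == "__main__":
    header = (
        "ENST00000604102.1|ENSG00000282268.1|OTTHUMG00000184595.3|"
        "OTTHUMT00000468925.3|IGHD2OR15-2B-201|IGHD2OR15-2B|31|IG_D_gene|"
    )
    rec = parse_gencode_header(header)
    print(rec["transcript_type"], rec["length"], too_short_for_k(rec))
```

Read this way, the seventh field (31) looks like the sequence length, which equals the k-mer length of 31 quoted in the warning, and the last field (IG_D_gene) looks like a transcript type that is not a typical protein-coding mRNA.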

Could anyone advise on what these warnings mean? I couldn’t find information online or in salmon’s documentation.

Thanks so so much.
