I am working on a project doing RNAseq on highly degraded formalin-fixed brain tissue. We extracted RNA using the Qiagen RNeasy FFPE kit, and the sequencing company suggested rRNA depletion for library preparation. We sequenced PE150 and obtained 30-50M uniquely mapped reads out of 120M total reads. The total mapping rate is 30-50% and the unique mapping rate is 25-45%. Most mapped reads fall in intergenic regions (~80%), while exonic reads are below 2% and intronic reads are about 20%. Does anyone know whether this proportion of intergenic reads is common in formalin-fixed tissue? Is it still possible to do differential gene expression analysis with data of this quality?
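For a sense of scale (a rough estimate, assuming the region percentages are fractions of the uniquely mapped reads, which isn't stated above), the reads usable for exon-level gene counting would be at most:

$$
N_{\text{exonic}} < 0.02 \times N_{\text{unique}} = 0.02 \times (30\ \text{to}\ 50)\,\text{M} = 0.6\ \text{to}\ 1.0\,\text{M reads}
$$

Under the same assumption, intronic reads would be about 0.20 × (30 to 50)M = 6 to 10M.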
Regarding ways to improve data quality, I read in Schuierer et al. (2017) that they obtained 90% exonic reads with the Illumina RNA Access kit, which is designed to maximize the proportion of exonic sequence that gets sequenced. However, this kit is mostly used for human clinical samples, and ours are non-clinical animal tissues. Would it be worth trying? Thanks a lot in advance!
Thing 1) Check for ‘contamination’ with something like FastQ Screen. For example, mycoplasma contamination can result in poor-quality data, or perhaps you have a lot of adapter reads.
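A quick complement to FastQ Screen for the adapter question is to count how many reads contain an adapter sequence. This is a minimal sketch, assuming the common Illumina TruSeq adapter prefix `AGATCGGAAGAGC` (check your library kit's documentation for the actual adapter) and well-formed 4-line FASTQ records; the function names are hypothetical, not from any tool.

```python
import gzip
from typing import Iterable, Iterator, TextIO

# Assumed adapter: the shared prefix of Illumina TruSeq adapters.
# Replace it with the adapter from your library prep kit if it differs.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def iter_fastq_seqs(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record, assuming 4-line FASTQ records."""
    it = iter(lines)
    for _header in it:
        seq = next(it, "").strip()
        next(it, None)  # '+' separator line
        next(it, None)  # quality line
        yield seq


def adapter_fraction(lines: Iterable[str], adapter: str = ADAPTER_PREFIX) -> float:
    """Fraction of reads whose sequence contains the adapter anywhere (exact match only)."""
    total = hits = 0
    for seq in iter_fastq_seqs(lines):
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For example, `with open_fastq("sample_R1.fastq.gz") as fh: print(adapter_fraction(fh))`, where the file name is a placeholder. Exact matching misses adapters containing sequencing errors, so treat the result as a lower bound; a dedicated trimmer's report will be more thorough.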
Thing 2) What is your reference genome? Are you sure the annotation file you’re using is OK and matches that genome? Do the usual ‘housekeeping’ genes look OK?
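One way to check the housekeeping genes is to look at their counts per million (CPM), and to flag genes whose IDs are missing from the count table entirely, which can point to a genome/annotation mismatch. This is a minimal sketch, assuming you already have a gene-to-read-count table (e.g. from featureCounts or STAR `--quantMode GeneCounts`); the helper names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: each gene's count divided by the library total, times 1e6."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def housekeeping_report(
    counts: Dict[str, int], genes: Iterable[str]
) -> Dict[str, Optional[float]]:
    """CPM for each requested gene; None means the ID is absent from the count table."""
    normalized = cpm(counts)
    return {gene: normalized.get(gene) for gene in genes}
```

For example, `housekeeping_report(counts, ["Actb", "Gapdh"])`, where the gene symbols are only examples; if your table uses Ensembl IDs, pass those instead. A `None` for a gene you know should be there is worth chasing before anything else.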
Thing 3) How are you mapping reads? You might consider mapping a few different ways, for example to the genome vs. the transcriptome, or with STAR using different parameters for how splice junctions are mapped. Also, if you haven’t looked at the data in an IGV-like genome browser, it’s worth ‘seeing’ exactly how the reads align (and by the way, this will also show splice junctions!).
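If you try STAR with different settings, it helps to generate the runs systematically. The sketch below only builds STAR command lines for a small parameter grid and does not run anything. The flags used (`--alignSJoverhangMin` for splice-junction overhangs, and `--outFilterScoreMinOverLread` / `--outFilterMatchNminOverLread`, which are often relaxed for short or degraded inserts) are STAR options, but the values in the grid are illustrative guesses, and the paths and gzipped input are assumptions about your setup.

```python
import itertools
from typing import Dict, List, Sequence


def parameter_grid(grid: Dict[str, Sequence[str]]) -> List[Dict[str, str]]:
    """Expand {flag: [values]} into every combination of flag settings."""
    flags = list(grid)
    return [dict(zip(flags, combo)) for combo in itertools.product(*(grid[f] for f in flags))]


def star_command(
    genome_dir: str,
    read1: str,
    read2: str,
    out_prefix: str,
    params: Dict[str, str],
    threads: int = 8,
) -> List[str]:
    """Build one paired-end STAR command; extra flags from params are appended at the end."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",  # assumes gzip-compressed FASTQ input
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    for flag, value in params.items():
        cmd += [flag, value]
    return cmd


# Illustrative grid (values are guesses): junction overhang plus the read-fraction filters.
GRID = {
    "--alignSJoverhangMin": ["5", "8"],
    "--outFilterScoreMinOverLread": ["0.66", "0.3"],
    "--outFilterMatchNminOverLread": ["0.66", "0.3"],
}
```

For example, `for i, p in enumerate(parameter_grid(GRID)): cmd = star_command("star_index/", "s_R1.fq.gz", "s_R2.fq.gz", f"run{i}_", p)` gives one command per combination (all paths are placeholders), and each can be run with `subprocess.run(cmd, check=True)`. Comparing the `run{i}_Log.final.out` summaries then shows which settings actually change the mapping rates.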
Thing 4) Make sure the rRNA depletion worked: what % of reads mapped to rRNA?
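To put a number on the rRNA question from a gene-level count table, one option is to sum the counts of genes whose annotated biotype is rRNA. This sketch assumes an Ensembl-style GTF with `gene_biotype` attributes (GENCODE files use `gene_type` instead) and a gene_id-to-count table; the helper names are hypothetical. rRNA repeats may be incompletely represented in reference annotations, so this can underestimate the true rRNA fraction; aligning a subset of reads against rRNA sequences is a useful cross-check.

```python
import re
from typing import Dict, FrozenSet, Iterable

# Ensembl biotypes treated as rRNA here; adjust for your annotation.
RRNA_BIOTYPES: FrozenSet[str] = frozenset({"rRNA", "Mt_rRNA", "rRNA_pseudogene"})

# Matches key "value" pairs in the GTF attribute column (column 9).
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def biotypes_from_gtf(lines: Iterable[str], key: str = "gene_biotype") -> Dict[str, str]:
    """Map gene_id -> biotype using only the 'gene' records of a GTF file."""
    biotypes: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        if "gene_id" in attrs and key in attrs:
            biotypes[attrs["gene_id"]] = attrs[key]
    return biotypes


def rrna_percent(
    counts: Dict[str, int],
    biotypes: Dict[str, str],
    rrna_types: FrozenSet[str] = RRNA_BIOTYPES,
) -> float:
    """Percentage of gene-assigned reads that fall on genes with an rRNA biotype."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    rrna = sum(c for gene, c in counts.items() if biotypes.get(gene) in rrna_types)
    return 100.0 * rrna / total
```

Note the denominator is reads assigned to genes, not all reads; with ~80% of reads intergenic, the percentage of total reads would be much lower than this number.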
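Thing 5) Actually, this is kind of the most obvious thing: you have ‘contamination’ from DNA! Mostly intergenic and intronic reads with very few exonic reads is what you’d expect if much of the library came from genomic DNA rather than RNA. Are you sure your RNAseq protocol removed all the DNA?!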
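One way to make the DNA argument concrete is to compare the observed read distribution with how much of the genome each region type occupies. If reads came from randomly fragmented genomic DNA, each region's share of reads would roughly track its share of the genome (enrichment near 1), whereas RNA-derived reads should be strongly enriched in exons. This is a sketch under that assumption; the genome fractions must come from your own annotation, and the function names are hypothetical.

```python
from typing import Dict


def normalize(fractions: Dict[str, float]) -> Dict[str, float]:
    """Rescale so the values sum to 1 (reported percentages often don't add up exactly)."""
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("fractions must have a positive sum")
    return {region: value / total for region, value in fractions.items()}


def region_enrichment(observed: Dict[str, float], genome: Dict[str, float]) -> Dict[str, float]:
    """Observed/expected ratio per region, where 'expected' is the region's share of the genome.

    A ratio near 1 for exons means exons are no more represented than they would be
    in randomly fragmented genomic DNA.
    """
    obs = normalize(observed)
    exp = normalize(genome)
    return {region: obs[region] / exp[region] for region in obs if exp.get(region, 0) > 0}
```

With the numbers from the question, you would call `region_enrichment({"exonic": 2, "intronic": 20, "intergenic": 80}, genome_fractions)`, where `genome_fractions` (hypothetical) holds the exonic, intronic and intergenic base-pair totals from your annotation. Tools differ in how they assign reads overlapping several region types, so treat the ratios as rough.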