I have recently performed RNA-Seq on the total RNA of a mosquito tissue, where I have three biological replicates of the tissue at three different time points. The pipeline I used was HISAT2 –> featureCounts –> DESeq2.
Looking at the normalized counts (output of DESeq2), the counts of some rRNA genes are extremely high in the third replicate compared with the first and second. All the samples underwent polyA selection. For example, the normalized counts for one rRNA gene are 122 and 303 in replicates 1 & 2 respectively, but 200,000 in the third replicate. The same pattern holds for almost all of the rRNA genes in the genome.
Replicates 1 & 2 were sequenced together, and replicate 3 was sequenced a few months later. All were sequenced by an external company. My initial thought was that the polyA selection failed in the replicate 3 samples, but the external company disagrees: according to them, the % rRNA in the samples ranged from 2% to 13%.
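To check the company's figure myself, I could compute the share of assigned reads that land in rRNA genes for each sample from the raw featureCounts table. The following is only a rough sketch: the function name, the gene IDs and all the numbers are made up for illustration and are not from my data.

```python
def rrna_fraction(counts, rrna_ids):
    """Return, per sample, the fraction of assigned reads that fall in rRNA genes.

    counts   : dict mapping gene ID -> list of raw counts, one entry per sample
    rrna_ids : set of gene IDs annotated as rRNA
    """
    n_samples = len(next(iter(counts.values())))
    total = [0] * n_samples
    rrna = [0] * n_samples
    for gene, row in counts.items():
        for i, c in enumerate(row):
            total[i] += c
            if gene in rrna_ids:
                rrna[i] += c
    # Guard against samples with no assigned reads at all
    return [r / t if t else 0.0 for r, t in zip(rrna, total)]


# Toy counts (hypothetical): columns are replicate 1, 2 and 3 of one time point
toy_counts = {
    "rRNA_1": [100, 300, 20000],
    "geneA": [5000, 4700, 40000],
    "geneB": [4900, 5000, 40000],
}
fractions = rrna_fraction(toy_counts, {"rRNA_1"})
print(fractions)
```

In this toy table the third sample has a far larger rRNA share than the first two, which is the kind of difference I would expect to see if the rRNA content really differs between sequencing runs.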
I’m not sure why this is the case, but what are my options? Specifically:
1) Should I remove the rRNA genes from the annotation file so that featureCounts won’t assign reads to those genes?
2) Should I just ignore them? I’m not interested in analyzing rRNA genes; my only concern is that the PCA plot looks bad. Replicates 1 & 2 cluster nicely, but the replicate 3 samples cluster with one another rather than with the previous replicates. I think the huge variation in rRNA counts is the cause of this.
3) Should I ignore the third replicate altogether? I’m not sure whether having only two replicates is common or accepted in RNA-Seq experiments in the literature.
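For option 1, this is roughly what I have in mind for filtering the annotation. It is a minimal sketch that assumes the GTF marks rRNA genes with a `gene_biotype "rRNA"` attribute, as Ensembl-style annotations do; other annotations may use a different key (for example `gene_type`), so the pattern would need adjusting. The function name and the toy GTF lines are hypothetical.

```python
import re

# Assumption: rRNA features carry this attribute in column 9 of the GTF
RRNA_ATTR = re.compile(r'gene_biotype "rRNA"')


def drop_rrna(gtf_lines):
    """Keep header/comment lines and every feature not annotated as rRNA."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or not RRNA_ATTR.search(line):
            kept.append(line)
    return kept


# Toy annotation (made up): one protein-coding gene, one rRNA gene with an exon
toy_gtf = [
    "#!genome-build toy",
    '1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";',
    '1\tsrc\tgene\t200\t300\t.\t+\t.\tgene_id "G2"; gene_biotype "rRNA";',
    '1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "G2"; transcript_id "T2"; gene_biotype "rRNA";',
]
filtered = drop_rrna(toy_gtf)
for line in filtered:
    print(line)
```

The filtered lines would be written to a new GTF file and passed to featureCounts with `-a`, so reads overlapping rRNA genes would no longer be assigned to any feature.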
Thanks in advance, and I hope I have included enough information.