Sorry for all my questions lately, but as a novice who has to figure out how to analyse QuantSeq data, I have found this forum a great and indispensable help.
I’m doing a human transcriptomics analysis with QuantSeq data from 600 patients. The two conditions we compare are quite similar in a substantial number of aspects but different in others. We have 300 patients in each group, and I have already followed the edgeR manual, including TMM normalization. But now I have noticed that there are also ERCC/SIRV spike-ins in the dataset!
I did some literature research and found that normalization with spike-ins is a possibility, but that it shows mixed performance and is not always accurate. Furthermore, TMM seems to be the preferable approach IF its assumptions are fulfilled (as I understand them: most genes are not DE, and up- and down-regulation are roughly symmetric). But my problem now is: how can I be sure that those assumptions are fulfilled in my experiment? As said, the two conditions of interest are comparable in certain aspects (they lead to the same clinical syndrome) but have different etiologies. I tend to favour TMM in my experiment, as I expect some genes to be up- or down-regulated in both conditions. In other words, I do not expect samples from one condition to be totally different from those of the other.
Does anyone have some advice on this matter? Could I just continue as I did (with the extra step of excluding the spike-ins), or are spike-ins strongly advised?
TMM is very robust in most situations; it takes quite large global changes to break it. This is actually what plotMD, which you asked about before, can be used for. Just run the DE analysis and make such a plot from the topTags output; it shows the log fold change between the two groups against the average expression. You will easily see whether the assumptions hold (most likely they do): the MD (aka MA) plot will have the bulk of genes centered along y = 0, with the typical arrowhead-like shape. If the bulk is centered roughly at y = 0, you’re fine, and there is most likely no need for any spike-in controls. You can even do it manually by plotting logCPM on the x-axis and logFC on the y-axis. Just follow the edgeR vignette, e.g. page 57/58.
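If you want a number to go with the visual check, here is a minimal sketch in Python (not edgeR code) of the manual version. It assumes you have exported the logFC and logCPM columns of the topTags table for all genes. The function name md_bulk_summary, the min_log_cpm cutoff and the simulated values are all made up for illustration and do not come from edgeR or from your data.

```python
import random
import statistics


def md_bulk_summary(log_fc, log_cpm, min_log_cpm=1.0):
    """Summarise where the bulk of genes sits on an MD (MA) plot.

    log_fc and log_cpm are equal-length lists, one entry per gene.
    min_log_cpm is an arbitrary illustrative cutoff that drops very lowly
    expressed genes, whose fold changes are noisy.
    """
    kept = [fc for fc, cpm in zip(log_fc, log_cpm) if cpm >= min_log_cpm]
    return {
        "n_genes": len(kept),
        # Close to 0 when the bulk of genes is centered along y = 0.
        "median_logFC": statistics.median(kept),
        # Close to 0.5 when up and down changes are roughly balanced.
        "fraction_up": sum(fc > 0 for fc in kept) / len(kept),
    }


# Simulated example only: 5000 genes, mostly unchanged, 5% truly DE,
# with the DE genes split evenly between up and down.
rng = random.Random(1)
log_cpm = [rng.uniform(0, 12) for _ in range(5000)]
log_fc = [rng.gauss(0, 0.3) for _ in range(5000)]
for i in range(250):
    log_fc[i] += 2.0 if i % 2 == 0 else -2.0

summary = md_bulk_summary(log_fc, log_cpm)
print(summary)

# A global shift of all genes is what the bulk moving away from y = 0 looks like.
shifted_summary = md_bulk_summary([fc + 1.0 for fc in log_fc], log_cpm)
print(shifted_summary)
```

A median logFC near 0 and a fraction of up-regulated genes near 0.5 correspond to the bulk sitting along y = 0 in the plot. Treat this as a complement to looking at the MD plot itself, not a replacement for it.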