How to use spike-in information (sequences from another species) with DESeq2::DESeq()


I am working with colleagues to perform differential expression analyses using data that have been spiked with RNA from another species, the purpose of which is to get a sense of the absolute numbers of transcripts that are up and down between conditions. We add the spike-in at the cell stage, prior to the steps of library generation, so the working assumption is that any features or quirks of library generation are equally reflected in the sample and spike-in information.

My question is, “What is the appropriate way to use this spike-in information with DESeq2?” Is it appropriate to use the spike-in information with the DESeq2::estimateSizeFactors() controlGenes argument (e.g., including the thousands of genes for this other species in the counts matrix that is supplied to DESeq2, then supplying the other-species gene names as a vector to the controlGenes)? When following this approach and subsequently running DESeq2::DESeq(), I believe the genes for the other species remain in the counts matrix and are included in the tests for significance, multiple hypothesis testing, etc. So, we should somehow exclude these genes from the counts matrix in the DESeqDataSet object prior to running DESeq2::DESeq()—is that correct?

Another thing we’ve considered is calculating and applying a simple sample-wise scaling factor prior to running DESeq2 as follows:

  1. We calculate a scaling factor for the experiment (EXP) condition: x = (no. EXP sample reads)/(no. EXP spike-in reads)
  2. We calculate a scaling factor for the control (CTRL) condition: y = (no. CTRL sample reads)/(no. CTRL spike-in reads)
  3. We take a counts matrix such and we divide the sample counts by the corresponding scaling factor, e.g.,
gene    EXP CTRL
A   5   10
B   15  100
C   0   2

#  The above becomes...
gene    EXP CTRL
A   5/x     10/y
B   15/x    100/y
C   0/x     2/y
  1. We then use the adjusted counts matrix as input for DESeq2.

Does this seem appropriate to you? (However, I don’t think DESeq2 accepts non-integer counts in the counts matrix.)

We’ll greatly appreciate any input or advice you can give us. Thank you!


Read more here: Source link