Can I remove the control in differential expression analysis?

Hi there,

Essentially, my experimental design is control vs treatment. Cells were sorted based on fluorescence, so there are 4 different “colors” of treated cells, i.e. red, green, green+red, and blue+green+red. I am interested in how the colors differ from one another. And, I have duplicates for all colors and the control.

> colData(dds_no)
  DataFrame with 10 rows and 3 columns
  sample             color          sizeFactor
  <factor>  <factor>            <numeric>
 CTRL_1     control     0.730176128359336
 CTRL_2     control     1.12370593310441
  GFP_1     green       1.62229333835717
  GFP_3     green       0.733077520973604
 GFPRFP_1   greenred    1.24575808750345
 GFPRFP_2   greenred    1.27612350159403
 RFP_2          red         1.57975191196927
 RFP_3          red         0.518878115991023
 TRIPLE_1       rgb         0.833793868046399
 TRIPLE_3       rgb         1.01334467700869

I am wondering if it is appropriate to remove my control from downstream differential expression analysis when I want to only look at variation between treated cells? When I include the control, the PCA shows that the control samples are well separated from all treated cell types.

What I want (and have done) is removed control from the raw counts, built the deseq2 object with ~color as my design, produced results with LRT instead of Wald, and selected significant DEGs by the adj. pvalue. Then from this, I created a heatmap with my samples (still without control) to show the variation they have with certain significant genes and gene sets.

dds_no = DESeqDataSetFromMatrix(countData=countData_no,
                         colData=colData_no,
                         design=~color)
dds_LRT = DESeq(dds_no,test="LRT", reduced=~1)
res_LRT <- results(dds_LRT)
rld_LRT<- rlogTransformation(dds_LRT)
pathsLRT<-assay(rld_LRT)
df_pathLRT <- cbind(rownames(res_LRT), data.frame(res_LRT, row.names=NULL))
topTableLRT <- as.data.frame(df_pathLRT)
sigGeneListLRT <- subset(topTableLRT,  padj<=0.05)[,1]
topMatrixLRT <- pathsLRT[which(rownames(pathsLRT) %in% sigGeneListLRT),]
topMatrixPATHSLRT <- gsva(data.matrix(topMatrixLRT),
                   stress_Cao_etal_2017,
                   method="gsva",
                   min.sz=1,
                   max.sz=Inf,
                   kcdf="Gaussian",
                   mx.diff=TRUE,
                   verbose=TRUE)
heat_pathsLRT <- t(scale(t(topMatrixPATHSLRT)))
pheatmap(heat_pathsLRT, annotation_col=dfdds, fontsize_row = 8, cluster_rows = FALSE, cluster_cols =
 TRUE, main="Stress Paths", annotation_legend=FALSE)`

Please let me know if this approach is acceptable or if it truly requires I include the control.

Thanks,
Morgan

Read more here: Source link