DESeq2 aggregated single cell data

Hi,

Im aiming to use aggregated single cell data to perform a pseudobulk analysis to assess differential expression between those with sarcopenia and those without, termed “status_binary” with the levels “yes” and “no”.


# DESeq2 ------------------------------------------------------------------

dds <- DESeqDataSetFromMatrix(y$counts, 
                              colData = y$samples, 
                              design = ~Sex+age_scaled+status_binary)

# Transform counts for data visualization
rld <- vst(dds, blind=TRUE)

# Plot PCA
DESeq2::plotPCA(rld, intgroup = "status_binary")

# Extract the rlog matrix from the object and compute pairwise correlation values
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)

# Plot heatmap
pheatmap::pheatmap(rld_cor, annotation = y$samples[, c("status_binary"), drop=F])

# Run DESeq2 differential expression analysis
dds <- DESeq(dds)

# Plot dispersion estimates
plotDispEsts(dds)

# Check the coefficients for the comparison
resultsNames(dds)

[1] "Intercept"               "Sex_Male_vs_Female"      "age_scaled"              "status_binary_Yes_vs_No"

res <- results(dds,
               contrast = c("status_binary", "Yes", "No"),   # For catagorical variables
               alpha = 0.05)

# Shrink the log2 fold changes to be more appropriate using the apeglm method - should cite [paper]() when using this method
res <- lfcShrink(dds, 
                 coef = "status_binary_Yes_vs_No",
                 res=res,
                 type = "apeglm")

hist(res$pvalue, xlab="p value", main="Histogram of nominal p values")

# Turn the results object into a tibble for use with tidyverse functions
res_tbl <- 
  res %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>%
  as_tibble() %>%
  arrange(padj, pvalue)

The above code works great, however when following the advice from the most recent vignette 1 with respect to handling single-cell data with DESeq2, I get the below message when trying to run lfcShrink:

dds <- DESeqDataSetFromMatrix(y$counts, 
                              colData = y$samples, 
                              design = ~Sex+age_scaled+status_binary)

# Transform counts for data visualization
rld <- vst(dds, blind=TRUE)

# Plot PCA
DESeq2::plotPCA(rld, intgroup = "status_binary")

# Extract the rlog matrix from the object and compute pairwise correlation values
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)

# Plot heatmap
pheatmap::pheatmap(rld_cor, annotation = y$samples[, c("status_binary"), drop=F])

design <- model.matrix(~Sex+age_scaled+status_binary, y$samples)
reduced <- model.matrix(~Sex+age_scaled, y$samples)
design
reduced

dds <- scran::computeSumFactors(dds)

# Run DESeq2 differential expression analysis
dds <- DESeq(dds, 
             test="LRT", 
             fitType = "glmGamPoi",
             useT=TRUE, 
             full = design,
             reduced = reduced,
             minmu=1e-6, 
             minReplicatesForReplace=Inf)

# Plot dispersion estimates
plotDispEsts(dds)

# Check the coefficients for the comparison
resultsNames(dds)

[1] "X.Intercept."     "SexMale"          "age_scaled"       "status_binaryYes"

res <- results(dds,
               contrast = list("status_binaryYes"),   # For catagorical variables
               alpha = 0.05)

res_img <- lfcShrink(dds, 
                     coef = "status_binaryYes",
                     res=res,
                     type="apeglm"
)

Error in lfcShrink(dds, coef = "status_binaryYes", res = res, type = "apeglm") : 
'coef' should specify same coefficient as in results 'res'

Is there a reason the name of the coeffficient is no longer working? Also im unclear on the role of lfcShrink – should it be used as our primary results when assesing logFC difference in expression or just to aid visulaisation?

Thanks,
Matt

Read more here: Source link