Large DE LogFC range
I’m using DESeq2 to run a differential expression (DE) analysis between samples in two different conditions. During the analysis, I identified a batch effect due to sequencing time, which I modelled as a covariate in the design formula.
The Wald test returned a good number of significant genes (~100), but the LogFC range does not look reliable: it goes from -30 to +30.
- What could cause these extremely large values, and how can I solve the problem? I tried lfcShrink() to re-estimate the LogFC, but I’m not sure that is sufficient to get reliable results.
- My second question is about the design of the model. Is it reasonable to add covariates to the model even if they don’t show a strong effect on the data (judging by PCA or clustering)?
The code used for the analysis follows.
Thanks for your help!
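library(DESeq2)
# Design: covariate (Group) plus the condition of interest
dds <- DESeqDataSetFromMatrix(countData = count, colData = coldata, design = ~ Group + Condition)
# Drop genes with fewer than 10 reads in total across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
dds <- DESeq(dds)
# Wald test results for dis vs hea
res <- results(dds, contrast = c("Condition", "dis", "hea"))
# Keep genes with an adjusted p-value below 0.05
res <- res[which(res$padj < 0.05), ]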
Agree with ATpoint: look at the genes with large LFC (from lfcShrink) using plotCounts. Usually, large MLE LFCs come from all 0’s in one group.
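A minimal sketch of that check, not the exact workflow from the question. It runs on simulated data from makeExampleDESeqDataSet, so the column name condition, the levels A and B, the gene name gene1 and the counts planted in it are all assumptions for illustration. With the real data you would use Condition, dis and hea instead. lfcShrink with type = "apeglm" needs the apeglm package installed and takes a coefficient name from resultsNames(dds) rather than a contrast.

```r
library(DESeq2)

# Simulated stand-in data (assumption): 500 genes, 8 samples,
# colData column 'condition' with levels A and B (4 samples each).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8)

# Plant a gene that is all zeros in group A and clearly expressed in group B.
cts <- counts(dds)
cts["gene1", dds$condition == "A"] <- 0L
cts["gene1", dds$condition == "B"] <- c(40L, 55L, 38L, 61L)
counts(dds) <- cts

dds <- DESeq(dds, quiet = TRUE)
res <- results(dds, contrast = c("condition", "B", "A"))

# Genes with the largest absolute MLE log2 fold changes.
top <- head(rownames(res)[order(-abs(res$log2FoldChange))], 5)
print(res[top, c("baseMean", "log2FoldChange", "padj")])

# Per-sample normalized counts for one suspicious gene.
# returnData = TRUE returns a data.frame instead of drawing the plot;
# plotCounts adds a pseudocount of 0.5 by default, so zeros show up as 0.5.
by_group <- plotCounts(dds, gene = "gene1", intgroup = "condition",
                       returnData = TRUE)
print(by_group)

# Shrunken estimate for the same comparison.
print(resultsNames(dds))
res_shr <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm",
                     quiet = TRUE)
print(c(MLE = res["gene1", "log2FoldChange"],
        shrunken = res_shr["gene1", "log2FoldChange"]))
```

Dropping returnData = TRUE draws the usual plotCounts plot, which is often the quickest way to see whether one group is entirely zero.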
“Is it reasonable to add covariates to the model also if they don’t show a strong effect on the data”
This is up to the analyst. You can add them to be careful but it does come at a loss of degrees of freedom. So if they truly don’t have any effect on any genes, it’s best to leave out unnecessary covariates.
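One possible way to check whether a covariate affects any genes is a likelihood ratio test of the full design against a reduced design without the covariate. This is a sketch under assumptions, not something suggested in the answer above: the data are simulated, and the batch column and its levels b1 and b2 are made up for illustration. With the real data the full design would be ~ Group + Condition and the reduced design ~ Condition.

```r
library(DESeq2)

# Simulated stand-in data (assumption): 'condition' has levels A and B.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8)

# Hypothetical covariate, balanced across the two conditions.
dds$batch <- factor(rep(c("b1", "b2"), times = 4))
design(dds) <- ~ batch + condition

# LRT: does dropping 'batch' from the model worsen the fit for any gene?
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ condition, quiet = TRUE)
res_lrt <- results(dds_lrt)

# Number of genes where the covariate explains a significant part of the variation.
n_batch_genes <- sum(res_lrt$padj < 0.05, na.rm = TRUE)
print(n_batch_genes)
```

If very few or no genes come out of this test, that supports leaving the covariate out and keeping the extra degree of freedom.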