I am still wrestling with limma, trying to apply it to my metabolomics data.
The main problem is the fact that, with my data, limma produces produces false positives, i.e. significant p-values when they clearly should not be significant.
With my new data set I have the same problem again. I suspect that I failed to stabilize the variance enough, which leads to false positives.
I prepared my data the following way:
- Normalized raw data
- Removed metabolites where Missingness > 10% across test samples
- Started log-transformation: log2( x + 0.5e-4 )
- Blinded the experiment, because it is unpublished data.
For example for compound 8
I get a limma p-value of 0.003, which is clearly wrong.
A Welch’s test results in p = 0.4692.
I calculated the ordinary limma p-value and got 0.0035, which makes me think that I do not meet the requirements for linear regression.
Is my suspicion, correct or did I go wrong somewhere else?
I would be very thankful, if somebody could point me into the right direction.
I used the following code:
# Load limma library(limma) # Load data m <- as.matrix(read.csv(url('https://pastebin.com/raw/gppyEXyV'), row.names=1)) design <- read.csv(url('https://pastebin.com/raw/PwkP7u81'), row.names=1) cont_m <- read.csv(url('https://pastebin.com/raw/6p5UNRK3'), row.names=1) # Limma fit <- lmFit(m, design) fit <- contrasts.fit(fit, cont_m) fit <- eBayes(fit, trend=TRUE) # Check residuals plotSA(fit) # Data of compound in question m['compound_8',c(10,11,12,45,46,47)] dotchart(m['compound_8',c(10,11,12,45,46,47)]) # Limma result topTable(fit, n=Inf) %>% tibble::rownames_to_column("unique_name") %>% filter(unique_name == 'compound_8') # Welch's test t.test(m['compound_8',c(45,46,47)],m['compound_8',c(10,11,12)]) # Ordinary p-value from limma tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma p.value.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE) p.value.ord['compound_8',]
Thank you very much for your time!
Read more here: Source link