Limma false positives

Dear Community,

I am still wrestling with limma, trying to apply it to my metabolomics data.

The main problem is the fact that, with my data, limma produces produces false positives, i.e. significant p-values when they clearly should not be significant.

I already had a very similar problem, which could be solved by using a started log-transformation to stabilize the variance.

With my new data set I have the same problem again. I suspect that I failed to stabilize the variance enough, which leads to false positives.

plotSA

I prepared my data the following way:

  • Normalized raw data
  • Removed metabolites where Missingness > 10% across test samples
  • Started log-transformation: log2( x + 0.5e-4 )
  • Blinded the experiment, because it is unpublished data.

For example for compound 8
enter image description here
I get a limma p-value of 0.003, which is clearly wrong.
A Welch’s test results in p = 0.4692.
I calculated the ordinary limma p-value and got 0.0035, which makes me think that I do not meet the requirements for linear regression.

Is my suspicion, correct or did I go wrong somewhere else?
I would be very thankful, if somebody could point me into the right direction.

I used the following code:

# Load limma
library(limma)

# Load data
m <- as.matrix(read.csv(url('https://pastebin.com/raw/gppyEXyV'), row.names=1))
design <- read.csv(url('https://pastebin.com/raw/PwkP7u81'), row.names=1)
cont_m <- read.csv(url('https://pastebin.com/raw/6p5UNRK3'), row.names=1)

# Limma
fit <- lmFit(m, design)
fit <- contrasts.fit(fit, cont_m)
fit <- eBayes(fit, trend=TRUE)

# Check residuals
plotSA(fit)

# Data of compound in question
m['compound_8',c(10,11,12,45,46,47)]
dotchart(m['compound_8',c(10,11,12,45,46,47)])

# Limma result
topTable(fit, n=Inf) %>%
    tibble::rownames_to_column("unique_name") %>%
    filter(unique_name == 'compound_8')

# Welch's test
t.test(m['compound_8',c(45,46,47)],m['compound_8',c(10,11,12)])

# Ordinary p-value from limma
tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
p.value.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
p.value.ord['compound_8',]

Thank you very much for your time!

Read more here: Source link