I’m trying to remove batch effect from my data using the sva
package. The process described here is like this:
library(bladderbatch)
data(bladderdata)
pheno = pData(bladderEset)
edata = exprs(bladderEset)
# The null model contains only the adjustment variables. Since we are not adjusting
# for any other variables in this analysis, only an intercept is included in
# the model.
mod0 = model.matrix(~1,data=pheno)
# The full model includes both the adjustment variables
# and the variable of interest
mod <- model.matrix(~1+ cancer, data=pheno)
# Identify the number of latent factors that need to be estimated.
n.sv = num.sv(edata,mod,method="leek")
# estimate the surrogate variables
svobj = sva(edata,mod,mod0,n.sv=n.sv)
# include the surrogate variables in both the null and full models
modSv = cbind(mod,svobj$sv)
mod0Sv = cbind(mod0,svobj$sv)
# Adjusting for surrogate variables using the limma package
fit = lmFit(edata,modSv)
Now I want to have a one vs all comparison. So, according to this post by Michael Love my model should be like this:
limma_mod <- model.matrix(~0 + cancer, data=pheno)
I’m confused whether I should use model.matrix(~1+ cancer, data=pheno)
as my full model and model.matrix(~1, data=pheno)
as the null model, then, for limma
, merge model.matrix(~0+ cancer, data=pheno)
with the surrogate variables, or use model.matrix(~0+ cancer, data=pheno)
as my full model and model.matrix(~1, data=pheno)
as the null model?
Note: using mod0 <- model.matrix(~0, data=pheno)
and mod <- model.matrix(~0+ cancer, data=pheno)
results in this error:
Error in solve.default(t(mod0) %*% mod0) : 'a' is 0-diml
Read more here: Source link