Negative values after batch correction using removeBatchEffect from Limma

I am trying to correct my RNA seq data for 3 categorical variables as well as preserve the biological information of the dataset. In order to do that, I have used the removeBatchEffect function from limma.

I used a log2(TPM counts + 1) matrix as my input but… as you can see in the following screenshot, the corrected counts that I get have negative values. Since it doesn’t make sense to have genes with a negative expression, is there any way to solve this? I don’t know if the output is log2(tpm) and I just have to do +1 or unlog the corrected data… In any case, I cannot remove the genes with negative values, I would like to keep all of them if it is possible.

corrected_cts

Does anybody know how to solve this?

Thanks very much in advance

Code:

set.seed(11344)

covariates_df <- data.frame(
  Sample = paste0("A", seq(1,960)),
  Age = sample(seq(18, 70), size = 960, replace = TRUE),
  Group = sample(seq(1, 3), size = 960, replace = TRUE),
  Sex = sample(seq(1, 2), size = 960, replace = TRUE),
  WBC = sample(seq(0.5, 35, by=0.3), size = 960, replace = TRUE),
  Place = sample(seq(1, 2), size = 960, replace = TRUE),
  Tubes = sample(seq(1, 2), size = 960, replace = TRUE),
  LibPrep = sample(seq(1, 16), size = 960, replace = TRUE)
)
rownames(covariates_df) <- covariates_df$Sample

categorical_variables <- c("Sample", "Group", "Sex", "Place", "Tubes", "LibPrep")
covariates_df[,categorical_variables] <- lapply(covariates_df[,categorical_variables],as.factor)

>str(covariates_df)

'data.frame':   960 obs. of  8 variables:
 $ Sample : Factor w/ 960 levels "A1","A10","A100",..: 1 112 223 334 445 556 667 778 889 2 ...
 $ Age    : int  40 66 37 40 25 42 46 51 45 61 ...
 $ Group  : Factor w/ 3 levels "1","2","3": 2 2 2 3 2 2 3 3 3 3 ...
 $ Sex    : Factor w/ 2 levels "1","2": 1 2 2 1 1 2 1 1 2 1 ...
 $ WBC    : num  12.8 29.9 25.7 0.8 30.8 5.9 30.5 19.1 2.9 12.2 ...
 $ Place  : Factor w/ 2 levels "1","2": 1 1 2 1 2 1 2 2 1 2 ...
 $ Tubes  : Factor w/ 2 levels "1","2": 1 1 2 1 2 2 2 2 1 2 ...
 $ LibPrep: Factor w/ 16 levels "1","2","3","4",..: 14 1 13 4 10 5 1 8 7 5 ...


dim(log2TPM_1_counts_mat)
[1] 60000   960

> str(log2TPM_1_counts_mat)
 num [1:60000, 1:960] 0.51 0 2.76 2.64 2.26 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:60000] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457" ...
  ..$ : chr [1:960] "A1" "A2" "A3" "A4" ...

### I wanted to upload the counts data but it is too huge and I didn't manage. I tried to create a small counts data, but I wasn't getting any negative values as I get in reality in order to show you my problem.

Design of the model:

design <- model.matrix(~Age + Group + Sex + WBC + Place + Tubes + LibPrep, data=covariates_df)
bio.design <- design[,1:6]
batch.design <- design[,-(1:6)]

Adjustment with removeBatchEffect:

corrected_cts <- limma::removeBatchEffect(log2TPM_1_counts_mat,design=bio.design,covariates=batch.design)

Read more here: Source link