How to account for a batch effect and multiple variables when identifying differentially expressed genes for a given phenotype in DESeq2

Hello All,

I am working on RNA-seq data for which I have the Phenotype (DPN or PDPN), Gender (Female or Male), Age, and the batch (1st_round or 2nd_round):

ID         PiNS.ID  Phenotype  PiNS       Gender  Age  batch
PINS_0112  112      PDPN       PINS_0112  Female  64   1st_round
PINS_0171  171      DPN        PINS_0171  Male    74   2nd_round

Basically, I want to find the genes that are differentially expressed between the two phenotypes (DPN vs. PDPN). However, a PCA of the samples shows a strong batch effect.
So I would like to design my analysis to test for the Phenotype effect while accounting for the batch effect, Gender, and Age.

So far, I have set up my analysis as follows:

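library(DESeq2)

# batch is included as a covariate; Phenotype comes last as the variable of interest
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ batch + Phenotype)
# Make DPN the reference level, so fold changes are PDPN relative to DPN
dds$Phenotype <- factor(dds$Phenotype, levels = c("DPN", "PDPN"))
# A reduced formula (e.g. reduced = ~ batch) would only be used together with test = "LRT"
dds <- DESeq(dds, full = design(dds))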

First, I am not sure whether this is the correct way to account for (or remove) the batch effect.
Second, I am not sure how to write my design so that it also accounts for Gender and Age.
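
For reference, one possible way to write such a design is sketched below. This is a minimal sketch under stated assumptions, not a confirmed answer: the six-sample sheet is made up purely for illustration, and the names Age_scaled, design_formula, and run_deseq are hypothetical. The idea is to list the covariates first and Phenotype last, centre and scale the numeric Age column, and check with base R that the resulting model matrix is full rank before handing the design to DESeq2.

```r
# Hypothetical sample sheet with the same columns as above (values invented for illustration)
colData <- data.frame(
  Phenotype = c("PDPN", "DPN", "PDPN", "DPN", "PDPN", "DPN"),
  Gender    = c("Female", "Male", "Male", "Female", "Female", "Male"),
  Age       = c(64, 74, 58, 69, 71, 62),
  batch     = c("1st_round", "2nd_round", "1st_round",
                "2nd_round", "2nd_round", "1st_round"),
  stringsAsFactors = FALSE
)

# Categorical covariates as factors; DPN is the reference level for Phenotype
colData$Phenotype <- factor(colData$Phenotype, levels = c("DPN", "PDPN"))
colData$Gender    <- factor(colData$Gender)
colData$batch     <- factor(colData$batch)

# Centre and scale Age so the numeric covariate is on a comparable scale
colData$Age_scaled <- as.numeric(scale(colData$Age))

# Covariates first, variable of interest last
design_formula <- ~ batch + Gender + Age_scaled + Phenotype

# Base-R check: the design is only estimable if the model matrix is full rank
mm <- model.matrix(design_formula, data = colData)
is_full_rank <- qr(mm)$rank == ncol(mm)
print(is_full_rank)

# Hypothetical wrapper showing where the design would go in DESeq2.
# It is only defined here, not called; it needs the DESeq2 package and a real
# count matrix whose columns match the rows of the sample sheet.
run_deseq <- function(countData, colData, design = design_formula) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = countData,
                                        colData   = colData,
                                        design    = design)
  dds <- DESeq2::DESeq(dds)
  # Phenotype effect (PDPN vs DPN), adjusted for batch, Gender and Age
  DESeq2::results(dds, contrast = c("Phenotype", "PDPN", "DPN"))
}
```

With a design like this, batch is modelled as a covariate during testing rather than subtracted from the counts, so the raw counts passed to DESeq2 stay unchanged.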

Many thanks in advance for your help.

Kind regards,
Pierre
