I am working on RNA-seq data for which I have the Phenotype (DPN or PDPN), Gender (Female or Male), Age, and batch (1st_round or 2nd_round):
ID         PiNS.ID  Phenotype  PiNS       Gender  Age  batch
PINS_0112  112      PDPN       PINS_0112  Female  64   1st_round
PINS_0171  171      DPN        PINS_0171  Male    74   2nd_round
Basically, I want to find the genes that are differentially expressed between the two Phenotype groups (DPN vs. PDPN). However, I ran a PCA and it shows a strong batch effect.
So I would like to design my analysis to test for differential expression by Phenotype while accounting for the batch effect, Gender, and Age.
So far, I have set up my analysis as follows:
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = colData,
                              design = ~ batch + Phenotype)
dds$Phenotype <- factor(dds$Phenotype, levels = c("DPN", "PDPN"))
dds <- DESeq(dds, full = design(dds)) #, reduced = ~ batch)
First, is this the correct way to account for (remove) the batch effect?
Second, how should I write the design so that it also takes Gender and Age into account?
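For what it's worth, here is a minimal sketch of what I imagine the extended design could look like. It assumes that the extra covariates can simply be added as terms in the formula, with Phenotype kept last since DESeq2's results() uses the last variable in the design by default. The first two rows of the toy colData come from my table above; the other rows (S3 to S6) are made-up values only so that the example runs, and design_formula is just a name I picked. I use base R's model.matrix() to check which coefficients the design would produce; the DESeq2 calls are left as comments because they need my real countData.

```r
# Toy sample sheet: rows PINS_0112 and PINS_0171 are from the table above,
# rows S3..S6 are hypothetical values added only to make the example runnable.
colData <- data.frame(
  Phenotype = c("PDPN", "DPN", "DPN", "PDPN", "DPN", "PDPN"),
  Gender    = c("Female", "Male", "Female", "Male", "Male", "Female"),
  Age       = c(64, 74, 58, 69, 61, 70),
  batch     = c("1st_round", "2nd_round", "1st_round",
                "2nd_round", "1st_round", "2nd_round"),
  row.names = c("PINS_0112", "PINS_0171", "S3", "S4", "S5", "S6")
)

# Make the categorical covariates factors; DPN is the reference level.
colData$Phenotype <- factor(colData$Phenotype, levels = c("DPN", "PDPN"))
colData$Gender    <- factor(colData$Gender)
colData$batch     <- factor(colData$batch)

# Candidate design: nuisance covariates first, variable of interest last.
design_formula <- ~ batch + Gender + Age + Phenotype

# Inspect the coefficients this design would estimate.
mm <- model.matrix(design_formula, data = colData)
print(colnames(mm))

# With the real count matrix, the same formula would be passed to DESeq2:
# dds <- DESeqDataSetFromMatrix(countData = countData,
#                               colData = colData,
#                               design = design_formula)
# dds <- DESeq(dds)
# res <- results(dds, contrast = c("Phenotype", "PDPN", "DPN"))
```

If this is the right idea, the PhenotypePDPN coefficient would be the PDPN vs. DPN effect adjusted for batch, Gender, and Age, but I would appreciate confirmation.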
Many thanks in advance for your help.