Differential gene expression analysis on microarray dataset with files from two chips

Hello all,

I mostly analyze RNA-seq data, so microarrays are a bit new to me. I’m trying to analyze data from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47183 in R. When I try to read in the .CEL files using read.celfiles() from the oligo package, it reports that the CEL files are not all of the same type, and indeed the GEO page lists 2 platforms (Affymetrix Human Genome U133, and Affymetrix GeneChip Human Genome HG-U133A Custom CDF).

So for now I’m reading in and analyzing the data from the two platforms separately.

1) Is this the correct way to proceed, or am I missing anything? Is there a way to read in all the data together and generate a single expression matrix for differential gene expression analysis?

2) If I generate two separate expression matrices, would I need to perform differential expression analysis on each platform separately, or can I combine the datasets (e.g. using limma or ComBat) and THEN do the analysis? I realize this question pops up in various forms on multiple forums, but since philosophies of analysis seem to evolve over time, I was wondering what the latest consensus is. As I understand it, batch-correcting the expression matrices themselves is generally considered suboptimal, and it is preferable to include “batch” as a covariate in the model instead. The problem is twofold: the platforms are different, and there are two potential control groups, of which the one I prefer (healthy donor kidneys) is only found in one of the datasets.
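To make the covariate idea in question 2 concrete, here is a minimal sketch of the check I have in mind, written in Python only for illustration. The sample layouts and the names `chip_A`, `chip_B`, `healthy_donor`, `other_control` and `disease` are made up and are not the real GSE47183 annotation. It builds an additive design (intercept + group + platform) and tests whether it has full column rank, i.e. whether group and platform effects could in principle be separated. It says nothing about whether probe sets are comparable across the two chips.

```python
from fractions import Fraction

def design_matrix(samples):
    """Intercept + group indicators + platform indicators (first level = reference)."""
    groups = sorted({g for g, _ in samples})
    platforms = sorted({p for _, p in samples})
    rows = []
    for g, p in samples:
        row = [1]
        row += [int(g == level) for level in groups[1:]]
        row += [int(p == level) for level in platforms[1:]]
        rows.append(row)
    return rows

def matrix_rank(rows):
    """Exact rank via Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def is_full_rank(samples):
    design = design_matrix(samples)
    return matrix_rank(design) == len(design[0])

# Hypothetical layout 1: "disease" is measured on both chips, so it links them.
mixed = (
    [("healthy_donor", "chip_A")] * 2 + [("disease", "chip_A")] * 2
    + [("disease", "chip_B")] * 2 + [("other_control", "chip_B")] * 2
)

# Hypothetical layout 2: group and chip are completely confounded.
confounded = [("healthy_donor", "chip_A")] * 2 + [("disease", "chip_B")] * 2

print(is_full_rank(mixed))       # True
print(is_full_rank(confounded))  # False
```

If the real layout looks like the first case, a platform covariate seems at least algebraically possible even though the healthy donors sit on one chip only. If it looks like the second, no covariate or batch correction could tell disease from platform.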

I may be missing something fundamental here, since I am new to microarray analysis. Usually I just read in all the files together, and this is the first time I’ve seen two platforms in a “single” dataset, so any advice on how to proceed would be greatly appreciated!

Cheers
