Hi all, I ran into a very strange problem when reading raw CEL files and running RMA on them.
When I use the code below to background-correct and normalize GSE18997 (platform GPL570), the order of some probe IDs in my result differs from the series matrix that the authors uploaded to GEO, BUT the order of the values stays the same as in the series matrix! In other words, where the authors' series matrix shows probe A (for example) with expression 2, 3, 4, ..., our re-corrected and re-normalized matrix from the raw CEL files shows probe B with expression 2, 3, 4, .... We are very confused.
We downloaded the raw CEL files for GSE18997 and used the following code to read them and run RMA:
library(oligo)
setwd("C:/Users/Desktop/files")
data.dir <- "C:/Users/Desktop/files"
# list only files whose names end in .CEL
celfiles <- list.files(data.dir, pattern = "\\.CEL$")
data.raw <- read.celfiles(filenames = file.path(data.dir, celfiles))
normalData <- rma(data.raw)
# tab-separated text, probe IDs as row names
write.table(exprs(normalData), file = "matrixfromcel.xls", sep = "\t", quote = FALSE)
This gave us the probe-level matrix from the CEL files. We also downloaded the series matrix that the authors uploaded, so that we could compare the two.
If you check row 9939 of the two matrices, you will find that probe ID 177_at is at row 9939 of our matrix from the CEL files, but at row 9940 of the authors' series matrix (after removing the first 77 rows, which are not part of the actual expression table). Likewise, probe ID 1773_at is at row 9940 of our matrix but at row 9939 of the authors' series matrix. Although the order of the IDs has changed, the values in the rows have not. Row 9939 is around 4.5-5.5 in both matrices, and row 9940 is around 7-8.5 in both, which means that rows 9939 and 9940 hold the same probe data in the two matrices but carry different probe IDs. It is so strange.
The same thing happens in rows 12250 to 12350 of the two matrices.
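One check that might help here is to pair the rows of the two tables by probe ID instead of by row number, and then look at how far the values differ. The following is only a rough sketch: the two small matrices are made-up stand-ins for the real tables, and `compare_by_probe_id` is a hypothetical helper name, not something from oligo or GEO.

```r
# Toy stand-ins for the two tables; the values are invented for illustration.
from_cel <- matrix(c(4.6, 5.1, 7.2, 8.3), nrow = 2, byrow = TRUE,
                   dimnames = list(c("177_at", "1773_at"), c("S1", "S2")))
series   <- matrix(c(7.2, 8.3, 4.6, 5.1), nrow = 2, byrow = TRUE,
                   dimnames = list(c("1773_at", "177_at"), c("S1", "S2")))

# Hypothetical helper: align the rows of b to a by row name (probe ID),
# then return the largest absolute difference per probe.
compare_by_probe_id <- function(a, b) {
  common <- intersect(rownames(a), rownames(b))
  diffs <- abs(a[common, , drop = FALSE] - b[common, colnames(a), drop = FALSE])
  apply(diffs, 1, max)
}

max_diff <- compare_by_probe_id(from_cel, series)
print(max_diff)
```

If the differences are close to zero for every probe, the two tables contain the same data and only the row order differs. If they are large for probes such as 177_at and 1773_at, then the IDs really are attached to different values in the two tables.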
To find out which order is correct, I first checked the GPL file for platform GPL570 and found that the order of the probe IDs there is the same as in the authors' series matrix. We then checked two other GEO datasets on platform GPL570 (GSE3998 and GSE11882), and the probe ID order in those two series matrices also matches the authors' series matrix for GSE18997. So I assume the mistake is on my side. My guess is that it has to do with how the GPL570 probe IDs are named, because in our matrix the shorter ID (177_at) comes before the longer ID (1773_at), while in the authors' matrix the shorter ID comes after the longer one.
I have no idea why this happens, and so far we cannot even tell which matrix is correct. Since GPL570 is such a common platform, could anyone help us find out how this error arises and how to avoid it?
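The guess about short and long IDs can at least be illustrated with the two IDs above. This is a small sketch in base R, and it only shows that these two strings can sort in either order depending on the collation rules in use. It does not show that sorting is what actually happened in our data.

```r
ids <- c("177_at", "1773_at")

# method = "radix" always sorts character vectors in C-locale (byte) order,
# where the digit "3" comes before the underscore "_".
c_order <- sort(ids, method = "radix")
print(c_order)

# The default sort for character vectors follows the session's collation
# locale, so the result here may differ from machine to machine.
locale_order <- sort(ids)
print(locale_order)
print(Sys.getlocale("LC_COLLATE"))
```

In byte order the longer ID 1773_at comes first, which is the order seen in the authors' series matrix for this pair.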
Thank you