How to get rid of duplicated genes when the expression profile also has duplicated Ensembl IDs?
I have a mouse expression profile annotated with gene symbols, and many of the symbols are duplicated. I usually use the collapseRows function with the MaxMean method from the WGCNA package to get rid of duplicated genes. This time, however, I noticed that some Ensembl IDs are duplicated as well. Can anyone advise me on how to deal with this situation?
Should I simply remove the duplicated Ensembl IDs first and then use collapseRows on the duplicated gene symbols?
This is part of my data:
ENSMUSG00000019864 Rtn4ip1 3.33471 2.18619 3.52304 4.13997 2.91682 3.17805
ENSMUSG00000019864 Rtn4ip1 0.141481 0 0.126809 0.140919 0 0.159667
ENSMUSG00000019865 Nmbr 0.0325972 0 0.056908 0.0324288 0.305734 0
ENSMUSG00000019866 Crybg1 8.79001 6.82754 13.9235 15.1803 9.54965 11.3725
ENSMUSG00000019867 Gje1 0 0 0 0 0 0
As you can see, ENSMUSG00000019864, for example, appears twice with different expression values.
I would really appreciate any help or suggestions!
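To make the two-step idea concrete, here is a minimal sketch in plain Python. It is not WGCNA's code, only an illustration of the "keep the row with the highest mean" rule, applied first per Ensembl ID and then per gene symbol. The function name collapse_max_mean is hypothetical, and the rows are copied from the excerpt above. Whether this order of steps is appropriate is exactly what I am asking.

```python
from statistics import mean

# Rows copied from the excerpt: (Ensembl ID, gene symbol, expression values)
rows = [
    ("ENSMUSG00000019864", "Rtn4ip1", [3.33471, 2.18619, 3.52304, 4.13997, 2.91682, 3.17805]),
    ("ENSMUSG00000019864", "Rtn4ip1", [0.141481, 0, 0.126809, 0.140919, 0, 0.159667]),
    ("ENSMUSG00000019865", "Nmbr", [0.0325972, 0, 0.056908, 0.0324288, 0.305734, 0]),
    ("ENSMUSG00000019866", "Crybg1", [8.79001, 6.82754, 13.9235, 15.1803, 9.54965, 11.3725]),
    ("ENSMUSG00000019867", "Gje1", [0, 0, 0, 0, 0, 0]),
]


def collapse_max_mean(rows, key_index):
    """Hypothetical helper: for each key, keep the row with the highest mean expression.

    key_index selects the grouping column (0 = Ensembl ID, 1 = gene symbol).
    Ties keep the first row seen.
    """
    best = {}
    for row in rows:
        key = row[key_index]
        if key not in best or mean(row[2]) > mean(best[key][2]):
            best[key] = row
    return list(best.values())


# Step 1: one row per Ensembl ID. Step 2: one row per gene symbol.
by_id = collapse_max_mean(rows, key_index=0)
by_symbol = collapse_max_mean(by_id, key_index=1)

for ensembl_id, symbol, values in by_symbol:
    print(ensembl_id, symbol, values)
```

In this excerpt both steps give the same result, because the duplicated ID also carries the same symbol: the Rtn4ip1 row with the higher mean is kept and the low-expression duplicate is dropped.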