Data Imputation for performing UMAP
Hi guys!
I am currently working on a dataset of gene IDs, their expression values, and patient IDs. I want to process the data with UMAP and compare the results with a previous study, which used K-means clustering.
At the moment my data frame contains NAs, and UMAP cannot process those because it expects every value to be numeric. I thought about replacing them with zero, but an NA is not a zero. An NA means the gene was not detected for some reason, which does not mean it had no expression. I also cannot simply remove those gene IDs, because a gene may have expression values in some patients and be missing in others (colnames = Patient ID; rownames = Gene ID).
Information on Google is very limited. I have stumbled across a relatively new imputation method called ALRA (www.nature.com/articles/s41467-021-27729-z), but I am still reading about it and am not sure whether it is appropriate for my type of data.
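To make the problem concrete, here is a minimal sketch in Python with NumPy of one simple baseline: drop genes that are missing in most patients, then fill the remaining NAs with the per-gene median. It runs on toy numbers, not my real data. The function name `filter_and_impute` and the 50% missingness cutoff are assumptions made up for illustration, and median imputation is only a stand-in here, not ALRA and not necessarily the right choice for expression data.

```python
import numpy as np

def filter_and_impute(expr, max_missing_frac=0.5):
    """Drop genes with too many NAs, then median-impute the rest.

    expr: 2-D array, rows = genes, columns = patients, NaN marks an NA.
    max_missing_frac: hypothetical cutoff; genes with a larger fraction
    of missing values are removed instead of imputed.
    Returns (imputed matrix, boolean mask of the genes that were kept).
    """
    expr = np.asarray(expr, dtype=float)
    missing = np.isnan(expr)

    # Keep genes whose fraction of missing patients is within the cutoff.
    keep = missing.mean(axis=1) <= max_missing_frac
    kept = expr[keep].copy()

    # Per-gene median over the patients where the gene was observed.
    medians = np.nanmedian(kept, axis=1)

    # Fill each remaining NA with the median of its own gene (row).
    rows, cols = np.where(np.isnan(kept))
    kept[rows, cols] = medians[rows]
    return kept, keep

# Toy matrix: 3 genes x 4 patients.
expr = np.array([
    [5.1, np.nan, 4.8, 5.3],        # 1 of 4 missing: kept, imputed
    [np.nan, np.nan, np.nan, 2.0],  # 3 of 4 missing: dropped
    [7.2, 6.9, 7.0, np.nan],        # 1 of 4 missing: kept, imputed
])
imputed, keep = filter_and_impute(expr)

# UMAP treats rows as samples, so patients must become rows.
patients_by_genes = imputed.T
```

The resulting `patients_by_genes` matrix contains no NAs, so it could be passed to something like `umap.UMAP().fit_transform(...)` from the umap-learn package (not run here). The cutoff and the imputation rule both change the embedding, so they would need to be reported alongside any comparison with the K-means study.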
Do you guys have any suggestions?