SNP Pruning Through PCA (Edit: Feature Selection Through PCA)
I have roughly 1 million SNPs from 700 individuals and I want to prune the SNPs down, potentially through PLINK’s --pca command. However, I’m a little perplexed by how the eigenvalues/eigenvectors I receive from --pca are meant to be used to prune my SNPs. Or am I completely misunderstanding? Could anyone clarify?
Below is a sample of the vectors:
Edit: I want to leave the original post up, but to clarify further: from my ML experience, PCA can be used for feature selection, and I wish to do the same with the SNPs (apologies if ‘pruning’ means something different in bioinformatics).
Below is a sample of my variant weights:
In Python, PCA does the feature selection automatically once you’ve fitted/transformed the data. So is there a way of performing feature selection on the SNPs? For example, looking at each variant’s first 3 weights and keeping only the SNPs whose weight is at least some minimum ‘X’?
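To make the thresholding idea concrete, here is a minimal sketch of what I have in mind. It assumes the variant weights are already loaded into a NumPy array with one row per SNP and one column per principal component. The function name `select_snps_by_weight`, the SNP IDs, the weights, and the cutoff are all made up for illustration. They are not PLINK output, and I am not claiming this is an established pruning method.

```python
import numpy as np

def select_snps_by_weight(weights, snp_ids, n_components=3, min_weight=0.05):
    """Keep SNPs whose largest absolute weight across the first
    `n_components` PCs is at least `min_weight`.

    Hypothetical helper: `weights` has shape (n_snps, n_pcs).
    """
    weights = np.asarray(weights, dtype=float)
    # Absolute weights on the leading components only
    top = np.abs(weights[:, :n_components])
    # A SNP passes if any of its leading weights reaches the cutoff
    keep = top.max(axis=1) >= min_weight
    return [sid for sid, k in zip(snp_ids, keep) if k]

# Toy data (invented values, 4 SNPs x 4 PCs)
snp_ids = ["rs1", "rs2", "rs3", "rs4"]
weights = np.array([
    [0.10, -0.01,  0.00, 0.50],
    [0.01,  0.02, -0.03, 0.90],
    [-0.02, 0.20,  0.01, 0.00],
    [0.00,  0.00, -0.04, 0.70],
])

selected = select_snps_by_weight(weights, snp_ids, n_components=3, min_weight=0.05)
print(selected)  # ['rs1', 'rs3']
```

Here rs2 and rs4 are dropped because their large weights sit on the fourth component, which the cutoff ignores. Whether a cutoff like this is a sensible way to choose SNPs is exactly what I am asking.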