PCA from VCF files


Can anyone recommend good software for doing principal component analysis (PCA) on data in VCF format, or the most straightforward format to convert the VCF into for PCA? I hear that PLINK is quite suitable for this. I also have some experience using EIGENSTRAT for SNP data, but none using it with whole-genome data encoded as VCF. Any tips or experience appreciated.

Many thanks,

Rubal



You can use VCFtools (its --plink option) to write PED and MAP files from a VCF. These are the PLINK text formats, and many PCA programs either take PLINK input directly or offer conversion scripts.

I ended up using SNPRelate. After some silly errors here is how I got it to work:

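setwd("/xxx/pca")
library("SNPRelate")
vcf.fn <- "~/xxx/tmp.vcf"
# Convert the VCF to GDS, keeping only biallelic SNPs
snpgdsVCF2GDS(vcf.fn, "ccm.gds", method = "biallelic.only")
genofile <- snpgdsOpen("ccm.gds")
ccm_pca <- snpgdsPCA(genofile)
# Color samples by whether their ID starts with "CCM" (col 4 if so, col 3 otherwise)
is_ccm <- substr(ccm_pca$sample.id, 1, 3) == "CCM"
plot(ccm_pca$eigenvect[, 1], ccm_pca$eigenvect[, 2], col = as.numeric(is_ccm) + 3, pch = 2)
snpgdsClose(genofile)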
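
If it helps to see what the PCA step is doing, here is a toy sketch in base R. It is not SNPRelate's algorithm (which, as far as I know, also normalizes by allele frequency); it just runs plain prcomp on a made-up matrix of alternate-allele counts. The sample names, SNP names and genotype values are all invented for illustration.

```r
# Toy genotype matrix: rows are samples, columns are SNPs,
# entries are alternate-allele counts (0, 1 or 2). All values are made up.
geno <- matrix(
  c(0, 0, 1, 0,
    0, 1, 0, 0,
    1, 0, 0, 1,
    2, 2, 1, 2,
    2, 1, 2, 2,
    1, 2, 2, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(c("A1", "A2", "A3", "B1", "B2", "B3"),
                  paste0("snp", 1:4))
)

# Center each SNP column and compute principal components.
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component.
var_prop <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components; these play the same
# role as eigenvect[, 1:2] in the SNPRelate output, up to scaling.
scores <- pca$x[, 1:2]

# Same plotting idea as above: color by a prefix of the sample name.
plot(scores[, 1], scores[, 2],
     col = ifelse(startsWith(rownames(geno), "B"), 4, 3), pch = 2,
     xlab = "PC1", ylab = "PC2")
```

In this toy data the two made-up groups (A and B) end up on opposite sides of PC1, which is the kind of separation you would look for in the real plot.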

SNPRelate is an R package that can read VCF files directly and perform PCA and IBD/IBS analysis. According to the documentation, its PCA and IBD implementations run roughly 10 to 45 times faster than those in EIGENSTRAT (v3.0) and PLINK (v1.07), respectively.

Update (Oct 2014): The package seems to have moved to GitHub (link)

6.8 years ago by sa9

