Identifying private SNPs between multi-sample VCF files
Hope all is well. I am having difficulty finding the best way to quantify private SNPs between my multi-sample VCF files.
For example, I have 110 samples in a VCF file that I generated by cohort calling with GATK. I have split this VCF by genus, so each new file contains only the samples from one genus.
So I now have 4 VCF files (populations) that I would like to compare. I would like to know the total number of SNPs that are private to each population compared with the others.
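One way to make "private" precise, stated here as an assumption rather than a standard definition: treat each population as the set of variant sites (CHROM, POS, REF, ALT) at which at least one of its samples carries a non-reference allele. The SNPs private to a population are then the sites in its set that appear in no other population's set. A minimal Python sketch of that set logic, using hypothetical site tuples and a hypothetical `private_sites` helper:

```python
from typing import Dict, Set, Tuple

# A site is keyed by (CHROM, POS, REF, ALT). This key is an assumption:
# different ALT alleles at the same position are counted separately.
Site = Tuple[str, int, str, str]


def private_sites(pop_sites: Dict[str, Set[Site]]) -> Dict[str, Set[Site]]:
    """Return, for each population, the sites that no other population carries."""
    result: Dict[str, Set[Site]] = {}
    for pop, sites in pop_sites.items():
        # Union of every site carried by the other populations.
        others: Set[Site] = set()
        for other_pop, other_sites in pop_sites.items():
            if other_pop != pop:
                others |= other_sites
        result[pop] = sites - others
    return result


if __name__ == "__main__":
    # Hypothetical example data for three populations.
    example = {
        "Genus1": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
        "Genus2": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
        "Genus3": {("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")},
    }
    for pop, sites in private_sites(example).items():
        print(pop, len(sites))
    # Genus1 1
    # Genus2 0
    # Genus3 1
```

The same function works unchanged for four populations; the count of private SNPs per genus is just the size of each returned set.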
However, when I try a command such as bcftools isec:
bcftools isec Genus1.vcf.gz Genus2.vcf.gz -p /dir/out
It produces the expected output files, but it does not identify shared or private sites between the multi-sample VCFs.
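A possible explanation, offered as an assumption rather than a diagnosis: bcftools isec compares records by position and alleles, not by genotypes. If the four files were made by subsetting one jointly called VCF by sample, each file may still list every site from the cohort call, including sites where none of its own samples carries the alternate allele, so isec sees essentially identical site lists. Restricting each file to sites where at least one sample actually has a non-reference genotype should make the comparison meaningful. The Python sketch below shows that genotype check; the names `carried_sites` and `carried_sites_from_file` are hypothetical, and the single-base REF/ALT filter is an assumption about what counts as a SNP.

```python
import gzip
from typing import Iterable, Set, Tuple

# (CHROM, POS, REF, ALT); keying on the allele is an assumption.
Site = Tuple[str, int, str, str]


def carried_sites(lines: Iterable[str]) -> Set[Site]:
    """Return SNP sites whose ALT allele appears in at least one sample's GT."""
    sites: Set[Site] = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns, so no genotypes to inspect
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alts = fields[4].split(",")
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        gt_index = fmt_keys.index("GT")
        carried: Set[int] = set()
        for sample in fields[9:]:
            values = sample.split(":")
            if gt_index >= len(values):
                continue
            # Alleles are separated by "/" (unphased) or "|" (phased).
            for allele in values[gt_index].replace("|", "/").split("/"):
                if allele not in (".", "0"):
                    carried.add(int(allele))
        for allele_index in carried:
            if 1 <= allele_index <= len(alts):
                alt = alts[allele_index - 1]
                # Keep SNPs only (single-base REF and ALT, no "*" spanning
                # deletion); drop this check to count indels as well.
                if len(ref) == 1 and len(alt) == 1 and alt != "*":
                    sites.add((chrom, pos, ref, alt))
    return sites


def carried_sites_from_file(path: str) -> Set[Site]:
    # bgzip output is gzip-compatible, so Python's gzip module can read it.
    with gzip.open(path, "rt") as handle:
        return carried_sites(handle)
```

For example, calling `carried_sites_from_file("Genus1.vcf.gz")` and the same for the other three files gives one site set per genus; the SNPs private to Genus1 are its set minus the union of the other three. If staying within bcftools is preferred, the `--min-ac` (`-c`) option of `bcftools view` is commonly used for similar filtering, but check the bcftools manual for how it behaves on sample subsets before relying on it.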
When I used vcf-compare:
vcf-compare -g Genus1.vcf.gz Genus2.vcf.gz
It only reports the total number of SNPs and does not show any differences between the multi-sample VCF files.
Note: when I run these commands on VCFs that contain only one sample, they work perfectly and output the appropriate data.
Note: I have compressed my files with bgzip and indexed them with tabix.
Can anyone offer guidance on how to quantify the total number of private SNPs in one multi-sample VCF file compared with another multi-sample VCF file?
Thank you for taking the time to read my post and for your help!