Identifying Private SNPs between multi sample vcf files.


Dear Community,

Hope all is well. I am having difficulty finding the best way to quantify private SNPs between my multi-sample VCF files.
For example, I have 110 samples in a VCF file that I generated via cohort calling with GATK. I have split this VCF so that samples belonging to the same genus are grouped together.

So I now have 4 VCF files (populations) that I would like to compare. For each population, I would like to know the total number of SNPs that are private to it, i.e. not found in the other populations.
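To make concrete what I mean by a private SNP, here is a minimal Python sketch of the count I am after. The function names and the toy records are made up for illustration and are not part of bcftools, vcftools, or GATK. It assumes uncompressed VCF text lines with GT as the first FORMAT field, and it assumes a site only counts as present in a population if at least one sample there carries a non-reference allele.

```python
def carried_sites(vcf_lines):
    """Sites where at least one sample has a non-reference allele (hypothetical helper)."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        for sample in fields[9:]:
            genotype = sample.split(":")[0]  # assumption: GT is the first FORMAT field
            alleles = genotype.replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                # a multi-allelic record is keyed by its full ALT string
                sites.add((chrom, int(pos), ref, alt))
                break
    return sites


def private_sites(populations):
    """Map each population name to the sites carried there and in no other population."""
    carried = {name: carried_sites(lines) for name, lines in populations.items()}
    private = {}
    for name, sites in carried.items():
        others = set().union(*(s for n, s in carried.items() if n != name))
        private[name] = sites - others
    return private


# Toy records, invented for illustration only
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
genus1 = [
    header + "S1\tS2",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0",
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t1/1\t0/1",
]
genus2 = [
    header + "S3\tS4",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/0",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t./.",
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0|0\t1|1",
]

result = private_sites({"Genus1": genus1, "Genus2": genus2})
for name, sites in result.items():
    print(name, len(sites))
```

In this toy data every position is listed in both files, but only one site per genus is actually carried exclusively by that genus. That per-population number is what I would like to obtain for my four real files.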

However, when I use a command such as bcftools isec:

bcftools isec Genus1.vcf.gz Genus2.vcf.gz -p /dir/out

It writes the expected output files, but it is unable to identify shared or private sites between the multi-sample VCFs.

When I used vcf-compare:

 vcf-compare -g Genus1.vcf.gz  Genus2.vcf.gz

it only outputs the total number of SNPs. It cannot discern any differences between the multi-sample VCF files.

Note: When I run these commands on VCF files that each contain only one sample, they execute without problems and output the expected data.

Note: I have compressed my files with bgzip and indexed them with tabix.

Can anyone offer guidance on how to quantify the total number of private SNPs in one multi-sample VCF file compared to another multi-sample VCF file?

Thank you for taking the time to read my post and for your help!



