Different relatedness estimates from PLINK and vcftools despite the same method

According to the vcftools manual, the “--relatedness2” flag calculates relatedness statistics using the method of Manichaikul et al., Bioinformatics 2010 (doi:10.1093/bioinformatics/btq559), i.e. the method implemented in KING. According to the PLINK manual, PLINK uses the same method to calculate relatedness when the “--make-king-table” flag is specified. So, although PLINK and vcftools appear to use the same algorithm, they name the output statistic differently: vcftools calls it “RELATEDNESS_PHI”, whereas PLINK calls it “KINSHIP”.
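Note that the cited KING paper describes two kinship estimators built from the same per-pair counts: a "within-family" estimator, (N_Aa,Aa − 2·N_AA,aa) / (N_Aa(i) + N_Aa(j)), and a "between-family" estimator that uses the less heterozygous individual of the pair, (N_Aa,Aa − 2·N_AA,aa) / (2·N_Aa(i)) + 1/2 − (N_Aa(i) + N_Aa(j)) / (4·N_Aa(i)) with N_Aa(i) ≤ N_Aa(j) (formulas as given in that paper; worth double-checking against the original equations). Which of these, if either exactly, vcftools and PLINK implement is not established here. The minimal Python sketch below computes both, assuming genotypes coded as 0/1/2 alternate-allele counts with None for missing and assuming sites missing in either individual are skipped; the function names are hypothetical and not taken from either tool's source.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the ALT allele; None = missing


def king_counts(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> Tuple[int, int, int, int]:
    """Return (N_Aa,Aa, N_AA,aa, N_Aa of ind. 1, N_Aa of ind. 2) over sites called in both."""
    n_hethet = n_opposite_hom = n_het1 = n_het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # assumption: a site missing in either individual is ignored
        if a == 1 and b == 1:
            n_hethet += 1
        if {a, b} == {0, 2}:
            n_opposite_hom += 1  # opposite homozygotes (AA vs aa)
        if a == 1:
            n_het1 += 1
        if b == 1:
            n_het2 += 1
    return n_hethet, n_opposite_hom, n_het1, n_het2


def king_within_family(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> float:
    """Within-family estimator: divides by the summed heterozygote counts."""
    n_hethet, n_opp, h1, h2 = king_counts(g1, g2)
    if h1 + h2 == 0:
        raise ValueError("no heterozygous sites in either individual")
    return (n_hethet - 2 * n_opp) / (h1 + h2)


def king_between_family(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> float:
    """Between-family estimator: anchored on the less heterozygous individual."""
    n_hethet, n_opp, h1, h2 = king_counts(g1, g2)
    h_min = min(h1, h2)
    if h_min == 0:
        raise ValueError("no heterozygous sites in one individual")
    return (n_hethet - 2 * n_opp) / (2 * h_min) + 0.5 - (h1 + h2) / (4 * h_min)
```

For a sample compared with itself both functions return 0.5, the KING scaling for duplicates. Algebraically the two formulas coincide whenever both individuals carry the same number of heterozygous sites, so any gap between them grows with the heterozygosity difference within a pair.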

More worrying to me, however, is that the relatedness estimates differ markedly between PLINK and vcftools. How is this possible if both rely on the same method? Below is a graph comparing the relatedness estimates calculated by both programs from the same VCF input file.

[Figure: Comparison of PLINK vs vcftools relatedness estimates]
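To quantify the discrepancy pair by pair rather than only by eye, the two output tables can be joined on the sample pair. This is a minimal sketch under stated assumptions: the vcftools `.relatedness2` file is whitespace-delimited with `INDV1`, `INDV2` and `RELATEDNESS_PHI` columns, and the PLINK `.kin0` file has `IID1`, `IID2` and `KINSHIP` columns under a header line starting with `#`. Check these column names against your own files. Pairs are matched regardless of order, and self-comparisons, if present, are dropped.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Pair = FrozenSet[str]
Row = Tuple[str, str, float, float, float]


def read_table(lines: Iterable[str], id1: str, id2: str, value: str) -> Dict[Pair, float]:
    """Read a whitespace-delimited table into {frozenset({id_a, id_b}): value}."""
    rows = (line for line in lines if line.strip())
    header = next(rows).lstrip("#").split()  # assumed: PLINK headers start with '#'
    i1, i2, iv = header.index(id1), header.index(id2), header.index(value)
    table: Dict[Pair, float] = {}
    for line in rows:
        fields = line.split()
        if fields[i1] == fields[i2]:
            continue  # drop self-comparisons
        table[frozenset((fields[i1], fields[i2]))] = float(fields[iv])
    return table


def compare(vcftools_lines: Iterable[str], plink_lines: Iterable[str]) -> List[Row]:
    """Return (id_a, id_b, phi, kinship, kinship - phi), largest absolute gap first."""
    phi = read_table(vcftools_lines, "INDV1", "INDV2", "RELATEDNESS_PHI")
    kin = read_table(plink_lines, "IID1", "IID2", "KINSHIP")
    rows: List[Row] = []
    for pair in phi.keys() & kin.keys():  # only pairs present in both outputs
        a, b = sorted(pair)
        rows.append((a, b, phi[pair], kin[pair], kin[pair] - phi[pair]))
    return sorted(rows, key=lambda r: abs(r[4]), reverse=True)
```

Open file handles can be passed directly, e.g. `compare(open("OUTPUT.relatedness2"), open("plink2.kin0"))`, where `plink2.kin0` is an assumed PLINK output name. Listing the most discordant pairs first makes it easier to see whether the gap is a constant offset across all pairs or concentrated in particular samples.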

I also checked whether the results change when I run the relatedness analyses on the raw VCF file output by GATK’s GenotypeGVCFs, which I had only mildly filtered with vcftools. The estimates still differ between PLINK and vcftools.

Notably, vcftools emits several warnings when running the relatedness analysis (see below). Yet the relatedness estimates from vcftools make more biological sense than those from PLINK, which are markedly too high.
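One way to make "more biological sense" concrete is to bin each estimate into the relationship degrees proposed in the KING paper, whose cutoffs lie at 2^-1.5 ≈ 0.354 (duplicate/MZ twin), 2^-2.5 ≈ 0.177 (first degree), 2^-3.5 ≈ 0.0884 (second degree) and 2^-4.5 ≈ 0.0442 (third degree), and then compare the bins from each tool with what is known about the sampled individuals. The sketch below is a minimal illustration; the helper names are hypothetical, and a value falling exactly on a cutoff is assigned to the closer-relationship bin.

```python
from bisect import bisect_right
from collections import Counter
from typing import Iterable

# KING inference cutoffs: 2**-4.5, 2**-3.5, 2**-2.5, 2**-1.5 (~0.0442, 0.0884, 0.177, 0.354)
CUTOFFS = [2 ** -4.5, 2 ** -3.5, 2 ** -2.5, 2 ** -1.5]
LABELS = ["unrelated", "3rd degree", "2nd degree", "1st degree", "duplicate/MZ twin"]


def kinship_degree(phi: float) -> str:
    """Map a KING-scaled kinship estimate to a relationship degree."""
    return LABELS[bisect_right(CUTOFFS, phi)]


def degree_counts(values: Iterable[float]) -> Counter:
    """Tally how many pairs fall into each degree bin."""
    return Counter(kinship_degree(v) for v in values)
```

Running `degree_counts` separately on the RELATEDNESS_PHI and KINSHIP columns gives, for each tool, the number of pairs it would call first-, second- or third-degree relatives, which can be checked against the expected relatedness structure of the 418 individuals.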

Warnings from vcftools:
Parameters as interpreted:
vcftools
--vcf INPUT.vcf
--out OUTPUT
--relatedness2
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 418 out of 418 Individuals
Outputting Individual Relatedness
After filtering, kept 13357 out of a possible 13357 Sites
