KING struggle: Relatedness

I am struggling to infer relatedness in a dataset of 20K exomes sequenced with tens of different capture kits.

First, I found a well-covered union of regions. Check.

Second, I did everything needed to merge the 20K VCFs into one, and removed indels and multi-allelic variants. Check.

Still, when I run KING with the --kinship option, it finds a lot of relatives. But I need KING with the --related option, with IBD1 and IBD2. There I get 0 first-degree and 0 second-degree relatives (though still some MZ pairs).

This basically says that KING can't infer IBD segments, and I think that is due to samples that would have failed QC.

Is there a procedure for automated QC here? Or do I need to run a PCA and iterate: remove outliers, rebuild the PCA, remove outliers again, and repeat until no outliers remain? Is there any other reason why KING might behave so badly for me?
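To be concrete about the iterative procedure I mean, here is a rough sketch in pure Python on toy data. Everything in it is my own assumption and not a KING feature: the function names `first_pc_scores` and `remove_pca_outliers`, the 6 SD cutoff, the cap of 5 rounds, and the use of only the first principal component (found by power iteration rather than a proper PCA tool).

```python
import math
import random


def first_pc_scores(rows, n_iter=50):
    """Score each sample on the first principal component (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    v = [1.0 / math.sqrt(m)] * m  # deterministic starting direction
    for _ in range(n_iter):
        scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
        # w = X^T (X v): one multiplication by the unscaled covariance matrix
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(m)) for c in centered]


def remove_pca_outliers(samples, n_sd=6.0, max_rounds=5):
    """Repeat: compute PC1, drop samples beyond n_sd SDs, stop when none are left.

    samples maps a sample ID to a list of genotype dosages (0/1/2).
    The threshold and round limit are assumed defaults, not KING settings.
    """
    kept = dict(samples)
    removed = []
    for _ in range(max_rounds):
        ids = list(kept)
        if len(ids) < 3:
            break
        scores = first_pc_scores([kept[i] for i in ids])
        # Columns are centered, so the scores have mean zero.
        sd = math.sqrt(sum(s * s for s in scores) / len(scores))
        if sd == 0.0:
            break
        outliers = [i for i, s in zip(ids, scores) if abs(s) > n_sd * sd]
        if not outliers:
            break
        for i in outliers:
            removed.append(i)
            del kept[i]
    return kept, removed


# Toy data: 100 random samples at 50 variants, plus one obviously bad sample.
rng = random.Random(0)
toy = {
    f"sample_{i}": [1 if rng.random() < 0.3 else 0 for _ in range(50)]
    for i in range(100)
}
toy["outlier_1"] = [2] * 50  # hypothetical QC-failed sample

kept, removed = remove_pca_outliers(toy)
print("removed:", removed)
print("kept:", len(kept))
```

On real data I would obviously use more than one component and a real PCA implementation; the point is only the loop structure I am asking about.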

Some data to give an idea (toy dataset of 3K exomes):

KING with --related:

  Source        MZ      PO      FS      2nd     3rd     OTHER
  ===========================================================
  Pedigree      0       0       0       0       0       5512860
  Inference     30      0       0       3       8       5512819

KING with --kinship:

  Source        MZ      PO      FS      2nd     3rd     OTHER
  ===========================================================
  Pedigree      0       0       0       0       0       5512860
  Inference     30      58      463     19      895     5511395
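As a sanity check on the two summary tables above, this small Python sketch confirms that both inference rows account for every pair and shows where the two runs differ. The counts are copied from the tables; the helper name `samples_from_pairs` is just mine.

```python
import math

# Counts copied from the two KING summary tables above.
COLUMNS = ["MZ", "PO", "FS", "2nd", "3rd", "OTHER"]
related = [30, 0, 0, 3, 8, 5512819]        # Inference row, --related
kinship = [30, 58, 463, 19, 895, 5511395]  # Inference row, --kinship
pedigree_total = 5512860                   # Pedigree row, OTHER column


def samples_from_pairs(n_pairs):
    """Return n with n*(n-1)/2 == n_pairs, or None if no integer n fits."""
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    return n if n * (n - 1) // 2 == n_pairs else None


n_samples = samples_from_pairs(pedigree_total)
print("samples implied by the pair count:", n_samples)
print("--related row total:", sum(related))
print("--kinship row total:", sum(kinship))
for name, r, k in zip(COLUMNS, related, kinship):
    print(f"{name:>5}: --related={r:>8}  --kinship={k:>8}  diff={k - r:+d}")
```

So the pair count corresponds to 3,321 samples, both runs classify the same total number of pairs, and the MZ count is identical; the difference is entirely in the PO, FS, 2nd and 3rd degree columns.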

I was able to perform relatedness inference on a 10K dataset (a subset of this one) a year ago. I have no idea what is different now, except that this time no one filtered out the QC-failed samples; I simply ran the same makefile.
