I am struggling to infer relatedness in a dataset of 20K exomes sequenced with dozens of different capture kits.
First, I found a well-covered union of regions. Check.
Second, I did everything needed to merge the 20K VCFs into one, and removed indels and multi-allelic variants. Check.
Still, when I run KING with the --kinship option, it finds a lot of relatives. But I need KING with the --related option, i.e. with IBD1 and IBD2 estimates. There I get 0 first-degree and 0 second-degree relatives (though still some MZ pairs).
This basically tells me that the IBD segments cannot be inferred, and I think the cause is samples that failed QC.
Is there a procedure for automated QC here? Or do I need to build a PCA and iterate “remove outliers, build the PCA again, remove outliers” until no outliers remain? Is there any other reason why KING might misbehave like this?
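To make the loop I have in mind concrete, here is a minimal Python sketch of the "remove outliers, recompute, repeat" idea. Everything in it is an assumption for illustration: the function name, the median/MAD rule, the threshold of 6 MADs, and the made-up per-sample heterozygosity rates that stand in for a real metric. For the PCA variant, score_fn would have to rerun the PCA on the kept samples and return a PC coordinate per sample.

```python
import statistics


def iterative_outlier_removal(samples, score_fn, n_mads=6.0, max_rounds=20):
    """Score the kept samples, drop outliers, and repeat until none are dropped.

    score_fn(kept) must return {sample_id: float}, recomputed on the kept set.
    A sample is an outlier if it lies more than n_mads median absolute
    deviations (MADs) away from the median score.
    """
    kept = list(samples)
    removed = []
    for _ in range(max_rounds):
        scores = score_fn(kept)
        values = [scores[s] for s in kept]
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        if mad == 0:
            break  # no spread left, nothing sensible to flag
        outliers = {s for s in kept if abs(scores[s] - med) > n_mads * mad}
        if not outliers:
            break  # converged: no outliers in this round
        removed.extend(s for s in kept if s in outliers)
        kept = [s for s in kept if s not in outliers]
    return kept, removed


# Made-up heterozygosity rates (hypothetical values, not from my data).
het_rate = {
    "S1": 0.30, "S2": 0.31, "S3": 0.29, "S4": 0.30,
    "S5": 0.32, "S6": 0.31, "S7": 0.90, "S8": 0.45,
}


def lookup_scores(kept):
    # Stand-in for a real recomputation (e.g. a fresh PCA on the kept samples).
    return {s: het_rate[s] for s in kept}


kept, removed = iterative_outlier_removal(list(het_rate), lookup_scores)
print("kept:   ", kept)
print("removed:", removed)
```

The median/MAD rule is only one possible choice; I picked it here because a few extreme samples do not shift the threshold much.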
Some data to give an idea (toy dataset of 3K exomes):
KING with --related:
Source       MZ   PO   FS  2nd  3rd    OTHER
============================================
Pedigree      0    0    0    0    0  5512860
Inference    30    0    0    3    8  5512819
KING with --kinship:
Source       MZ   PO   FS  2nd  3rd    OTHER
============================================
Pedigree      0    0    0    0    0  5512860
Inference    30   58  463   19  895  5511395
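As a sanity check on these numbers, here is a small Python sketch (the function name is hypothetical) that recovers the number of samples from the pair count by solving n(n-1)/2 = pairs, and confirms that both Inference rows add up to the same total as the Pedigree row.

```python
import math


def n_samples_from_pairs(n_pairs):
    """Solve n * (n - 1) / 2 == n_pairs for a whole number of samples n."""
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    if n * (n - 1) // 2 != n_pairs:
        raise ValueError("pair count does not match a whole number of samples")
    return n


total_pairs = 5512860
# Columns: MZ, PO, FS, 2nd, 3rd, OTHER (copied from the two Inference rows).
related = [30, 0, 0, 3, 8, 5512819]
kinship = [30, 58, 463, 19, 895, 5511395]

n = n_samples_from_pairs(total_pairs)
print("samples:", n)
print("--related row sums to total:", sum(related) == total_pairs)
print("--kinship row sums to total:", sum(kinship) == total_pairs)
```

So the toy dataset contains 3321 samples and both runs classify every pair; the two modes differ only in how the pairs are distributed across categories.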
I was able to run relatedness inference on a 10K dataset (a subset of this one) a year ago. I have no idea what is different now, other than that nobody has filtered out the QC-failed samples this time; I simply ran the same makefile.