Output per variant and per sample heterozygosity fraction from VCF.
As a QC measure I would like to know the per variant and per sample heterozygosity fraction.
I already used vcftools to output the missingness per variant and sample.
vcftools.github.io/man_latest.html
Is there any tool that can do the same for heterozygosity?
• 3.0k views
Here is a VCFLIB solution:
./genotypeSummary -f ../samples/1kg-phaseIII-v5a.20130502.genotypes.chr22-16-16.5mb.vcf.gz -y GT -t 0,1,2,3,4
Here is the output:
sample-id n-nocall n-hom-ref n-het n-hom-alt
HG00096 0 5899 133 49
HG00097 0 5855 149 77
HG00099 0 5909 130 42
HG00100 0 5864 167 50
HG00101 0 5884 132 65
using bioalcidae : github.com/lindenb/jvarkit/wiki/BioAlcidae
var sample2count={};
var samples = header.getSampleNamesInOrder();
for(var i=0;i< samples.size();i++) {
sample2count[samples.get(i)]={
"nocall":0,
"homref":0,
"homvar":0,
"hetnonref":0,
"het":0,
};
}
while(iter.hasNext()) {
var ctx=iter.next();
for(var i=0;i< samples.size();i++) {
var count = sample2count[samples.get(i)];
var g = ctx.getGenotype(samples.get(i));
if( !g.isCalled()) count["nocall"]++;
else if( g.isHomRef()) count["homref"]++;
else if( g.isHomVar()) count["homvar"]++;
else if( g.isHetNonRef()) count["hetnonref"]++;
else if( g.isHet()) count["het"]++;
}
}
out.println("Sample nocall homref homvar hetnonref het");
for(var i=0;i< samples.size();i++)
{
var count = sample2count[samples.get(i)];
out.println(samples.get(i)+" "+ count["nocall"]+" "+count["homref"]+" "+ count["homvar"]+" "+count["hetnonref"]+" "+count["het"]);
}
run:
curl -s "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | gunzip -c |
head -n 1000 |
java -jar ~/src/jvarkit-git/dist/bioalcidae.jar -f script.js -F vcf
Sample nocall homref homvar hetnonref het
HG00096 0 729 4 0 14
HG00097 0 712 7 0 28
HG00099 0 715 2 0 30
HG00100 0 711 13 0 23
HG00101 0 708 16 0 23
HG00102 0 714 7 0 26
HG00103 0 720 10 0 17
HG00105 0 720 2 0 25
HG00106 0 704 12 0 31
HG00107 0 707 14 0 26
HG00108 0 719 17 0 11
HG00109 0 707 18 0 22
HG00110 0 726 10 0 11
HG00111 0 724 14 0 9
HG00112 0 712 18 0 17
HG00113 0 729 3 0 15
HG00114 0 724 6 0 17
HG00115 0 713 22 0 12
HG00116 0 718 9 0 20
Traffic: 2283 users visited in the last hour
Read more here: Source link