Output per variant and per sample heterozygosity fraction from VCF.

Output per variant and per sample heterozygosity fraction from VCF.

2

As a QC measure I would like to know the per variant and per sample heterozygosity fraction.

I already used vcftools to output the missingness per variant and sample.

vcftools.github.io/man_latest.html

Is there any tool that can do the same for heterozygosity?


vcf


qc

• 3.0k views

updated 2 hours ago by

▴

440

written 5.4 years ago by

★

4.9k

Here is a VCFLIB solution:

  ./genotypeSummary -f ../samples/1kg-phaseIII-v5a.20130502.genotypes.chr22-16-16.5mb.vcf.gz -y GT -t 0,1,2,3,4

Here is the output:

sample-id   n-nocall    n-hom-ref   n-het   n-hom-alt
HG00096 0   5899    133 49
HG00097 0   5855    149 77
HG00099 0   5909    130 42
HG00100 0   5864    167 50
HG00101 0   5884    132 65

using bioalcidae : github.com/lindenb/jvarkit/wiki/BioAlcidae

var sample2count={};
var samples = header.getSampleNamesInOrder();
for(var i=0;i< samples.size();i++) {
sample2count[samples.get(i)]={
    "nocall":0,
    "homref":0,
    "homvar":0,
    "hetnonref":0,
    "het":0,
    };
}

while(iter.hasNext()) {
    var ctx=iter.next();
    for(var i=0;i< samples.size();i++) {
        var count = sample2count[samples.get(i)];
        var  g = ctx.getGenotype(samples.get(i));
        if( !g.isCalled()) count["nocall"]++;
        else if( g.isHomRef()) count["homref"]++;
        else if( g.isHomVar()) count["homvar"]++;
        else if( g.isHetNonRef()) count["hetnonref"]++;
        else if( g.isHet()) count["het"]++;
        }
    }
 out.println("Sample nocall homref homvar hetnonref het");  
 for(var i=0;i< samples.size();i++)
 {
 var count = sample2count[samples.get(i)];
 out.println(samples.get(i)+" "+ count["nocall"]+" "+count["homref"]+" "+ count["homvar"]+" "+count["hetnonref"]+" "+count["het"]);
 }

run:

 curl -s "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | gunzip -c  |
head -n 1000 | 
java -jar ~/src/jvarkit-git/dist/bioalcidae.jar -f script.js -F vcf 

Sample nocall homref homvar hetnonref het
HG00096 0 729 4 0 14
HG00097 0 712 7 0 28
HG00099 0 715 2 0 30
HG00100 0 711 13 0 23
HG00101 0 708 16 0 23
HG00102 0 714 7 0 26
HG00103 0 720 10 0 17
HG00105 0 720 2 0 25
HG00106 0 704 12 0 31
HG00107 0 707 14 0 26
HG00108 0 719 17 0 11
HG00109 0 707 18 0 22
HG00110 0 726 10 0 11
HG00111 0 724 14 0 9
HG00112 0 712 18 0 17
HG00113 0 729 3 0 15
HG00114 0 724 6 0 17
HG00115 0 713 22 0 12
HG00116 0 718 9 0 20


Login
before adding your answer.

Traffic: 2283 users visited in the last hour

Read more here: Source link