Variant Calling Heterozygous Reference Alleles

I am going to be working with VCF files a lot in the near future so I thought I would brush up on the practice.
After much reading and research, there’s something that I just can’t wrap my head around.

In a diploid organism, you have two alleles for any particular gene. My question is: how is this captured in the reference sequence, when the reference is "single stranded" in the sense that it holds only one nucleotide at each locus and therefore cannot represent an individual who is heterozygous at that locus?

In the following example:

#CHROM  POS ID  REF ALT     QUAL    FILTER  INFO    FORMAT  NA12878
20  10001019    .   T   G   364.77  .   [CLIPPED]   GT:AD:DP:GQ:PL  0/1:18,15:33:99:393,0,480

The sample NA12878 is called heterozygous at this position: he/she carries one T allele and one G allele at this locus on chromosome 20. My question above is about the reference, which has only one base here. In reality, shouldn't there be two alleles if the reference individual was heterozygous? If the reference individual was also heterozygous at this position, let's say he/she also carried a G allele at the same locus, shouldn't the genotype in the VCF be reported as 0/0, so that there would in fact be no variant at all?
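To check that I am reading the record correctly, here is a minimal sketch of how I understand the GT field to be decoded: index 0 means the REF base and index 1 means the first ALT base, so 0/1 gives T and G. The function name `decode_genotype` is my own, the record is split by hand rather than with a VCF library, and I assume the columns are tab-separated as in a real VCF file.

```python
def decode_genotype(ref, alt, gt):
    """Map VCF GT indices (0 = REF, 1, 2, ... = ALT alleles) to bases."""
    alleles = [ref] + alt.split(",")      # index 0 is always the REF allele
    sep = "|" if "|" in gt else "/"       # "|" = phased, "/" = unphased
    return [alleles[int(i)] if i != "." else "." for i in gt.split(sep)]


# The record from the example above; the INFO column is clipped in the post.
record = ("20\t10001019\t.\tT\tG\t364.77\t.\t[CLIPPED]\t"
          "GT:AD:DP:GQ:PL\t0/1:18,15:33:99:393,0,480")

fields = record.split("\t")
ref, alt = fields[3], fields[4]

# Pair each FORMAT key with the matching value in the sample column.
sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

bases = decode_genotype(ref, alt, sample["GT"])
ref_depth, alt_depth = (int(x) for x in sample["AD"].split(","))

print(bases)                 # ['T', 'G']
print(ref_depth, alt_depth)  # 18 15
```

So the genotype is always written relative to the single REF base: 18 reads support T (allele 0) and 15 reads support G (allele 1), and nothing in the record says anything about a second reference allele.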

Flipping this around, let's say NA12878 was used as the reference. At position 10001019 on chromosome 20, which allele would be the REF? Would it be T or G, given that this individual is heterozygous (T/G) at this site?
