How are two alleles typically represented in a whole genome sequence?
I apologize in advance if this is a silly question, but I am trying to understand how the two inherited copies (alleles) of a gene are represented in typical whole genome sequencing formats (VCF, FASTA/FASTQ).
Here is one example illustrating my confusion, using an SNP VCF from a WGS run. Take TAS2R38: GRCh37.p13 puts the gene’s reference location at chr7:g.141672431 – chr7:g.141673573.
Using bcftools, if I run:
bcftools view genome.filtered.snp.vcf.gz 7:141673345-141673345
I get (header line and record only):
#CHROM POS ID REF ALT QUAL FILTER INFO
7 141673345 . C G 1434.3 PASS
How could I view the SNP, if any, from the other copy of the gene? And which allele am I viewing when I do the above?
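For context on what I would expect to see: as far as I understand the VCF format, a record describes a site rather than one copy of the gene, and the per-sample genotype sits in the GT sub-field of the sample column that follows FORMAT. There, 0 stands for REF, 1 for the first ALT, "/" separates unphased alleles and "|" separates phased ones. The output above shows only the first eight columns, so the GT values used below are hypothetical, and decode_genotype is a helper name I made up for illustration. A minimal Python sketch of how such a field would be read:

```python
def decode_genotype(ref, alt, gt):
    """Translate a VCF GT string such as '0/1' into the base on each copy."""
    alleles = [ref] + alt.split(",")         # index 0 = REF, 1.. = ALT alleles
    phased = "|" in gt                       # '|' phased, '/' unphased
    calls = gt.replace("|", "/").split("/")  # one entry per chromosome copy
    bases = [None if c == "." else alleles[int(c)] for c in calls]
    return bases, phased


# Hypothetical GT values for the record 7:141673345 C>G shown above
for gt in ("0/1", "1/1"):
    print(gt, decode_genotype("C", "G", gt))
```

Under that reading, 0/1 would mean one copy carries C and the other G, while 1/1 would mean both copies carry G. If the file does have a sample column, something like bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' -r 7:141673345 genome.filtered.snp.vcf.gz should print the genotype next to the alleles.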