VCF from GRCh37 to GRCh38: how to manage reference differences


Imagine you have a VCF with variants called against the GRCh37 assembly, and you want to convert these variants to GRCh38. The new coordinates can be obtained with a liftover tool, and REF/ALT stay the same wherever the GRCh37 and GRCh38 sequences are identical. But what if the reference sequence differs between the two assemblies?

For example, how would you convert the variant 16:23625463 A>T (GRCh37) to GRCh38? At the lifted-over position, 16:23614142, the GRCh38 reference base is T. Obviously, something like 16:23614142T>T does not make sense.

  • Would you convert it to 16:23614142T>A (if the variant was heterozygous)? If so, then every position where GRCh37 and GRCh38 differ and where no variant was reported in the GRCh37 VCF is actually a variant relative to GRCh38, even though it has no record in the VCF.
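To make the swap concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under several assumptions: a biallelic SNV, unphased genotypes, the same strand in both assemblies, and a new reference base that has already been looked up. The function name `lift_alleles` and its arguments are hypothetical, not part of any existing tool.

```python
def lift_alleles(old_ref, old_alt, genotype, new_ref):
    """Re-express a biallelic SNV genotype against a new reference base.

    genotype: tuple of allele indices relative to the old reference
              (0 = REF, 1 = ALT), assumed unphased.
    Returns (ref, alt, genotype) relative to the new reference, or None
    if the sample carries only the new reference allele.
    """
    if new_ref == old_ref:
        # Reference base unchanged: only the coordinate moves.
        return new_ref, old_alt, genotype
    if new_ref == old_alt:
        # REF and ALT swap roles, so every allele index flips (0 <-> 1).
        flipped = tuple(sorted(1 - allele for allele in genotype))
        if sum(flipped) == 0:
            return None  # old hom-alt is hom-ref against the new reference
        return new_ref, old_ref, flipped
    # New reference matches neither allele: not handled in this sketch.
    raise ValueError("new reference base matches neither REF nor ALT")


# The example from this post: 16:23625463 A>T on GRCh37, reference T on GRCh38.
print(lift_alleles("A", "T", (0, 1), "T"))  # heterozygous -> ('T', 'A', (0, 1))
print(lift_alleles("A", "T", (1, 1), "T"))  # homozygous alt -> None
print(lift_alleles("A", "T", (0, 0), "T"))  # no GRCh37 variant -> ('T', 'A', (1, 1))
```

The last call is the case that worries me: a sample with no variant at this position on GRCh37 would be homozygous for the non-reference allele on GRCh38, yet a VCF normally has no record there to lift over.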

My suspicion is that variants cannot be converted directly between genome assemblies at positions where the reference sequences differ.





