I have a graph that I created from a group of haploid assemblies using PGGB. What I would like to do is use VG to align short reads from new samples to the graph using giraffe, and then genotype the samples. My end goal is to have the genotype calls be consistent with the haplotypes used to construct the graph (originally existed as W-lines in the PGGB GFA, and later as GBWT threads). I have included below the workflow that I thought would work which constructs the graph from GFA, uses vg deconstruct
to create a VCF with the genotypes of all the GBWT threads, then uses vg pack
and vg call
to genotype a .gam
based on that VCF. Unfortunately, the final command using vg call
gives me the error
[VCFTraversalFinder] Warning: No alt path (prefix=_alt_6162647cdc0e686937be7380951cff5969dced64_) found in graph for variant. It will be ignored:
for every variant. From the documentation I can see that this is caused by the fact that the graph was not constructed from a FASTA + VCF with the -a
flag, so there are no alt paths in the graph.
As solutions, I attempted to use vg call
with the -a
or -g
flags, which both worked to genotype the snarls in the graph. However, different samples produced different numbers of calls, and the POS/REF/ALT determination at each snarl was not completely consistent with the VCF created using vg deconstruct
making it difficult to harmonize the calls.
Is there any workflow that I could use to provide a VCF to vg call
to allow for harmonized calls, given that I started with a graph constructed from GFA? Such as some way to turn the GBWT threads into alt paths?
#Starting from PGGB GFA output
#Reference path stays as a P-line
#P-lines for haplotypes modified to match (sample)_(contig)_(haplotype) for conversion to W-lines
vg convert
-g graph_plines.gfa
-f
-w _
-t 12 > graph.gfa
#Create indexes from GFA
vg autoindex
-w giraffe
-g graph.gfa
-t 12
-V 1
-T /data/vg_temp
#Extract GBWT and GBWTGraph from GBZ for later user with deconstruct
vg gbwt
-o graph.gbwt
-g graph.gg
-Z graph.giraffe.gbz
#Create XG from GBZ for later use with deconstruct/pack/call
vg convert
-Z graph.giraffe.gbz
-x > graph.xg
#Create a VCF containing genotypes of all haplotypes
vg deconstruct graph.xg
-p PGF
-g graph.gbwt
-a
-t 12 > graph.vcf
#Align short reads using giraffe
vg giraffe
-Z graph.giraffe.gbz
-m graph.min
-d graph.dist
-p
-f sample.1.fq
-f sample.2.fq > sample.gam
#Create packed read support, ignoring reads with MAPQ < 5 (-Q 5)
vg pack
-x graph.xg
-o graph.pack
-g sample.gam
-Q 5
#Genotype variants in graph VCF
vg call graph.xg
-p PGF
-k sample.pack
-v graph.vcf
-s sample
-t 12 > sample.vcf
Read more here: Source link