DeepVariant is very slow on alignments produced by HISAT2 and reports only half of expected SNPs/INDELs

Hi @lpryszcz

With respect to your aligner question, we have evaluated DeepVariant with BWA-MEM/BWA-MEM2, minimap2, DRAGEN, and the graph mapper Giraffe.

For mapping to a linear reference, we would recommend BWA-MEM/BWA-MEM2 or DRAGEN. DeepVariant is trained on BWA-MEM data. Minimap2 will work, but has slightly lower accuracy.

We have not evaluated HISAT2. If I had to guess, I would suspect that split-read mappings in the HISAT2 output are creating more candidates. The difference in speed that you see suggests the cause lies in our candidate generation logic rather than in the neural network. Tracking it down would probably require us to look at what in the HISAT2 output triggers an edge case in that logic. It’s possible there is a flag in HISAT2 that could be altered, but failing that, it would probably take us some time to prioritize supporting HISAT2.
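
One rough way to test the split-read guess on your side is to count how many mapped reads carry a reference skip (`N`) or soft clip (`S`) in their CIGAR string. The sketch below is only an illustration and not part of DeepVariant: the function name `summarize_cigars` and the example records are made up, and it assumes the aligner encodes split/spliced alignments with `N` operations as described in the SAM specification. It works on plain SAM text lines, for example the output of `samtools view -h`.

```python
import re
from typing import Dict, Iterable

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigars(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count mapped reads, and how many contain N (skip) or S (soft clip) ops.

    Hypothetical helper for illustration; not a DeepVariant or HISAT2 API.
    """
    stats = {"mapped": 0, "spliced": 0, "soft_clipped": 0}
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":    # unmapped read or no CIGAR available
            continue
        ops = {op for _, op in CIGAR_OP.findall(cigar)}
        stats["mapped"] += 1
        if "N" in ops:
            stats["spliced"] += 1
        if "S" in ops:
            stats["soft_clipped"] += 1
    return stats


def make_record(name: str, flag: int, cigar: str) -> str:
    """Build a minimal 11-column SAM record (made-up example data)."""
    return "\t".join([name, str(flag), "chr1", "100", "60", cigar,
                      "*", "0", "0", "*", "*"])


example = [
    "@HD\tVN:1.6",
    make_record("r1", 0, "50M"),
    make_record("r2", 0, "20M1000N30M"),   # read with a reference skip
    make_record("r3", 4, "*"),             # unmapped read, ignored
    make_record("r4", 16, "5S45M"),        # soft-clipped read
]
print(summarize_cigars(example))
```

If a large share of reads in the HISAT2 BAM contain `N` operations while the same sample aligned with BWA-MEM has none, that would be consistent with the guess above, though it would not by itself confirm what candidate generation is doing.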

We typically find that marking duplicates is not necessary, but it does not have a negative effect on accuracy to do so for typical coverages (25x-50x).

Thank you,
Andrew
