Tag: MarkduplicatesSpark

sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds

I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 in the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…

MarkDuplicatesSpark: how to speed up?

Hello all, I would like to know if there is any good option to speed up MarkDuplicatesSpark. I work with a human genome with around 900 million reads (151 bp). I work on a cluster (with Slurm). The command that I used is (with…

So many variants detected.

Dear all, I have done variant calling on germline data that has a single sample for each individual and two genes. I did the following steps, but after checking the results I found too many variants. After HaplotypeCaller (step 6) I found 140900 known variants, and the…
