Tag: HaplotypeCaller

Scatter Gather principle by chromosome on Gatk

Hi all, On a quest to optimize a GATK pipeline, I came across the scatter-gather principle, so I did the following: pids= for chr in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20…
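
The per-chromosome scatter-gather idea in the excerpt above can be sketched as follows. This is a minimal illustration, not the poster's script (which is truncated): the file names, the chr1 to chr22 list, the `shards` directory, and the choice of `MergeVcfs` for the gather step are all assumptions made for the example.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def scatter_commands(chroms, ref, bam, outdir):
    """Build one HaplotypeCaller command per chromosome (the scatter step)."""
    return [
        ["gatk", "HaplotypeCaller",
         "-R", ref, "-I", bam,
         "-L", chrom,                          # restrict this job to one contig
         "-O", f"{outdir}/{chrom}.g.vcf.gz",
         "-ERC", "GVCF"]
        for chrom in chroms
    ]


def gather_command(chroms, outdir, merged):
    """Build one MergeVcfs command that combines the shards (the gather step)."""
    cmd = ["gatk", "MergeVcfs"]
    for chrom in chroms:
        cmd += ["-I", f"{outdir}/{chrom}.g.vcf.gz"]
    return cmd + ["-O", merged]


def run_all(cmds, max_workers=4):
    """Run commands concurrently; check=True raises if any shard fails."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces evaluation so worker exceptions surface here.
        list(pool.map(lambda c: subprocess.run(c, check=True), cmds))


# Placeholder inputs, for illustration only.
chroms = [f"chr{i}" for i in range(1, 23)]
scatter = scatter_commands(chroms, "ref.fasta", "sample.bam", "shards")
gather = gather_command(chroms, "shards", "sample.g.vcf.gz")
print(shlex.join(scatter[0]))
```

With GATK installed, `run_all(scatter)` followed by `subprocess.run(gather, check=True)` would execute the two phases. Building the commands as lists first keeps them easy to inspect before anything is launched.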

Joint variant calling on DeepVariant GVCFs using GATK GenotypeGVCFs

Hi everyone, I have a bunch of GVCF files generated by DeepVariant, but I want to use GATK’s GenotypeGVCFs for joint variant calling on them (I don’t want to use GLnexus). But GATK requires a genotype likelihood field produced by…

using gatk haplotypecaller for variants extraction

Hi, I have RNA-seq data from COVID patients. I am using HISAT2 to align the reads to the reference, and the resulting BAM files are now indexed and ready. I would like to use GATK HaplotypeCaller to extract variants from my BAM files. First,…

Genomic architecture of adaptive radiation and hybridization in Alpine whitefish

Sampling the radiation To understand the phylogenetic relationships between Alpine whitefish, we carried out whole-genome resequencing on 96 previously collected whitefish (with associated phenotypic measurements including standard length and gill-raker counts; collected in accordance with permits issued by the cantons of Zurich (ZH128/15), Bern (BE68/15), and Lucerne (LU04/14); these fish…

Standalone GATK HaplotypeCaller : bioinformatics

Hello! I’m hoping someone can direct me to resources around acquiring or building standalone gatk tools, specifically HaplotypeCaller. All of my research has led to the monolithic gatk wrapper (either local, spark, or in docker). The big tool is brilliant and I’ve been using it thus far, but it’s pretty…

Hard filtering on GATK HaplotypeCaller giving multiple warnings

I’m using this pipeline for deriving variants from RNA sequencing data: github.com/modupeore/VAP which uses specific versions of various tools, including HaplotypeCaller from GATK (v3.8-0-ge9d806836). The final step is a set of hard filters on the called variants (applied using VariantFilter), but looking at the log files, there are a lot…

snp – Reference variant detected as altered one in bam file

I received (from the manufacturer) several .bam files and I used four callers (samtools, freebayes, haplotypecaller, deepvariant) to find some sequence variants. In the resulting .vcf files, I took a closer look at some calls. I found an interesting homozygous one, rs477033 (C/G Ref/Alt), with flag ‘COMMON=0’ and very low MAF. I also…

how to extract unique variants from GVCF

[note: cross-posted on GATK forum – still awaiting a response] I have a GVCF (generated using GATK’s HaplotypeCaller w/ -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample….

Variant quality and filters on GATK HaplotypeCaller generated VCFs

Hi, I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for “Germline short variants discovery” for single-sample data to generate a VCF using HaplotypeCaller. The guidelines then point to the use…

java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…

GATK HaplotypeCaller with interval list

I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. My interval list file (top8snp.interval_list) content is as follows:
12 33029845 33030845 + rs24767598
13 40586682 40587682 + rs24748362
18 24373857 24374857 + rs8856159
21 50381146 50382146 +…
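
The excerpt is truncated before the actual problem is stated, so the following does not diagnose it. It is only a small sketch of one detail that often matters with such files: Picard-style interval lists use 1-based inclusive coordinates and begin with `@` header lines, whereas BED uses 0-based half-open coordinates. The function name and the header line in the usage example are hypothetical.

```python
def interval_list_to_bed(lines):
    """Convert Picard-style interval_list rows to BED-style tuples.

    interval_list rows: contig, start (1-based), end (inclusive), strand, name.
    BED rows: contig, start (0-based), end (exclusive), name, strand.
    """
    bed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM-style header lines
        contig, start, end, strand, name = line.split()[:5]
        # Only the start shifts: 1-based inclusive -> 0-based half-open.
        bed.append((contig, int(start) - 1, int(end), name, strand))
    return bed


# One row taken from the excerpt above, preceded by a made-up header line.
rows = ["@HD\tVN:1.5", "12\t33029845\t33030845\t+\trs24767598"]
print(interval_list_to_bed(rows))
```

The interval length is unchanged by the conversion (1,001 bases in this row); only the convention for writing the start coordinate differs.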

variant – Error running gatk HaplotypeCaller with allele specific annotations

I’ve got HaplotypeCaller working nicely in standard mode, like so:
# Run HaplotypeCaller
gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF
But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…

Do VQSR for HaplotypeCaller calls – Sarek

Expected Behavior: Filter the calls from HaplotypeCaller with Variant Quality Score Recalibration according to GATK best practise (Tools VariantRecalibrator, ApplyRecalibration, see gatkforums.broadinstitute.org/gatk/discussion/39/variant-quality-score-recalibration-vqsr or a more recent version). Current Behavior: Variant quality score recalibration currently not included. Asked Jan 26 ’18 at 08:25 by malinlarsson. 1 Answer: Keep in mind, that you’d…

Running samtools view on bam affects the number of variants called by both haplotypecaller and deepvariant – C samtools

Thanks for getting back to me Valeriu. As you suggested, I used the latest commit from the develop branch in my pipeline, and the results look good. I was able to replicate the numbers from samtools v1.10.2 and v1.11 for both variant callers. FYI $ docker run scilifelabram/htslib:dev_proper /opt/samtools/samtools version…

GATK GenotypeGVCFs changes HET to REF_ALT

Dear all, I’ve been using GATK HaplotypeCaller / GenotypeGVCFs (v4.2.3.0) for a while but recently found something strange. There is a position (7063) with 8 reads (3T + 5A) that, even though HaplotypeCaller calls as a HET (see image, lower track): NC_046966.1 7063 . T A,<NON_REF> 177.64 . BaseQRankSum=0.887;DP=8;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=2.369;RAW_MQandDP=16885,8;ReadPosRankSum=1.345 GT:AD:DP:GQ:PL:SB…

Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…

Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest

Genomic data: The foundational resource for this study was a dataset of 40,107,925 nuclear SNPs sequenced from a worldwide sample of 532 DBM individuals collected in 114 different sites based on our previous project (15). DNA was extracted from each of the 532 individuals using DNeasy Blood and Tissue Kit (Qiagen,…

Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)

I am using the latest GATK 4.1.2.0 to do variant calling on insect samples with a reference genome of a closely related species. The heterozygosity is approximately 0.02. I followed the standard pipeline of “HaplotypeCaller --> GenomicsDBImport --> GenotypeGVCFs” to get my unfiltered VCFs. However, although my variant sites have…

No quality in non-variant sites GATK

Hey, I am doing SNP calling with HaplotypeCaller BP_RESOLUTION, CombineGVCFs with convert-to-base-pair-resolution and GenotypeGVCFs with include-non-variant-sites in GATK, and when I get my VCF file, the non-variant sites do not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…

Parallel genomic responses to historical climate change and high elevation in East Asian songbirds

Extreme environments present profound physiological stress. The adaptation of closely related species to these environments is likely to invoke congruent genetic responses resulting in similar physiological and/or morphological adaptations, a process termed “parallel evolution” (1). Existing evidence shows that parallel evolution is more common at the phenotypic level than at…

Germline variant calling pipeline using Snakemake

Hello everybody, as part of a project, I had to write an in-house pipeline to call germline mutations for ~100 patients. For that I used Snakemake and GATK’s best practice guidelines. Steps that take a long time (HaplotypeCaller or BaseQualityScoreRecalibration) are automatically parallelized…

Parallelization in GATK 4

Hi all, I’m trying (and failing) to multi-thread HaplotypeCaller in GATK 4. I read in a few places online that multi-threading in GATK 4 has been made more tricky, maybe even unfeasible, but all the places where I read that seem to be more than…

GATK HaplotypeCaller – Shutting down engine

00:32:48.224 INFO  HaplotypeCaller - Shutting down engine
[September 17, 2021 12:32:48 AM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2398617600
java.nio.BufferUnderflowException
        at java.nio.ByteBuffer.get(ByteBuffer.java:688)
        at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:285)
        at java.nio.ByteBuffer.get(ByteBuffer.java:715)
        at htsjdk.samtools.MemoryMappedFileBuffer.readBytes(MemoryMappedFileBuffer.java:34)
        at…

missing genotype ./. even with many reads under AD and DP

Hi All, I am trying to troubleshoot all the missing genotypes in my VCF. I don’t quite understand why I get missing genotypes (./.) when there are plenty of reads under AD and DP. I think it’s because…

HaplotypeCaller Memory Optimization

When using HaplotypeCaller on GATK, is there a fixed amount of memory that works well for the java -Xmx input, or does it scale with the size of the input BAM? E.g. if I have a 50 GB file do I need to set -Xmx…

ABRF Study Benchmarks NGS Platforms on Human, Microbial Samples, Provides Peek at Genapsys Data

NEW YORK – The results of a major, core facilities-driven benchmarking study for next-generation sequencing platforms are in, and just about every major player in the field can claim a victory of some sort. The data support longstanding advantages touted by market leader Illumina, while also providing a sneak peek…

Speeding up HaplotypeCaller analysis

How can I speed up a HaplotypeCaller run? The input BAM file is about 16G and the running time using the command below is about 15 hours. java -Xmx64G -jar GenomeAnalysisTK.jar -nt 1 -nct 34 -T HaplotypeCaller -R Renamed.fasta -I realigned.bam -o raw_variants.g.vcf.gz -ERC GVCF GATK…

Use of GenotypeGVCFs in population genetic studies

I have 16 whole-genome sequenced samples from two populations (8 for each population). My goal is the detection of signatures of selection and introgression. I performed read cleaning, mapping to the reference, and duplicate marking. SNP calling was performed using HaplotypeCaller in GATK for…

How to pass custom software specific variables to nf-core/sarek nextflow pipeline?

I’m attempting to call whole genome variants using the nf-core/sarek nextflow pipeline. In the QC step there is an option that invokes trim_galore quality trimming, but I don’t know how to pass my custom adapters to be cut as well….

How to filter GATK vcf file using other programs

Hi everyone, I called variants for a WGS project using GATK (HaplotypeCaller). Now, when I try to filter that VCF file with the VariantFiltration command in GATK, the following error message appears: java.lang.NumberFormatException: For input string: “10.90” I asked my…

gatk, ref and alt percentages .

Hello everyone, I need some info on how to get the percentage of REF and ALT nucleotides in my data. I am using GATK and am currently not getting REF and ALT percentages. The command I am using for the GATK VCF file…
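
The poster's command is truncated, so what follows is not a fix for it. It is a hedged sketch of one common way to derive such percentages: reading the per-allele depths from the AD subfield of a sample column. The function name and the example sample column are hypothetical. Note that AD counts only reads considered informative, so its sum can differ from DP.

```python
def allele_percentages(format_field, sample_field):
    """Return per-allele read percentages (REF first) from AD, or None."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = values.get("AD")
    if ad is None or "." in ad:
        return None  # AD absent or missing for this sample
    depths = [int(d) for d in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return None  # no informative reads, percentages undefined
    return [100.0 * d / total for d in depths]


# Made-up sample column: 3 reads support REF, 5 support ALT.
print(allele_percentages("GT:AD:DP", "0/1:3,5:8"))
```

Returning `None` rather than zeros for missing or empty AD keeps "no data" distinguishable from "0% ALT".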

Consolidate gVCF calling

Hi. I am running genotyping with HaplotypeCaller and GenotypeGVCFs. After that, in the genotype information for some samples in my vcf I found some calls containing multiple genotypes (e.g. 0|0:8,0:11:99:0|1:10777_AGGCGCGGAGG_A:102,126,462:). What could be the issue? Thank you! Here is the full line: chr10 10787 . G GGGCGCGCAGCGCCGGCGCA 356.99 PASS AC=1;AF=0.014;AN=18;BaseQRankSum=-1.762;DP=4023;Ex…

no positional argument is defined for this tool.

Hello, hope all are doing well. I am running the HaplotypeCaller command to generate the variant file by giving multiple input BAM files in a single command: python3 gatk --java-options -Xmx7g HaplotypeCaller --reference ref.fasta --input file1.bam…

Calling variants on reads with MAPQ=0 on HaplotypeCaller or bcftools mpileup

I am working with about 500 samples of human exome data. I used hg19 to align my reads and ran a standard best-practices GATK workflow. Only later did I realise that a small 1 Mb locus had not mapped properly due…

Error when Phasing with Beagle 5.2

I’m having trouble phasing a multi-sample (9-sample) VCF file produced by GATK HaplotypeCaller with Beagle 5.2. I do not have a genetic map or reference panel. I am working with a very heterozygous group of organisms (sea urchins). When I run Beagle with…

So many variants detected.

Dear All, I have done variant calling on germline data with a single sample per individual and two genes. I did the following steps, but after checking the results I found too many variants. After HaplotypeCaller (step 6) I found 140900 known variants, and the…
