Tag: HaplotypeCaller
Hard filtering on GATK HaplotypeCaller giving multiple warnings
I’m using this pipeline for deriving variants from RNA sequencing data: github.com/modupeore/VAP which uses specific versions of various tools, including HaplotypeCaller from GATK (v3.8-0-ge9d806836). The final step is a set of hard filters on the called variants (applied using VariantFilter), but looking at the log files, there are a lot…
snp – Reference variant detected as altered one in bam file
I received (from manufacturer) several .bam files and I used four callers (samtools, freebayes, haplotypecaller, deepvariant) to find some sequence variants. In obtained .vcf files, I took a closer look to some calls. I found interesting, homozygous one rs477033 (C/G Ref/Alt) with flag ‘COMMON=0’ and very low MAF. I also…
how to extract unique variants from GVCF
how to extract unique variants from GVCF 1 [note: cross-posted on GATK forum – still awaiting a response] I have a GVCF (generated using GATK’s HaplotypeCaller w/ -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample….
Variant quality and filters on GATK HaplotypeCaller generated VCFs
Variant quality and filters on GATK HaplotypeCaller generated VCFs 0 Hi, I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for “Germline short variants discovery” for single-sample data to generate a VCF using HaplotypeCaller. The guidelines then point to the use…
java – GATK: HaplotypceCaller IntelPairHmm only detecting 1 thread
I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…
GATK HaplotypeCaller with interval list
I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels with in an interval list. My interval list file (top8snp.interval_list) content is as follows: 12 33029845 33030845 + rs24767598 13 40586682 40587682 + rs24748362 18 24373857 24374857 + rs8856159 21 50381146 50382146 +…
variant – Error running gatk HaplotypeCaller with allele specific annotations
I’ve got HaplotypeCaller working nicely in standard mode, like so: # Run haplotypcaller gatk –java-options “-Xmx4g” HaplotypeCaller –intervals “$INTERVALS” -R “$REF” -I “$OUT”/results/alignment/${SN}_sorted_marked_recalibrated.bam -O “$OUT”/results/variants/${SN}_g.vcf.gz -ERC GVCF But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…
Do VQSR for HaplotypeCaller calls – Sarek
Expected Behavior Filter the calls from HaplotypeCaller with Variant Quality Score Recalibration according to GATK best practise (Tools VariantRecalibrator, ApplyRecalibration, see gatkforums.broadinstitute.org/gatk/discussion/39/variant-quality-score-recalibration-vqsr or a more recent version) Current Behavior Variant quality score recalibration currently not included. Asked Jan 26 ’18 at 08:25 malinlarsson 1 Answer: Keep in mind, that you’d…
Running samtools view on bam affects the number of variants called by both haplotypecaller and deepvariant – C samtools
Thanks for getting back to me Valeriu. As you suggested, I used the latest commit from the develop branch in my pipeline, and the results look good. I was able to replicate the numbers from samtools v1.10.2 and v1.11 for both variant callers. FYI $ docker run scilifelabram/htslib:dev_proper /opt/samtools/samtools version…
GATK GenotypeGVCFs changes HET to REF_ALT
Dear all, I’ve been using GATK HaplotypeCaller / GenotypGVFs (v4.2.3.0) for a while but, recently found something strange. There is a position (7063) with 8 reads (3T + 5A) that, even though HaplotyCaller calls as a HET (see image, lower track): NC_046966.1 7063 . T A,<NON_REF> 177.64 . BaseQRankSum=0.887;DP=8;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=2.369;RAW_MQandDP=16885,8;ReadPosRankSum=1.345 GT:AD:DP:GQ:PL:SB…
Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS
This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…
Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample
I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…
Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest
Genomic data The foundational resource for this study was a dataset of 40,107,925 nuclear SNPs sequenced from a worldwide sample of 532 DBM individuals collected in 114 different sites based on our previous project15. DNA was extracted from each of the 532 individuals using DNeasy Blood and Tissue Kit (Qiagen,…
Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)
I am using the latest GATK 4.1.2.0 to do variant calling on insect samples with a reference genome of a closely related species. The heterozygosity is approximately 0.02. I followed the standard pipeline of “HaplotypeCaller –> GenomicDBImport –> GenotypeGVCFs” to get my unfiltered VCFs, however, although my variant sites have…
No quality in non-variant sites GATK
No quality in non-variant sites GATK 1 Heys, I am doing the SNP calling with Haplotypecaller BP_Resolution, CombineGVCFs with convert-to-base-pair-resolution and GenotypeGVCFs with include-non-variant-sites with GATK and when I get my vcf file, the non-variant sites does not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…
Parallel genomic responses to historical climate change and high elevation in East Asian songbirds
Extreme environments present profound physiological stress. The adaptation of closely related species to these environments is likely to invoke congruent genetic responses resulting in similar physiological and/or morphological adaptations, a process termed “parallel evolution” (1). Existing evidence shows that parallel evolution is more common at the phenotypic level than at…
Germline variant calling pipeline using Snakemake
Tool:Germline variant calling pipeline using Snakemake 0 Hello everybody, as part of a project, I had to write an in-house pipeline to call germline mutations for ~100 patients. For that I used Snakemake and GATKs best practice guidelines. Steps that take a long time (HaplotypeCaller or BaseQualityScoreRecalibration) are automatically parallelized…
Pararellization in GATK 4
Pararellization in GATK 4 4 Hi all, I’m trying (and failing) to multi-thread HaplotypeCaller in GATK 4. I read in a few places online that multi-threading in GATK 4 has been made more tricky, maybe even unfeasible, but all the places where I read that seem to be more than…
GATK HaplotypeCaller – Shutting down engine
00:32:48.224 INFO HaplotypeCaller – Shutting down engine [September 17, 2021 12:32:48 AM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.04 minutes. Runtime.totalMemory()=2398617600 java.nio.BufferUnderflowException at java.nio.ByteBuffer.get(ByteBuffer.java:688) at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:285) at java.nio.ByteBuffer.get(ByteBuffer.java:715) at htsjdk.samtools.MemoryMappedFileBuffer.readBytes(MemoryMappedFileBuffer.java:34) at…
missing genotype ./. even with many reads under AD and DP
missing genotype ./. even with many reads under AD and DP 0 Hi All, I am trying to troubleshoot all the missing genotypes in my VCF. I don’t quite understand why I get missing genotypes (./.) when there are plenty of reads under AD and DP. I think it’s because…
HaplotypeCaller Memory Optimization
HaplotypeCaller Memory Optimization 0 When using HaplotypeCaller on GATK, is there a fixed amount of memory that works well for for the java -Xmx input, or does it scale with the size of the input bam? eg if I have a 50 GB file do I need to set -Xmx…
ABRF Study Benchmarks NGS Platforms on Human, Microbial Samples, Provides Peek at Genapsys Data
NEW YORK – The results of a major, core facilities-driven benchmarking study for next-generation sequencing platforms are in, and just about every major player in the field can claim a victory of some sort. The data support longstanding advantages touted by market leader Illumina, while also providing a sneak peak…
Speeding up HaplotypeCaller analysis
Speeding up HaplotypeCaller analysis 0 how can I speed up the HaplotypeCaller command running? input bam file is about 16G and running time using the below command is about 15 hours. java -Xmx64G -jar GenomeAnalysisTK.jar -nt 1 -nct 34 -T HaplotypeCaller -R Renamed.fasta -I realigned.bam -o raw_variants.g.vcf.gz -ERC GVCF GATK…
Use of GenotypeGVCFs in population genetic studies
Use of GenotypeGVCFs in population genetic studies 0 I have 16 whole genome sequenced samples from two populations (8 for each population). My goal is detection of signature of selection and introgression. I performed read cleaning, mapping to reference, mark duplication. SNP calling was performed using HaplotypeCaller in GATK for…
How to pass custom software specific variables to nf-core/sarek nextflow pipeline?
How to pass custom software specific variables to nf-core/sarek nextflow pipeline? 0 I’m attempting to call whole genome variants using nf-core/sarek nextflow pipeline. In QC step there is an option that invokes trim_galore quality trimming, but i don’t know how to pass my custom adapters to be cut as well….
How to filter GATK vcf file using other programs
How to filter GATK vcf file using other programs 0 hi everyone I called variants for a WGS project using GATK (HaplotypeCaller). Now, when I want to filter that VCF file by VariantFiltration command in GATK, so the following error message appears. java.lang.NumberFormatException: For input string: “10.90” I asked my…
gatk, ref and alt percentages .
gatk, ref and alt percentages . 0 Hello everyone, I need some info regarding how to get percentage of REF and ALT nucleotide sequence in my data. I am using gatk and currently not getting REF and ALT percentages . the command i am using for the gatk vcf file…
Consolidate gVCF calling
Hi. I am running genotyping with HaplotypeCaller and GenotypeGVCFs. After that, in the genotype information for some samples in my vcf I found some calls containing multiple genotypes (e.g. 0|0:8,0:11:99:0|1:10777_AGGCGCGGAGG_A:102,126,462:). What could be the issue? Thank you! Here is the full line: chr10 10787 . G GGGCGCGCAGCGCCGGCGCA 356.99 PASS AC=1;AF=0.014;AN=18;BaseQRankSum=-1.762;DP=4023;Ex…
no positional argument is defined for this tool.
A USER ERROR has occurred: no positional argument is defined for this tool. 0 Hello, hope all are doing well. I am running the HaplotypeCaller command to generate the variant file by giving multiple input bam files in a single command. python3 gatk –java-options -Xmx7g HaplotypeCaller –reference ref.fasta –input file1.bam…
Calling variants on reads with MAPQ=0 on HaplotypeCaller or bcftools mpileup
Calling variants on reads with MAPQ=0 on HaplotypeCaller or bcftools mpileup 2 I am working with about 500 samples of human exome data. used hg19 to align my reads and ran a standard best-practices GATK workflow. Later only to realise that a small 1Mb loci has not mapped properly due…
Error when Phasing with Beagle 5.2
Error when Phasing with Beagle 5.2 0 I’m having trouble phasing a multi-sample (9-samples) vcf file produced by gatk HaplotypeCaller with Beagle 5.2. I do not have a genetic map or reference panel. I am working with a very heterozygous group of organisms (sea urchins). When I run beagle with…
So many variants detected.
So many variants detected. 0 Dear All, I have done variant calling in Germline data that has single sample of each individual and two genes. I did following steps, but after checking results I found too many variants. After Haplotypecaller (the step 6) I found 140900 known variants, and the…