Tag: HaplotypeCaller

A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing

Introduction Short-read metagenomic sequencing is the technique most widely used to explore the natural habitat of millions of bacteria. In comparison with 16S rRNA sequencing, shotgun metagenomic sequencing (MGS) provides sequence information of the whole genomes, which can be used to identify different genes present in an individual bacterium and…

Variant calling using HaplotypeCaller does not show #FILTER information

Hi all, I would like to ask about variant calling using HaplotypeCaller. I expected that after running HaplotypeCaller, the FILTER column in the gVCF files would show ‘PASS/LowQ’; however, in my case, the output FILTER column only shows ‘.’ without…
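
A minimal sketch related to the question above: HaplotypeCaller does not apply filters itself, so a ‘.’ in FILTER is what an unfiltered call set looks like; the column is normally filled in by a later step such as VariantFiltration or VQSR. Assuming a plain-text VCF, the helper below (tally_filter_column is a hypothetical name, not a GATK function, and the example records are made up) counts the FILTER values to check whether any filtering has been applied.

```python
from collections import Counter


def tally_filter_column(vcf_lines):
    """Count the values found in the FILTER column (7th column) of VCF records."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##..." and "#CHROM...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # malformed record, ignored in this sketch
        counts[fields[6]] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50.0\t.\tDP=10",
    "chr1\t200\t.\tC\tT\t12.3\t.\tDP=4",
]
print(dict(tally_filter_column(example)))
```

If every record is counted under ‘.’, no filtering step has been run on the file yet.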

haplotypecaller – NVIDIA Docs

Run a GPU-accelerated haplotypecaller. This tool applies an accelerated GATK CollectMultipleMetrics for assessing the metrics of a BAM file, such as including alignment success, quality score distributions, GC bias, and sequencing artifacts. This functions as a ‘meta-metrics’ tool, and can run any combination of the available metrics tools in GATK…

Variant missing in WGS sample

Hi, I have processed a WGS sample including alignment (bwa-mem2), variant calling (GATK HaplotypeCaller) and annotation (ANNOVAR). In the annotated file, a variant fitting the phenotype was identified. However, on visualizing the BAM in IGV, this variant was not there. What could be the…

H101 for cervical cancer | DDDT

Introduction Patients with persistent, recurrent, or metastatic (P/R/M) cervical carcinoma respond poorly to treatment despite the best available therapeutic regimens, with a 5-year survival of 17%.1 Most of them are heavily pretreated with chemotherapy and/or radiotherapy, and many patients experience complications related to treatment or advanced disease, which exclude them…

Primate-specific ZNF808 is essential for pancreatic development in humans

Subjects The study was conducted in accordance with the Declaration of Helsinki and all subjects or their parents/guardian gave informed written consent for genetic testing. DNA testing and storage in the Beta Cell Research Bank was approved by the Wales Research Ethics Committee 5 Bangor (REC 17/WA/0327, IRAS project ID…

Samtools index not working in Snakemake

I am setting up a Snakemake pipeline for sequencing read alignment and variant calling. But the samtools index rule is not activated, and the subsequent haplotype caller rule fails. I think it is because the samtools index rule is not perceived as necessary to produce the output of rule all…

variant calling – How to run a GATK Docker Image with local files?

I’m trying to use HaplotypeCaller from the GATK toolkit but I keep getting an error. I pulled GATK through Docker and am using this command: docker run -v /Users/rimo/ -it broadinstitute/gatk:latest gatk HaplotypeCaller -R /Users/rimo/reference.fasta -I /Users/rimo/sample1.bam -O /Users/rimo/sample1.g.vcf.gz -ERC GVCF. /Users/rimo is my home directory; it’s where the…
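
A hedged sketch for the command quoted above: the excerpt is truncated, so the actual error is unknown, but Docker's -v option normally takes a host_path:container_path pair, and the quoted command gives only the host path. Assuming that is the cause, the hypothetical helper below (docker_gatk_command is an illustrative name) builds an argument list that mounts the host directory at the same path inside the container, so the /Users/rimo/... paths passed to GATK resolve unchanged.

```python
import shlex


def docker_gatk_command(host_dir, gatk_args, image="broadinstitute/gatk:latest"):
    """Build a `docker run` argument list with the host directory bind-mounted.

    The directory is mounted at the same path inside the container
    (host_dir:host_dir), which is an assumption made for convenience.
    """
    path = host_dir.rstrip("/") or "/"
    mount = f"{path}:{path}"
    return ["docker", "run", "-v", mount, "-it", image, "gatk", *gatk_args]


cmd = docker_gatk_command(
    "/Users/rimo/",
    [
        "HaplotypeCaller",
        "-R", "/Users/rimo/reference.fasta",
        "-I", "/Users/rimo/sample1.bam",
        "-O", "/Users/rimo/sample1.g.vcf.gz",
        "-ERC", "GVCF",
    ],
)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run directly; printing it first makes it easy to check the mount before running anything.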

Invasive Californian death caps develop mushrooms unisexually and bisexually

Mushroom collecting Sporocarps were collected from various herbaria and during three expeditions to Point Reyes National Seashore (PRNS), California in 2004, 2014 and 2015, and in 2015 from three sites in Portugal. A total of 86 sporocarps were collected: 67 Californian sporocarps (one early herbarium sample dates to 1993), 11…

[Question]: What does htvc stand for in haplotypecaller – Parabricks 4.2.0-1 – Parabricks

Hi, this is with reference to Parabricks 4.2.0-1 – nvcr.io/nvidia/clara/clara-parabricks:4.2.0-1. I want to know what htvc means in haplotypecaller and what the associated binary at /usr/local/parabricks/binaries//bin/htvc does within the program. I tried searching for documentation on this but could not find any useful information. /usr/local/parabricks/run_pb.py haplotypecaller <…..snipped…..> --verbose --x3…

GenotypeGVCF too many genotypes from pooled samples

Hello, I am trying to create a VCF file using GenotypeGVCFs in GATK4. I have 60 samples and each sample is pooled data. The ploidy per sample is 60. This is due to the biological system I work in. This data has been processed with HaplotypeCaller; below is an example…

Allele specific binding of histone modifications and a transcription factor does not predict allele specific expression in correlated ChIP-seq peak-exon pairs

ChIP-seq and RNA-seq Tissue sampling and RNA-sequencing for three Holstein dairy cows and two of their foetuses (one male and one female with a shared sire) are described in17 and18. ChIP-sequencing for all tissues was as described in16, with the inclusion of more tissues. Whole genome sequence for each animal…

sarek: Introduction

Introduction nf-core/sarek is a workflow designed to detect variants on whole genome or targeted sequencing data. Initially designed for Human, and Mouse, it can work on any species with a reference genome. Sarek can also handle tumour / normal pairs and could include additional relapses. The pipeline is built using…

The genomic footprint of whaling and isolation in fin whale populations

Samples and sequencing Tissue samples from 50 fin whales (Balaenoptera physalus) were collected using a standard protocol to obtain skin biopsies from free-ranging cetacean species, which use a small stainless-steel biopsy dart deployed from a crossbow or rifle73,74. These samples were collected throughout the Eastern North Pacific (ENP; N = 30, represented…

Not all variants are annotated with AF

Forum: Not all variants are annotated with AF – expected or a problem? I tried to use the built-in databases and to build my own (snpEff); however, in both cases, not all my variants are annotated with allele frequency (AF). The problem is: those variants not annotated in the VCF have allele frequency…

The Biostar Herald for Thursday, August 24, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…

The GATK “The given bam input has no sample names.” error

for f in MINIMAP BWA ; do ~/gatk-4.2.0.0/gatk HaplotypeCaller --reference /home/tmichel/projects/rbge/HybSeq_thibauld/reference_genomes/Begonia_loranthoides_scaffold.fasta --input Hillebrandia_sorted.$f.bam --output Hillebrandia.$f.g.vcf.gz --emit-ref-confidence GVCF ; done I have used GATK to call variants in BAM files with both minimap2 and bwa mem with the…
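
A minimal sketch, assuming the error means what it usually does: the BAM header has no @RG (read group) line carrying an SM (sample) tag, which GATK uses as the sample name. The hypothetical helper below (sample_names_from_header is an illustrative name) takes the header text, for example the output of samtools view -H, and returns the sample names it declares. The header strings in the example are made up.

```python
def sample_names_from_header(header_text):
    """Return the set of SM values declared on @RG lines of a SAM/BAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


# Made-up headers for illustration only.
header_without_rg = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:scaffold_1\tLN:1000\n"
header_with_rg = header_without_rg + "@RG\tID:run1\tSM:Hillebrandia\tPL:ILLUMINA\n"

print(sample_names_from_header(header_without_rg))
print(sample_names_from_header(header_with_rg))
```

An empty result suggests the read groups are missing; they can be set at alignment time or added afterwards, for example with Picard AddOrReplaceReadGroups or samtools addreplacerg.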

Exome sequencing identifies breast cancer susceptibility genes and defines the contribution of coding variants to breast cancer risk

UKB The UKB is a population-based prospective cohort study of more than 500,000 subjects. More detailed information on the UKB is given elsewhere34,35. The study received ethics approval from the North West Multi-center Research Ethics Committee. All participants signed written informed consent before participating. WES data for 450,000 subjects were…

Nuclear genetic control of mtDNA copy number and heteroplasmy in humans

Overview of mtSwirl Here we develop mtSwirl, a scalable pipeline for mtCN and variant calling which makes calls relative to an internally generated per-sample consensus sequence before mapping all calls back to GRCh38. In addition to GRCh38 reference files and WGS data, the mtSwirl pipeline takes as input nuclear genome…

Long-molecule scars of backup DNA repair in BRCA1- and BRCA2-deficient cancers

Pan-cancer WGS data sources GrCh37/hg19 BAM alignments for 2,489 primary tumour and matched normal whole-genome sequencing data were obtained as previously described18. In brief, 989 tumour–normal (T/N) pairs were obtained from The Cancer Genome Atlas (TCGA) Research Network (Genomic Data Commons at portal.gdc.cancer.gov/, accession: phs000178.v11.p8). Additional WGS data were obtained for 874 T/N pairs…

Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools

2023 Aug 3;18(8):e0288371. doi: 10.1371/journal.pone.0288371. eCollection 2023. Affiliations: 1 Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. 2 Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia. Nofe Alganmi et al. PLoS…

Genome assembly of two diploid and one auto-tetraploid Cyclocarya paliurus genomes

Sample collection, library construction and sequencing Leaves of two diploid C. paliurus (PG-dip and PA-dip) and one auto-tetraploid (PA-tetra) for genome sequencing were collected from plants grown in the germplasm bank of C. paliurus, which is located in the Baima experimental field, Nanjing, Jiangsu province, China. After collection, tissues were immediately frozen in…

Liftedover vcf header/contig compatibility

I have a collaborator that has lifted over their hg19 files to hg38 using Crossmap. The first step in the workflow they need to run is a simple bcftools filter for variant quality. They are getting an unknown file type error. Are there any obvious problems with this header that…

GATK memory error with Java

Hello, I have genotyping-by-sequencing data for 400 samples. I am trying to run a SNP calling pipeline using GATK. I managed to get as far as the HaplotypeCaller command in GATK. However, when I proceed with the CombineGVCFs step to combine all 400 g.vcf files into…

.bed files from sequencing platform not containing intervals of “alt”, “random” haplotypes. How do I perform coverage and haplotype caller?

Hello. I’m building my first human exome variant calling pipeline, and I’m learning the basics. I encountered this issue for the first time when trying to obtain a per-base…

What .bed file do I use for exome haplotype caller?

Hi all! I’m building my first pipeline for human exome variant calling, and I’m starting to learn the basic working principles of genome/exome data analysis. Now, the HaplotypeCaller tool from GATK needs a .bed file with the regions the…

Parabricks:4.0.0-1 Illegal instruction (core dumped) in haplotypecaller step – Parabricks

vet1, July 20, 2023, 2:28am: parabricks:4.0.0-1, with nvidia/cuda:12.2.0-devel-ubuntu22.04; system: Ubuntu 20.04 / 512G memory / 1 P100 16G card / 38T disk space. It works perfectly on one of my workstations, but on another I encountered errors in the ‘haplotypecaller’ step: for i in cat list; do docker run --gpus "device=0" --rm --volume…

Re-evaluation and re-analysis of 152 research exomes five years after the initial report reveals clinically relevant changes in 18%

Cohort structure We collected sequencing data and information about age, sex, and phenotypes from 152 families (44 simplex with one, 79 multiplex with two, 24 with three, and five with four or more). The cohort characteristics are depicted in Fig. 2A (details in File S2 [12]). Most affected individuals were younger than…

A framework for individualized splice-switching oligonucleotide therapy

Patients The WGS and clinical data of 235 patients with A-T were provided by the Global A-T Family Data Platform of ATCP. Our access to the data was approved by the Data Access Committee of ATCP. Selected patients with A-T enrolled at the Manton Center for Orphan Disease Research under…

An optimized GATK4 pipeline for Plasmodium falciparum whole genome sequencing variant calling and analysis | Malaria Journal

Optimization of the pipeline on monoclonal and simulated mixed infection samples Towards optimizing GATK4 for P. falciparum, the creation of an improved training “truth set” for the pipeline was key. To filter raw VCFs with a high quality truth callset, which is difficult to obtain using wet laboratory methods, a…

Haplotypecaller batch mode – Parabricks

When haplotypecaller runs in batch mode, it gets errors, as below: singularity exec --nv clara-parabricks_4.0.1-1.sif pbrun haplotypecaller --batch --ref ref.fa --in-bam /data/bam/ --out-variants /date/gvcf/ --gvcf Please visit NVIDIA Clara – NVIDIA Docs for detailed documentation [E::hts_hopen] Failed to open file /data/bam/ [E::hts_open_format] Failed to open file "/data/bam/" : Is a directory samtools view:…

Other independent methods or ways to confirm potential candidate genes observed through variant calling and homozygosity analysis

Hi folks, I need your invaluable insights and suggestions. I am currently working with some data that relate to a recessive lethal phenotype in an organism. In order to pinpoint the molecular basis…

no output from GATK CombineGVCFs

Hello all, I am using GATK to do SNP calling on 140 RNA-seq samples. After variant calling of each sample with HaplotypeCaller, I get 140 g.vcf.gz files. Before performing the final joint genotyping through GenotypeGVCFs, I need to combine these 140 g.vcf.gz files into…

What is the possibility of Depth (DP) being higher than the coverage

Exome sequencing is done at 100x coverage. Germline variants were called using GATK HaplotypeCaller. When I looked at the VCF files, there are a few variants showing higher depth than 100x. Some depths (DP) are like 120, 146, 153…

Which type of variant caller should I use in a WES normal cell line sample?

I have whole-exome sequencing data of an immortalised non-tumor (normal) cell line that I wish to assess for the presence/absence of APC/Wnt mutations. This is to double check that the cell line is sufficiently…

DanMAC5: a browser of aggregated sequence variants from 8,671 whole genome sequenced Danish individuals | BMC Genomic Data

Demographics Data from three studies were included: Dan-NICAD: 1,649 individuals with symptoms of obstructive coronary artery disease, predominantly chest pain, undergoing coronary computed tomography angiography. In total, 52% were females, the mean age was 57 years (+/- 9 SD), median coronary artery calcium score were 0 [0–82] and 24% of…

Reconstruction of the personal information from human genome reads in gut metagenome sequencing data

Subject participation The study protocol was approved by the ethics committees of Osaka University and related medical institutions as well as the Translational Health Science and Technology Institute (Faridabad). Japanese individuals (n = 343) for whom gut metagenome shotgun sequencing had been performed in previous studies were included in…

Paternity Testing from WGS Trio

It is definitely possible to assess paternity from whole genome sequence (WGS) data. Paternity can probably be established with as little as a few dozen or maybe hundreds of well-chosen single nucleotide polymorphisms (SNPs). If you have decent WGS data you can expect to genotype millions of SNPs. So, paternity…

Reconstruction of the personal information from human genome reads in gut metagenome sequencing data

Subject participation The study protocol was approved by the ethics committees of Osaka University and related medical institutions as well as the Translational Health Science and Technology Institute (Faridabad). Japanese individuals (n = 343) for whom gut metagenome shotgun sequencing were performed in previous studies were included in this study46,47,48. Among these…

Chloroquine resistance evolution in Plasmodium falciparum is mediated by the putative amino acid transporter AAT1

Ethics approval and consent to participate The study was performed in accordance with the Guide for the Care and Use of Laboratory Animals of the US National Institutes of Health (NIH). The Seattle Children’s Research Institute (SCRI) has an Assurance from the Public Health Service through the Office of Laboratory…

Low SNP Overlap with Michigan 1KG and TopMed reference panel

I extracted three samples (HG02024 – HG02026) from the 1000 Genomes Project’s 30x alignment files, employing the Genome Analysis Toolkit (GATK) best practice pipeline. This process involved performing base quality score recalibration, identifying and removing duplicate reads, utilizing the HaplotypeCaller to generate a genomic VCF (gVCF) file, and calling variants…

Filtering VCF files

Hi, I managed to align some FASTQ files and got to the point of raw VCF files. Now I would like to filter them using gatk VariantFiltration. But I’m completely stuck and overwhelmed on what to filter on. As I’m new to…

HaplotypeCaller VCF depth is greater than the number of reads in bam

Hi, I called a gVCF file using GATK HaplotypeCaller as follows: gatk HaplotypeCaller -R my.fasta \ -I s-95.sort.noDup.bam \ -L 3R:23000000-27905053 \ -ERC GVCF \ -bamout test_s95.bamout.bam \ --native-pair-hmm-threads 28 \ -O test_s95.sort.noDup.g.vcf The above output gVCF reports a variant at 3R:25063300: 3R 25063300 . C T,<NON_REF> 804.64 . BaseQRankSum=-2.060;DP=59;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=212400,59;ReadPosRankSum=-1.269 GT:AD:DP:GQ:PGT:PID:PL:PS:SB…

Why the number of reads in bam generated by GATK haplotype caller are more than the bam generated after GATK baserecalibrator

As per the explanation given here, gatk.broadinstitute.org/hc/en-us/articles/360040096812-HaplotypeCaller#--bam-output, I noticed two categories of reads in the BAM generated by GATK HaplotypeCaller. One set of reads starts with HC and another set has the original read name. Can someone help me better understand this scenario? There are some reads (upper segment; lower…

Single duplex DNA sequencing with CODEC detects mutations with high sensitivity

Ethical approval, DNA samples and oligonucleotides All patients provided written informed consent to allow the collection of blood and/or tumor tissue and the analysis of clinical and genetic data for research purposes. The IRB of the Dana-Farber Cancer Institute and New York University Grossman School of Medicine approved these protocols….

Whole-exome sequencing in Chinese Tibetan patients with VSD

Introduction Congenital heart disease (CHD) refers to cardiovascular malformations caused by abnormal development of cardiac vessels during the fetal period, which is the most common congenital dysplasia and also the main cause of non-infectious death in newborns and infants.1 CHD includes atrial septal defect (ASD), ventricular septal defect (VSD), pulmonary…

Gatktools

Hello guys, I ran this command to get a VCF file: gatk HaplotypeCaller -R reference_genome.fasta -I input.bam -O variants.vcf But unfortunately it gave an error like this: A USER ERROR has occurred: Fasta dict file file:///Users/uguremre/gatk-4.4.0.0/reference_genome.dict for reference file:///Users/uguremre/gatk-4.4.0.0/reference_genome.fasta does not exist. Please see gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating…
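
A minimal sketch of a pre-flight check for the error quoted above. GATK expects two companion files next to the reference: a sequence dictionary whose name replaces the FASTA extension with .dict (as the error message shows) and a .fai index appended to the full file name. The helper names below are hypothetical, and the sketch assumes an uncompressed reference with a single extension such as .fasta or .fa.

```python
from pathlib import Path


def expected_companions(reference):
    """Return the dictionary and index paths GATK looks for beside a FASTA file."""
    ref = Path(reference)
    # reference_genome.fasta -> reference_genome.dict and reference_genome.fasta.fai
    return [ref.with_suffix(".dict"), Path(str(ref) + ".fai")]


def missing_companions(reference):
    """Return the companion files that do not exist on disk yet."""
    return [p for p in expected_companions(reference) if not p.exists()]


for path in expected_companions("reference_genome.fasta"):
    print(path)
```

Missing files are usually created with gatk CreateSequenceDictionary -R reference_genome.fasta (for the .dict) and samtools faidx reference_genome.fasta (for the .fai).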

Find Pathogenic Variants

Hi dear community, I don’t have any experience in variant calling, and I have to solve this problem: Using the most recent VCF file describing ClinVar variants and a bed/gff file of the coding sequence of curated RefSeq genes, write a script that outputs all the…

how to make a header name in a haplotyping script of gatk?

Hi, I want to ask how we can set a header name of our choice in a GATK haplotyping script, because by default the header name in the output.vcf file is sample1. Here…

The Biostar Herald for Monday, April 10, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, Pavel, and was edited by…

Invalid QUAL value on line

Dear all, I have worked with a VCF file after HaplotypeCaller, where QUAL=Infinity for some variants. I am trying to convert it to PGEN format, but I get this PLINK error: Error: Invalid QUAL value on line 17 of test_123-temporary.pvar.zst. The default range for QUAL is…
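
One possible workaround, sketched under assumptions: if the downstream tool rejects non-finite QUAL values, the QUAL field of the affected records can be rewritten before import. The hypothetical helper below (neutralize_nonfinite_qual is an illustrative name) replaces Infinity or NaN with ‘.’, the VCF missing-value marker. Whether discarding QUAL is acceptable depends on the analysis; capping at a large finite number is an alternative. The example record is made up.

```python
import math


def neutralize_nonfinite_qual(vcf_line):
    """Replace a non-finite QUAL (6th column) with '.'; return other lines unchanged."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines are passed through
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 6:
        return vcf_line  # malformed record, left alone in this sketch
    try:
        qual = float(fields[5])  # float() accepts "Infinity", "inf" and "nan"
    except ValueError:
        return vcf_line  # e.g. QUAL is already "."
    if math.isfinite(qual):
        return vcf_line
    fields[5] = "."
    return "\t".join(fields) + newline


# Made-up record for illustration only.
record = "chr1\t100\t.\tA\tG\tInfinity\t.\tDP=10\n"
print(neutralize_nonfinite_qual(record), end="")
```

Applied line by line to a decompressed VCF, this leaves finite QUAL values and header lines untouched.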

Can’t call subsampled bam file with GATK Haplotypecaller with --disable-tool-default-read-filters

I want to simulate variant calling of an ultra-low-coverage (>0.005x) BAM file. I subsampled reads from the HG02024 sample of the 1KG phase 3 dataset. My code in R to do so is the following (bam and reference are just path extensions, file is the initial BAM file): cov_rate <-…

How can I generate VCF file with two different assembled genome fasta?

When I generated a VCF file from a pair of Illumina FASTQ files and a reference genome assembly FASTA file, I used BWA mem to assemble the Illumina fragments and Picard to remove duplicates. And finally I used GATK4 HaplotypeCaller…

What is the major problem with this pipeline of SNPs analysis?

First, I have raw Illumina sequencing data for several Aspergillus flavus (a fungal species) samples as pairs of fastq.gz files (sample1_filtered_1.fastq.gz and sample1_filtered_2.fastq.gz). I wanted to assemble the Illumina fragment sequences and perform SNP (single nucleotide polymorphism) analysis against the reference genome, Aspergillus flavus NRRL3357, as a FASTA file. At the end…

Differences in RNAseq Variant Calling and Allele Specific Expression

Dear community, using the GATK tools “haplotypecaller” and “ASEReadCounter” it is possible to produce a VCF file and a tabulated file for allele-specific expression (ASE) analysis, respectively. The VCF file contains information about the number of reads mapping to…

Accelerating Minimap2 for Accurate Long Read Alignment on GPUs

doi: 10.26502/jbb.2642-91280067. Epub 2023 Jan 20. Affiliations: 1 Department of Computer Science and Engineering, University of Michigan Ann Arbor, MI 48109, USA. 2 NVIDIA Corporation, Santa Clara, CA 95051, USA. Free PMC article. Harisankar Sadasivan et al. J Biotechnol Biomed. 2023.…

Handling single-sample VCF after haplotypecaller

Hi, I’m working on QTL mapping of diploid plants and succeeded in a first GATK HaplotypeCaller run. The results are about 300 VCF files, each containing a single sample of the plant population. I found out that hard-filtering and BQSR are recommended after the first HaplotypeCaller run,…

Downsample process in ActiveRegion determination (HaplotypeCaller and Mutect2)

gatk.broadinstitute.org/hc/en-us/articles/360036227652?id=4147 The article describes the process of finding active regions. I found the downsampling step in the final post-processing quite confusing. Could someone explain the reason for this step? There is a final post-processing step to clean up and trim the…

How to extract phased haplotypes from GATK HaplotypeCaller

I would like to extract the physically phased haplotypes from a VCF file generated by GATK’s HaplotypeCaller on Illumina data of some isolates from different yeast (S. cerevisiae) strains. According to this FAQ: In the format field of a PGT (Pre-Implantation Genetic Testing) VCF, you may find a description similar…

Generate VCF version 4.1 using GATK version 4

Dear all, I want to make a VCF file of version 4.1 using GATK 4.0. I have tried using the command ./gatk HaplotypeCaller -R ref.fna -I sample.bam -O sample.vcf But this generated a VCF version 4.2 file. Someone please let me know…

HOW CAN I GENERATE VCF VERSION 4.1 USING GATK VERSION 4?

Dear all, I want to make a VCF file of version 4.1 using GATK 4.0. I have tried using the command ./gatk HaplotypeCaller -R ref.fna -I sample.bam -O sample.vcf But this generated a VCF version 4.2 file. Someone please…

Navigating the Bioinformatics Workflow for Whole Exome Sequencing: A Step-by-Step Guide

Next-generation sequencing (NGS), which makes millions to billions of sequence reads at a fast rate, has greatly sped up genomics research. At the moment, Illumina, Ion Torrent/Life Technologies, 454/Roche, Pacific Bioscience, Nanopore, and GenapSys are all NGS platforms that can be used. They can produce reads of 100–10,000 bp in…

Nine patients with KCNQ2-related neonatal seizures and functional studies of two missense variants

Patients and clinical data collection The institutional review board of the Faculty of Medicine, Chulalongkorn University approved this study (IRB No. 264/62) which follows the Declaration of Helsinki Guidelines and all subsequent amendments. Written informed consents were obtained from parents or legal guardians of the participants. From June 2016 to…

GATK error of Argument --emit-ref-confidence

I am facing this error while using gatk/4.1.2.0. Please guide me on how to solve this: A USER ERROR has occurred: Argument --emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single…

GATK showing error of reference file index

Hi, I am using GATK version gatk/4.2.2.0 for HaplotypeCaller. I have been facing a reference.fa indexing issue. I tried to index the file using the following command: samtools faidx ~/path/PitayaGenomic.fa #three different formats of reference files -rw-r--r-- 1 tariqr 1.3G Feb…

Continue Reading GATK showing error of reference file index

HaplotypeCaller GENOTYPE GIVEN ALLELES doesn’t genotype given alleles

Hi! I am trying to run the GATK HaplotypeCaller (human data): ./gatk-4.1.2.0/gatk HaplotypeCaller \ --reference ref.fa \ --input file.bam \ --genotyping-mode GENOTYPE_GIVEN_ALLELES \ --alleles allele_chunk_file.vcf \ --intervals file.bed \ --output out/file.vcf After running the above command for any given sample, only ~3 sites are called and all of them have…

Continue Reading HaplotypeCaller GENOTYPE GIVEN ALLELES doesn’t genotype given alleles

Missing samples in the output vcf file created using GenotypeGVCFs in GATK

Hi everyone, I used the following method to create a VCF file with 50 samples. For each sample: java -jar gatk_3.7-0/GenomeAnalysisTK.jar -T HaplotypeCaller -R Ref.fasta -I input.bam -o output.g.vcf.gz -ERC GVCF and then for all samples: java…

Continue Reading Missing samples in the output vcf file created using GenotypeGVCFs in GATK

Calling zero mapping quality variant

Hello, is it possible to call a variant from reads that have zero mapping quality in the region? I found an INDEL in my BAM file when I visualized it in IGV, but the variant is not in the gVCF. I have checked the average…

Continue Reading Calling zero mapping quality variant

GATK HaplotypeCaller combine info from two BAM into one line in vcf (not divide into samples column)

Hi, I run the GATK HaplotypeCaller and hope to get a file where each sample has its own column. My bam file looks like this: input_bam/SRR8859080.bam input_bam/ENCFF477JTA_new.bam This is my GATK command: allele_chunk_file=rs_coord.vcf gatk_run_line="../bin/gatk-4.1.2.0/gatk" outfile=wgs_test_out.genotypes.vcf bam_file=wgs_test.bam.list genome_seq="../hg38.fa" intervals=wgs_test.bed $gatk_run_line \ HaplotypeCaller \ --reference $genome_seq \ --input $bam_file \ --genotyping-mode GENOTYPE_GIVEN_ALLELES \…

Continue Reading GATK HaplotypeCaller combine info from two BAM into one line in vcf (not divide into samples column)

Apple M1 processor for bioinformactics

Hi everyone, I am thinking about buying a new laptop for bioinformatics. The new MacBook Pro with the M1 processor looks really powerful. But does anyone know if there are any compatibility issues between bioinformatics software and the Apple M1? Are there any potential issues?…

Continue Reading Apple M1 processor for bioinformactics

Error: ##fileformat=VCFv4.2 does not exist

Hello everybody, I am using PharmCAT to preprocess my VCF file, and for this I am running this command: python3 pharmcat_vcf_preprocessor.py -vcf NA12801.VCF But I am getting this error: Error: ##fileformat=VCFv4.2 does not exist I have generated my VCF file using GATK HaplotypeCaller…

Continue Reading Error: ##fileformat=VCFv4.2 does not exist
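
The error quoted in this post concerns the `##fileformat` line, which the VCF specification requires as the first line of the file. A minimal Python sketch of a quick check is given here; it is not PharmCAT's own validation, and the function name `vcf_fileformat` and the sample header text are made up for illustration.

```python
import io

PREFIX = "##fileformat="

def vcf_fileformat(handle):
    """Return the version string from the first line, or None if it is missing.

    Hypothetical helper: it only inspects the very first line, because that is
    where the VCF specification places the fileformat declaration.
    """
    first_line = handle.readline().rstrip("\r\n")
    if first_line.startswith(PREFIX):
        return first_line[len(PREFIX):]
    return None

# Illustrative in-memory headers (not taken from the post's files).
with_header = io.StringIO(
    "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
without_header = io.StringIO("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")

print(vcf_fileformat(with_header))
print(vcf_fileformat(without_header))
```

For a file on disk, the same function could be called on `open("sample.vcf")`; a gzip-compressed VCF would need `gzip.open(path, "rt")` instead.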

Tool to combine Germline Variant call from different variant callers

Hi everyone, is there a tool to combine VCFs generated by different germline variant callers, such as LoFreq, iVar, BCFtools and HaplotypeCaller (maybe as a union or intersection), that chooses variants based on something like a majority-voting rule, like if a…

Continue Reading Tool to combine Germline Variant call from different variant callers

SNP calling

Hello, I made a VCF file from the BAM files of 83 samples with HaplotypeCaller, then filtered it with VariantFiltration. After that, I extracted “GT” with the vcfR package in R, but I have many no-calls (./.) that I want to remove. I also used gatk HaplotypeCaller -R reference.fasta…

Continue Reading SNP calling
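
As a rough illustration of what a no-call (./.) looks like in a VCF record, the Python sketch below computes the fraction of samples with a called genotype on one data line. It assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present. The helper names and the example record are hypothetical and are not taken from the post.

```python
import re

def is_no_call(sample_field):
    """True when every allele in the GT subfield is '.' (e.g. './.', '.|.', '.')."""
    genotype = sample_field.split(":", 1)[0]
    alleles = re.split(r"[/|]", genotype)
    return all(allele == "." for allele in alleles)

def called_fraction(vcf_line):
    """Fraction of samples with a called genotype on one tab-separated VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    samples = columns[9:]  # sample columns follow the 9 fixed VCF columns
    if not samples:
        return 0.0
    called = sum(not is_no_call(sample) for sample in samples)
    return called / len(samples)

# Made-up record with three samples, one of which is a no-call.
record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1/1:9"
print(called_fraction(record))
```

A site-level filter could then keep only records whose called fraction exceeds a chosen threshold; the threshold itself depends on the study.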

Cost-effective and accurate genomics analysis with Sentieon on AWS

This blog post was contributed by Don Freed, Senior Bioinformatics Scientist, and Brendan Gallagher, Head of Business Development at Sentieon; and Olivia Choudhury, PhD, Senior Partner Solutions Architect, Sujaya Srinivasan, Genomics Solutions Architect, and Aniket Deshpande, Senior Specialist, HPC HCLS at AWS. The year 2022 was an exciting one for genomics…

Continue Reading Cost-effective and accurate genomics analysis with Sentieon on AWS

Issue with VCF format while using Pharmcat

Hello everybody, I am using the PharmCAT tool’s preprocessor feature to preprocess my VCF file using the command > python3 pharmcat_vcf_preprocessor.py -vcf sample.vcf But I think there is some issue with my VCF file, as this command outputs an error > Reading samples from sample.vcf … Saving output to . > >…

Continue Reading Issue with VCF format while using Pharmcat

Genetic determinants and absence of breast cancer in Xavante Indians in Sangradouro Reserve, Brazil

Ethics statement Authorization from Fundação Nacional do Índio (FUNAI) was acquired after approval from the Research Ethics Committee of the Faculty of Medicine in the Federal University of Mato Grosso (UFMT), and the National Commission of Research Ethics (authorization #1004/2001). Written consents, which were recorded and archived, were acquired from…

Continue Reading Genetic determinants and absence of breast cancer in Xavante Indians in Sangradouro Reserve, Brazil

Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

Sample preparation We ordered the GIAB samples from the Coriell Institute (NA24385, NIST ID HG002; NA24149, NIST-ID HG003 and NA24143, NIST-ID HG004). DNA concentration was measured by Qubit. The library was constructed according to Illumina TruSeq DNA PCR Free Library Prep protocol HT (Illumina Inc., San Diego, CA, USA) for…

Continue Reading Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

Scatter Gather principle by chromosome on Gatk

Hi all, on a quest to optimize my GATK pipeline, I came across the scatter-gather principle, so I did the following: pids= for chr in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20…

Continue Reading Scatter Gather principle by chromosome on Gatk
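
The scatter step described in this post amounts to one HaplotypeCaller invocation per chromosome, each restricted with `-L`. The Python sketch below only builds the argument lists and does not run anything. The file names (`ref.fasta`, `sample.bam`, the per-chromosome output names) and the function name are assumptions for illustration, and the poster's actual loop is truncated in the excerpt.

```python
def scatter_commands(chromosomes, ref="ref.fasta", bam="sample.bam"):
    """Build one HaplotypeCaller argument list per chromosome (nothing is executed)."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", ref,                        # reference FASTA (hypothetical name)
            "-I", bam,                        # input BAM (hypothetical name)
            "-L", chrom,                      # restrict calling to one chromosome
            "-O", f"sample.{chrom}.vcf.gz",   # per-chromosome output (hypothetical name)
        ])
    return commands

# Three chromosomes only, to keep the illustration short.
for command in scatter_commands(["chr1", "chr2", "chr3"]):
    print(" ".join(command))
```

Each list could be handed to `subprocess.Popen` to run the shards concurrently. The gather step, which merges the per-chromosome VCFs in reference order, is left out of this sketch.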

Joint variant calling on DeepVariant GVCFs using GATK GenotypeGVCFs

Hi everyone, I have a bunch of GVCF files generated by DeepVariant, but I want to use GATK’s GenotypeGVCFs for joint variant calling on them (I don’t want to use GLnexus). But GATK requires a genotype likelihood field produced by…

Continue Reading Joint variant calling on DeepVariant GVCFs using GATK GenotypeGVCFs

using gatk haplotypecaller for variants extraction

Hi, I have RNA-seq data from COVID patients. I am using HISAT2 to align the reads to the reference, so the resulting BAM files, after indexing, are now ready. I would like to use GATK HaplotypeCaller to extract variants from my BAM files. First,…

Continue Reading using gatk haplotypecaller for variants extraction

Genomic architecture of adaptive radiation and hybridization in Alpine whitefish

Sampling the radiation To understand the phylogenetic relationships between Alpine whitefish, we carried out whole-genome resequencing on 96 previously collected whitefish (with associated phenotypic measurements including standard length and gill-raker counts; collected in accordance with permits issued by the cantons of Zurich (ZH128/15), Bern (BE68/15), and Lucerne (LU04/14); these fish…

Continue Reading Genomic architecture of adaptive radiation and hybridization in Alpine whitefish

Standalone GATK HaplotypeCaller : bioinformatics

Hello! I’m hoping someone can direct me to resources around acquiring or building standalone gatk tools, specifically HaplotypeCaller. All of my research has led to the monolithic gatk wrapper (either local, spark, or in docker). The big tool is brilliant and I’ve been using it thus far, but it’s pretty…

Continue Reading Standalone GATK HaplotypeCaller : bioinformatics

Hard filtering on GATK HaplotypeCaller giving multiple warnings

I’m using this pipeline for deriving variants from RNA sequencing data: github.com/modupeore/VAP which uses specific versions of various tools, including HaplotypeCaller from GATK (v3.8-0-ge9d806836). The final step is a set of hard filters on the called variants (applied using VariantFilter), but looking at the log files, there are a lot…

Continue Reading Hard filtering on GATK HaplotypeCaller giving multiple warnings

snp – Reference variant detected as altered one in bam file

I received several .bam files from the manufacturer and used four callers (samtools, freebayes, HaplotypeCaller, DeepVariant) to find sequence variants. In the resulting .vcf files, I took a closer look at some calls. I found an interesting homozygous one, rs477033 (C/G Ref/Alt), with the flag ‘COMMON=0’ and a very low MAF. I also…

Continue Reading snp – Reference variant detected as altered one in bam file

how to extract unique variants from GVCF

[note: cross-posted on GATK forum – still awaiting a response] I have a GVCF (generated using GATK’s HaplotypeCaller w/ -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample….

Continue Reading how to extract unique variants from GVCF

Variant quality and filters on GATK HaplotypeCaller generated VCFs

Hi, I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for “Germline short variants discovery” for single-sample data to generate a VCF using HaplotypeCaller. The guidelines then point to the use…

Continue Reading Variant quality and filters on GATK HaplotypeCaller generated VCFs

java – GATK: HaplotypceCaller IntelPairHmm only detecting 1 thread

I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…

Continue Reading java – GATK: HaplotypceCaller IntelPairHmm only detecting 1 thread

GATK HaplotypeCaller with interval list

I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. My interval list file (top8snp.interval_list) content is as follows: 12 33029845 33030845 + rs24767598 13 40586682 40587682 + rs24748362 18 24373857 24374857 + rs8856159 21 50381146 50382146 +…

Continue Reading GATK HaplotypeCaller with interval list
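
The rows quoted in this post follow a five-column layout: contig, start, end, strand, name. Picard-style interval lists normally also begin with a SAM-like header, made up of lines starting with `@`. The Python sketch below parses such rows under those assumptions. It splits on any whitespace for tolerance, although the real format is tab-delimited, and the `Interval` type and function name are made up for illustration.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str
    name: str

    def length(self):
        """Number of bases covered by the interval."""
        return self.end - self.start + 1

def parse_interval_lines(lines):
    """Parse interval rows, skipping blank lines and '@' header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("@"):
            continue
        contig, start, end, strand, name = line.split()[:5]
        intervals.append(Interval(contig, int(start), int(end), strand, name))
    return intervals

# The first two rows quoted in the post.
rows = [
    "12\t33029845\t33030845\t+\trs24767598",
    "13\t40586682\t40587682\t+\trs24748362",
]
for interval in parse_interval_lines(rows):
    print(interval.contig, interval.name, interval.length())
```

A row with fewer than five fields raises a `ValueError` here, which makes a malformed file easy to spot before it is passed to `-L`.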

variant – Error running gatk HaplotypeCaller with allele specific annotations

I’ve got HaplotypeCaller working nicely in standard mode, like so: # Run HaplotypeCaller gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…

Continue Reading variant – Error running gatk HaplotypeCaller with allele specific annotations

Do VQSR for HaplotypeCaller calls – Sarek

Expected Behavior Filter the calls from HaplotypeCaller with Variant Quality Score Recalibration according to GATK best practice (Tools VariantRecalibrator, ApplyRecalibration, see gatkforums.broadinstitute.org/gatk/discussion/39/variant-quality-score-recalibration-vqsr or a more recent version) Current Behavior Variant quality score recalibration currently not included. Asked Jan 26 ’18 at 08:25 malinlarsson 1 Answer: Keep in mind that you’d…

Continue Reading Do VQSR for HaplotypeCaller calls – Sarek

Running samtools view on bam affects the number of variants called by both haplotypecaller and deepvariant – C samtools

Thanks for getting back to me Valeriu. As you suggested, I used the latest commit from the develop branch in my pipeline, and the results look good. I was able to replicate the numbers from samtools v1.10.2 and v1.11 for both variant callers. FYI $ docker run scilifelabram/htslib:dev_proper /opt/samtools/samtools version…

Continue Reading Running samtools view on bam affects the number of variants called by both haplotypecaller and deepvariant – C samtools

GATK GenotypeGVCFs changes HET to REF_ALT

Dear all, I’ve been using GATK HaplotypeCaller / GenotypeGVCFs (v4.2.3.0) for a while but recently found something strange. There is a position (7063) with 8 reads (3T + 5A) that, even though HaplotypeCaller calls it as a HET (see image, lower track): NC_046966.1 7063 . T A,<NON_REF> 177.64 . BaseQRankSum=0.887;DP=8;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=2.369;RAW_MQandDP=16885,8;ReadPosRankSum=1.345 GT:AD:DP:GQ:PL:SB…

Continue Reading GATK GenotypeGVCFs changes HET to REF_ALT

Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD,  Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…

Continue Reading Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…

Continue Reading Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest

Genomic data The foundational resource for this study was a dataset of 40,107,925 nuclear SNPs sequenced from a worldwide sample of 532 DBM individuals collected in 114 different sites based on our previous project15. DNA was extracted from each of the 532 individuals using DNeasy Blood and Tissue Kit (Qiagen,…

Continue Reading Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest

Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)

I am using the latest GATK 4.1.2.0 to do variant calling on insect samples with a reference genome of a closely related species. The heterozygosity is approximately 0.02. I followed the standard pipeline of “HaplotypeCaller --> GenomicsDBImport --> GenotypeGVCFs” to get my unfiltered VCFs, however, although my variant sites have…

Continue Reading Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)

No quality in non-variant sites GATK

Hey, I am doing SNP calling with GATK, using HaplotypeCaller with BP_RESOLUTION, CombineGVCFs with convert-to-base-pair-resolution and GenotypeGVCFs with include-non-variant-sites. When I get my VCF file, the non-variant sites do not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…

Continue Reading No quality in non-variant sites GATK

Parallel genomic responses to historical climate change and high elevation in East Asian songbirds

Extreme environments present profound physiological stress. The adaptation of closely related species to these environments is likely to invoke congruent genetic responses resulting in similar physiological and/or morphological adaptations, a process termed “parallel evolution” (1). Existing evidence shows that parallel evolution is more common at the phenotypic level than at…

Continue Reading Parallel genomic responses to historical climate change and high elevation in East Asian songbirds

Germline variant calling pipeline using Snakemake

Hello everybody, as part of a project, I had to write an in-house pipeline to call germline mutations for ~100 patients. For that I used Snakemake and GATK’s best practice guidelines. Steps that take a long time (HaplotypeCaller or BaseQualityScoreRecalibration) are automatically parallelized…

Continue Reading Germline variant calling pipeline using Snakemake

Pararellization in GATK 4

Hi all, I’m trying (and failing) to multi-thread HaplotypeCaller in GATK 4. I read in a few places online that multi-threading in GATK 4 has been made more tricky, maybe even unfeasible, but all the places where I read that seem to be more than…

Continue Reading Pararellization in GATK 4