Tag: GATK

Parse a file of strings in python separated by newline into a json array

I don’t see where you’re actually reading from the file in the first place. You have to read your path_text.txt before you can format it correctly, right? with open('path_text.txt', 'r', encoding='utf-8') as myfile: content = myfile.read().splitlines() This gives you ['/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam', '/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam'] in content. Now if you want to write this…
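The excerpt above is cut off before the writing step, so the following is only a minimal sketch of one way to finish the job, not necessarily what the original answer proposed. The function name lines_to_json_array and the output file name are hypothetical; the only library call beyond the excerpt is the standard json.dump.

```python
import json


def lines_to_json_array(text_path, json_path):
    """Read newline-separated strings and write them out as a JSON array.

    Both paths are supplied by the caller; the names are illustrative only.
    """
    with open(text_path, "r", encoding="utf-8") as infile:
        # splitlines() strips the newline characters; blank lines are skipped
        items = [line for line in infile.read().splitlines() if line.strip()]
    with open(json_path, "w", encoding="utf-8") as outfile:
        # A Python list of str serializes directly to a JSON array of strings
        json.dump(items, outfile, indent=2)
    return items
```

Calling lines_to_json_array('path_text.txt', 'paths.json') would write the list shown in the excerpt as a JSON array, assuming the input file holds one path per line.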

Hard filtering on GATK HaplotypeCaller giving multiple warnings

I’m using this pipeline for deriving variants from RNA sequencing data: github.com/modupeore/VAP which uses specific versions of various tools, including HaplotypeCaller from GATK (v3.8-0-ge9d806836). The final step is a set of hard filters on the called variants (applied using VariantFilter), but looking at the log files, there are a lot…

Bioinformatics Analyst II – Remote in Danville, PA for Geisinger

Details Posted: 22-Apr-22 Location: Danville, Pennsylvania Type: Full Time Salary: Open Categories: Operations Job Summary Primary accountability is to leverage the organization’s data assets exome sequencing data (>180,000 individuals) from MyCode Community Health Initiative to improve quality, efficiency and generate knowledge specifically in the field of bioinformatics within health research….

Bioinformatics Scientist for Whole Genome and Whole Exome Sequencing

The NeuroGenomics and Informatics (NGI) Center led by Dr. Carlos Cruchaga at Washington University School of Medicine is recruiting a Bioinformatics Scientist to work on Whole Genome and Whole Exome Sequencing. We are seeking an experienced, self-motivated, self-driven scientist…

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Sequencing data: We used publicly available sequencing data from the GIAB consortium [45], 1000 Genomes Project high-coverage data [46] and Human Genome Structural Variation Consortium (HGSVC) [4]. All datasets include only samples consented for public dissemination of the full genomes. Statistics and reproducibility: For generating the assemblies, we used all 14 samples for…

Bioinformatics Pipeline Development Engineer II at Personalis, Inc

Personalis, Inc. is a leader in advanced cancer genomics for enabling the next generation of precision cancer therapies and diagnostics. The Personalis NeXT Platform® is designed to adapt to the complex and evolving understanding of cancer, providing its biopharmaceutical customers and clinicians with information on all of the approximately 20,000 human genes,…

how to extract unique variants from GVCF

[Note: cross-posted on GATK forum – still awaiting a response] I have a GVCF (generated using GATK’s HaplotypeCaller w/ -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample….
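To make the goal concrete, here is a rough sketch of the set logic behind "variants unique to each sample". It assumes the genotypes have already been parsed out of a joint-genotyped VCF into a plain dictionary; the function names and the input layout are hypothetical and are not part of GATK or any other tool.

```python
from collections import defaultdict


def carries_alt(gt):
    """Return True if a GT string such as '0/1' or '1|1' has a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def private_variants(calls):
    """Group sites by the single sample that carries an alternate allele.

    calls maps a site key, e.g. (chrom, pos, ref, alt), to a dict of
    {sample_name: GT string}. Sites carried by zero or several samples are skipped.
    """
    private = defaultdict(list)
    for site, genotypes in calls.items():
        carriers = [sample for sample, gt in genotypes.items() if carries_alt(gt)]
        if len(carriers) == 1:
            private[carriers[0]].append(site)
    return dict(private)
```

Note that this treats a missing genotype (./.) in the other samples as "not a carrier", which can overstate uniqueness; a stricter version would require confident homozygous-reference calls in every other sample.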

Variant quality and filters on GATK HaplotypeCaller generated VCFs

Hi, I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for “Germline short variants discovery” for single-sample data to generate a VCF using HaplotypeCaller. The guidelines then point to the use…

rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias

I am working with non-model plant RNA samples which have been deep sequenced and analysed using STAR aligner under default parameters. Aim: We would like to conduct SNP discovery of these samples. Objective: Our ultimate goal with this genotypic data is to search for variants (both SNPs and indels)…

Color hiring Software Engineer, Bioinformatics in Remote

About Color Color’s mission is to help people lead the healthiest lives that science and medicine can offer. We launched in April 2015 with a simple, affordable genetic test to help people understand their risk for hereditary cancer. In 2017, we added coverage for hereditary heart conditions. Between them, cancer…

vcf – Why does GATK produce both 0/1 and 1/0 genotypes in the same file? Are the two not equivalent?

I have always thought that 1/0 and 0/1 in VCF genotype fields are equivalent. And yet, GATK uses both. For example, these are two variants called in the same sample and the same run of GATK 4.1.4.0: chr7 117120317 . ATTCATTGTTTTGAAAGAAAGATGGAAGAATGAACTGAAG A 748.97 . AC=1;AF=0.5;AN=2;DP=64;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=11.89;SOR=7.223 GT:AD:DP:GQ:PL:SB 1/0:0,36:63:99:2294,1042,933:0,0,0,36 chr7 117120306 ….
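For downstream comparisons, one common workaround is to normalize unphased genotypes before comparing them, since allele order carries no meaning when the separator is "/". The helper below is a small illustrative sketch (the function name is hypothetical); it does not explain why GATK emits both forms, and it deliberately leaves phased genotypes (separator "|") untouched because their order is meaningful.

```python
def normalize_unphased_gt(gt):
    """Sort the alleles of an unphased GT string so '1/0' and '0/1' compare equal.

    Phased genotypes ('|') and haploid calls are returned unchanged.
    """
    if "|" in gt or "/" not in gt:
        return gt
    alleles = gt.split("/")

    def sort_key(allele):
        # Missing alleles ('.') sort after called alleles; called alleles sort numerically
        return (allele == ".", int(allele) if allele != "." else 0)

    return "/".join(sorted(alleles, key=sort_key))
```

With this helper, normalize_unphased_gt("1/0") and normalize_unphased_gt("0/1") return the same string, while "1|0" stays distinct from "0|1".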

BTG2 gene predicts poor outcome in PT-DLBCL

Introduction: Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma [1–3]. PT-DLBCL is the most common type of testicular tumor in men aged over 60 and is characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms [4–6]. PT-DLBCL shows significant extranodal tropism,…

HRJOB7442 Bioinformatics Scientist 2 (Various Locations) in Nether Alderley, Macclesfield (SK10) | Almac Group (Uk) Ltd

Bioinformatics Scientist 2 Hours: 37.5 hours per week Salary: Competitive Ref No: HRJOB7442 Business Unit: Diagnostic Services Location: Craigavon or Manchester Open To: Internal and External Applicants The Company Almac Diagnostic Services is a leading stratified medicine business, specialising in biomarker-driven clinical trials. We are incredibly proud to be involved…

java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…

GATK HaplotypeCaller with interval list

I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. My interval list file (top8snp.interval_list) content is as follows: 12 33029845 33030845 + rs24767598 13 40586682 40587682 + rs24748362 18 24373857 24374857 + rs8856159 21 50381146 50382146 +…

bcftools merged vcf file assigns all variants to one sample

I’ve made one vcf file for each of three samples. I then combined them using bcftools, like so: # Make a list of vcf files to merge cat "${OUT}/results/variants/vcf_list" /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/3a7a-10.vcf.gz /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/MF3.vcf.gz /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/R507H-FB_S355_L001.vcf.gz Then merge the list: bcftools merge -l…

Senior Bioinformatics Software Developer – Bethesda

Medical Science & Computing, (MSC), a Dovel company, is seeking skilled Senior Bioinformatics Software Developers to join our team supporting our client, NCBI at the National Institutes of Health, (NIH) in Bethesda, MD. The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM) at…

variant – Error running gatk HaplotypeCaller with allele specific annotations

I’ve got HaplotypeCaller working nicely in standard mode, like so: # Run HaplotypeCaller gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…

Variant calls of published already assembled genomes

I have a set of short-read sequencing data for the 172 kb Epstein-Barr virus genome. We successfully called our variants using GATK against a reference genome. A publication linked below from a different population compared variants (also from short read sequencing) to…

Do VQSR for HaplotypeCaller calls – Sarek

Expected Behavior: Filter the calls from HaplotypeCaller with Variant Quality Score Recalibration according to GATK best practice (tools VariantRecalibrator, ApplyRecalibration; see gatkforums.broadinstitute.org/gatk/discussion/39/variant-quality-score-recalibration-vqsr or a more recent version). Current Behavior: Variant quality score recalibration is currently not included. Asked Jan 26 ’18 at 08:25 malinlarsson 1 Answer: Keep in mind, that you’d…

sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds

I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 on the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…

PathSeqFilterSpark

I have been trying to filter out low-quality bases as part of a variant annotation task, and I have completed all the previous steps required. However, when I try to filter out low-quality bases after BQSR (GATK), PathSeqFilterSpark does not yield an output file. There was no…

GATK GenotypeGVCFs changes HET to REF_ALT

Dear all, I’ve been using GATK HaplotypeCaller / GenotypeGVCFs (v4.2.3.0) for a while, but recently found something strange. There is a position (7063) with 8 reads (3T + 5A) that, even though HaplotypeCaller calls it as a HET (see image, lower track): NC_046966.1 7063 . T A,<NON_REF> 177.64 . BaseQRankSum=0.887;DP=8;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=2.369;RAW_MQandDP=16885,8;ReadPosRankSum=1.345 GT:AD:DP:GQ:PL:SB…

Systems biology analysis of human genomes points to key pathways conferring spina bifida risk

Significance Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to investigate SB predisposition controls for population stratification and avoids…

Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…

Dragen-gatk for trio

Hi everyone, the DRAGEN-GATK pipeline works great for a single sample. However, I would like to know if anyone has used this pipeline for a trio. If so, how did you do it? It is recommended to do hard filtering based on QUAL, but how…

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK Best Practices suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the GATK manual and other posts…

Sr Scientist – IVD Development – Houston

NuProbe USA Inc. is looking for a Staff/Senior Scientist to lead the IVD project development program at NuProbe to support both research and in vitro diagnostic (IVD) assays for use in medical research, clinical trials, regulatory submissions, and clinical diagnostic use. NuProbe USA is a rapidly growing company and…

Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest

Genomic data: The foundational resource for this study was a dataset of 40,107,925 nuclear SNPs sequenced from a worldwide sample of 532 DBM individuals collected in 114 different sites based on our previous project [15]. DNA was extracted from each of the 532 individuals using DNeasy Blood and Tissue Kit (Qiagen,…

Genome Bioinformatics Analyst – Pittsburgh

Description: UPMC Presbyterian is hiring a Genome Bioinformatics Analyst to join the Molecular and Genomic Pathology Laboratory (MGP) team! This role will work a daylight schedule Monday through Friday. No weekends or holidays are required! The Molecular and Genomic Pathology Laboratory (MGP) is a dynamic state-of-the-art clinical laboratory that prides…

how to add reference alleles to VCF?

I’m converting gVCFs to VCF, but the reference alleles are missing. An example below: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177 1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81 1 97915614 . C . . . DP=40…

gatk VariantRecalibrator positional argument error

I’m trying to recalibrate my VCF using gatk VariantRecalibrator, but keep getting an error “Illegal argument value: Positional arguments were provided”. But I don’t know what this means, or how to correct it! Here’s my call: gatk VariantRecalibrator -R "/Volumes/Seagate Expansion Drive/refs/hg38/gatk download/Homo_sapiens_assembly38.fasta" -V "$OUT"/results/variants/"$SN".norm.vcf.gz -AS --resource hapmap,known=false,training=true,truth=true,prior=15.0: "/Volumes/Seagate…

Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)

I am using the latest GATK 4.1.2.0 to do variant calling on insect samples with a reference genome of a closely related species. The heterozygosity is approximately 0.02. I followed the standard pipeline of “HaplotypeCaller --> GenomicsDBImport --> GenotypeGVCFs” to get my unfiltered VCFs, however, although my variant sites have…

No quality in non-variant sites GATK

Hey, I am doing SNP calling with HaplotypeCaller (BP_RESOLUTION), CombineGVCFs with convert-to-base-pair-resolution and GenotypeGVCFs with include-non-variant-sites in GATK, and when I get my VCF file, the non-variant sites do not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…

What is the single nucleotide polymorphism database ( dbsnp )?

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…

Parallel genomic responses to historical climate change and high elevation in East Asian songbirds

Extreme environments present profound physiological stress. The adaptation of closely related species to these environments is likely to invoke congruent genetic responses resulting in similar physiological and/or morphological adaptations, a process termed “parallel evolution” (1). Existing evidence shows that parallel evolution is more common at the phenotypic level than at…

VCF samtools

Hello, I am having trouble doing variant calling with samtools. I am getting only the header and no variants. If I instead use Freebayes, I do get a lot of variants, and with GATK, I get just a few. What can the problem be? Do…

Somatic Variant Calling

Hi, I need to call somatic variants from a BAM file from a cancer panel. Can anyone please suggest a suitable tool for calling the variants and generating a VCF file? Thank you. “Suitable” is very context-dependent; are you working…

Making consensus sequence for each haplotype

I’m dealing with paired end amplicon sequencing data. I’ve produced a GVCF file with haplotype calls using: gatk HaplotypeCaller -R $REF -I "$BAM" -O "$OUT"/results/variants/${SN}_HaplotypeCallerPGT.vcf -ERC GVCF The vcf file it produces contains the PGT flag, and variants are called in the format…

state and usage of compressed file standards better than BAM and FASTQ

Extra compressed formats for raw/aligned reads and variant tables have been around for some time, but I think they saw slow adoption. Our current disk space usage is making us have another look at switching to file…

how to do basic statistics for bam files

Hi Biostars team, I have unpaired exome-seq data. I did the quality control and alignment. Now my files are in BAM format and I would like to do some basic statistics like fragment size, coverage, mismatches, gaps, duplicates, etc….

Gatk pipeline wdl on multiples sample input

Hello all, I would like to use the GATK pipelines with Cromwell on our HPC. However, for the germline single-sample pipeline, I wanted to know if there is a way to run it directly on multiple samples. I can’t see myself writing more…

Best way to merge multiple VCF files

Hi, I am trying to merge a bunch of vcf files into one vcf of known SNPs. The files are separated by chromosome. I am trying to figure out how to merge all the files but in a way that the chromosome…

Using IndexFeatureFile to index vcf

I have a number of vcfs that I need to index (create .idx files), which I attempted to do with GATK, e.g. java -Xmx5g -jar $gatk_path/GenomeAnalysisTK.jar -T IndexFeatureFile -F N03_INDELS.vcf Which returns an invalid argument error even though the vcf file is, as far…

Add or reveal read groups on .sam file aligned by BWA

Hi, I’m trying to use GATK HaplotypeCaller, but every time I run it, it says: A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to…

how to visually compare BAM file differences

I am a bioinformatics novice learning the workflow for calling somatic mutations. I found that the actions related to BAM files are these: sort, markdup, reorder, indel realignment, BQSR. I want to know the differences in them after I execute one step….

best practice to design and reuse a process/workflow

Let’s say I want to genotype a set of BAMs using GATK. A basic DSL2 nextflow workflow would look like: workflow { take: reference beds bams main: hc = haplotypecaller(reference,bams.combine(beds)) bed2vcf = combinegvcf(hc.groupTuple()) vcf = gathervcfs(bed2vcf.collect()) } process haplotypecaller { input: val(reference) tuple val(bam),val(bed) output: tuple bed,path("sample.g.vcf.gz") script: """ gatk…

What is GenotypeGVCFs?

Hello! This article gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels- says I should use HaplotypeCaller in GVCF mode and then GenotypeGVCFs, and this article gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels- advises using HaplotypeCaller without GenotypeGVCFs. I tried the former (with one sample), and the result is similar to the result of HaplotypeCaller in non-GVCF mode, however it differs in some…

Consensus sequence for phased variant calls

I’ve got paired end sequencing data from a ~500 bp amplicon. I’ve aligned the data and called variants using gatk to phase the variants, as follows. The phasing information is now under the PGT tag. gatk HaplotypeCaller -R $REF -I "$BAM" -O "$DIR"/variants/${SN}_HaplotypeCallerPGT.vcf…

Bioinformatics Engineer Job Opening in St. Louis, MO at Benson Hill

About Benson Hill: Benson Hill empowers innovators to develop more healthy, tasty and sustainable food by unlocking the natural genetic diversity of plants. Benson Hill’s CropOS™ platform combines machine learning and big data with advanced breeding techniques and plant biology to drastically accelerate and simplify the product development process. The CropOS…

Confusion regarding manual inclusion of read group information from fastq files

I have recently received a collection of paired-end fastq files (WES) from our collaborators. I am following the GATK best practices workflow. I have completed the alignment, sorting and indexing steps and generated a list of bam files. However, upon further inspection, I found out that the bam files do not have…

The mtDNA mutation spectrum in the PolG mutator mouse reveals germline and somatic selection | BMC Genomic Data

1. Taylor RW, Turnbull DM. Mitochondrial DNA mutations in human disease [Internet]. Vol. 6, Nature Reviews Genetics. Europe PMC Funders; 2005 [cited 2020 Aug 21]. p. 389–402. Available from: /pmc/articles/PMC1762815/?report=abstract. 2. Kabunga P, Lau AK, Phan K, Puranik R, Liang C, Davis RL, Sue CM, Sy RW Systematic review of…

LeftAlignIndels error

Hello! I input a sorted and indexed BAM file to LeftAlignIndels: ~/Soft/gatk-4.1.9.0/gatk LeftAlignIndels -I bam_fin/Exome_dups.bam -R /mnt/lapd/Index_hum/dna2/GRCh_2021.fa -O bam_fin/Exome.bam And got this error: ‘java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for file:///mnt/lapd/Vika_data/RNF_raw/exome/bam_fin/Exome.bam. Sort order is coordinate. Offending records are at [1:152985370] and [1:152985347] at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:197) at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:184)…

Bioinformatics Scientist at Infectious Disease Institute

IDI seeks to hire a Bioinformatics Scientist (BS) for the centre. The BS will be a fulltime staff who is familiar with the application of computational and biotechnology capabilities to biomedical and public health problems like genetics, clinical and medical research, as well as other data intensive analyses. By coordinating…

GATK CNV Caller – issue with PostprocessGermlineCNVCalls

Hi there, I’m running gatk PostprocessGermlineCNVCalls for a cohort of hg19 aligned WES samples. I’m following the suggested pipeline here: gatk.broadinstitute.org/hc/en-us/articles/360035531152–How-to-Call-common-and-rare-germline-copy-number-variants#5 However, in the final step when I try to process CNV calls for an individual sample, I get the following error, asking me to run FilterIntervals, but I’ve already…

how to install picard and GATK

Hi all, I want to install the current versions of the Picard and GATK software using an Ubuntu terminal on a Windows PC for some genomics analysis. I have the current version of Java but was not able to install Picard. Well, I downloaded the entire…

Polyploidy found, and not supported by vcftools for a diploid data set.

Hi, I used GATK Mutect2, SelectVariants (retained only SNPs) and CombineGVCFs to generate a VCF file for a diploid species. When I tried to process the VCF file using vcftools, some of the commands did work, however, when…

Manager, Bioinformatics Verification and Validation

Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform is enabling the development of next generation immuno-oncology therapeutics and diagnostics. Summary: You will join a team of bioinformaticians…

Phasing using Beagle with a map file

I’d like to phase the SNPs in a vcf file and output consensus files for each haplotype, as suggested in this post: www.biostars.org/p/298635/ I’ve managed to install beagle in a conda environment: conda create -n beagle -c conda-forge -c bioconda beagle conda activate beagle When I run beagle using this…

Post filtering analysis for exome data

Hello, I am following the GATK pipeline to process an exome data set. I am done with the preprocessing step and filtered the data set by the hard filtering method. Now, I am looking for variants shared between the affected individuals. In the vcf file, I get the…

heterozygous SNV AB>0.15, heterozygous indel<0.20 in UKB-WES

These gVCFs were joint genotyped using GLnexus (www.biorxiv.org/content/10.1101/572347v1) to create a single, unfiltered project-level VCF (pVCF). Genotype depth filters (SNV DP≥7, indel DP≥10) were applied prior to variant site filters requiring at least one variant genotype passing an allele balance filter (heterozygous…

Dictionary cannot have size zero

GATK RealignerTargetCreator: IllegalArgumentException: Dictionary cannot have size zero. I am new to variant calling and trying to create realignment targets using GATK but keep getting this error, despite having a dictionary file: java.lang.IllegalArgumentException: Dictionary cannot have size zero at org.broadinstitute.gatk.utils.MRUCachingSAMSequenceDictionary.<init>(MRUCachingSAMSequenceDictionary.java:62) at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:78) at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:75) at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180) at java.lang.ThreadLocal.get(ThreadLocal.java:170) at…

GATK4 -known-sites input Recalibrate Base Quality Scores

I am working with a non-reference plant species that I want to call variants from after aligning to a closely related species reference genome. I am following the GATK4 best practices pipeline, but I would like to know how I should proceed…

Interploidy gene flow involving the sexual-asexual cycle facilitates the diversification of gynogenetic triploid Carassius fish

1. Muller, H. J. The relation of recombination to mutational advance. Mutat. Res. Mol. Mech. Mutagen. 1, 2–9 (1964). 2. Maynard Smith, J. The Evolution of Sex (Cambridge University Press, 1978). 3. Avise, J. C. Clonality (Oxford University Press, 2008). 4. Hamilton, W. D.,…

Personalis Senior Bioinformatics Pipeline Development Engineer

Senior Bioinformatics Pipeline Development Engineer (Remote option available) at Personalis, Inc, Menlo Park. Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform® is enabling the…

identification of ROH using plink

Hello all, I generated a VCF file using GATK (first HaplotypeCaller --> CombineGVCFs --> GenotypeGVCFs, and then hard filtering). After this, I converted the filtered VCF file into plink binary PED files (.bed, .fam, .bim; plink v1.9) using the --make-bed command. However, when I used these…

Continue Reading identification of ROH using plink

Shared variants

Hello, I have exome data sets from 6 individuals, of whom 4 are affected and 2 are not. I have to identify the variants that are shared among the four affected individuals. I did joint genotyping for the 4 affected individuals and filtered the…

Continue Reading Shared variants

GATK HaplotypeCaller works without GVCF option, but errors with GVCF

I’ve extracted chromosome 4 from a whole-genome BAM file as follows: samtools view -h "$BAM" chr4 > "$EXT/temp/"$PREFIX"_chr4.sam" samtools view -bS "$EXT"/temp/$PREFIX"_chr4.sam" > "$EXT"/temp/$PREFIX"_chr4.bam" Then added read groups, as required by GATK: picard AddOrReplaceReadGroups I="$BAM" O="$EXT"/temp/$PREFIX"_chr4_rg.bam" RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20 Index the BAM: samtools index "$BAM" Download the…

Continue Reading GATK HaplotypeCaller works without GVCF option, but errors with GVCF

How to generate the contigs ploidy priors table (yeast) for GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?

Hi! I was asked to determine the ploidy level and to do CNV calling on a yeast sample (reference sequence: S. cerevisiae S288C). In order to perform CNV calling with the GATK…

Continue Reading How to generate the contigs ploidy priors table (yeast) for GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?

PGT only available for some variants in GATK .vcf

I’ve got a VCF file that someone else prepared using GATK. I’m interested in the phasing information in the PGT tag, e.g. 0|1. This information seems to be available for some variants but not for others, e.g. below: chr1 16977 …

Continue Reading PGT only available for some variants in GATK .vcf

Senior Bioinformatics Pipeline Development Engineer at Personalis

Senior Bioinformatics Pipeline Development Engineer (Remote option available) at Personalis, Inc, Menlo Park. Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform® is enabling the…

Continue Reading Senior Bioinformatics Pipeline Development Engineer at Personalis

GATK Mutect2 errors during basic variant calling

I’ve just installed GATK and am trying to do some basic variant calling. However, when I try to run this line: gatk Mutect2 -R $REF -I "$BAM" -O "$DIR"/gatk/$PREFIX"_bwa_gatk_unfiltered.vcf" I get the error below. Reading the output, it looks like this is…

Continue Reading GATK Mutect2 errors during basic variant calling

Windows-Bases Software Packages Which Can Analyze Vcf Files

I would like to work with VCF files: select one person, and subset one gene, chromosome, or part of a chromosome. I tried VMware, Ubuntu, VCFtools, GATK, and tabix, but I ran into a lot of errors. I don’t have…

Continue Reading Windows-Bases Software Packages Which Can Analyze Vcf Files

Phasing with Beagle 5.2 and no reference panel

Hi everyone, I have a question about phasing with Beagle 5.2 without a reference panel. I have seen in answers to a couple of other posts about Beagle that trying to phase with too few samples and no reference panel is not…

Continue Reading Phasing with Beagle 5.2 and no reference panel

False negatives -Hard filtering

Hello, I need some suggestions on filtering the variants in exome data. I combined all the GVCF files into one file, did joint genotyping, and created one VCF file. The variants in the file were hard-filtered. As a first step to evaluate the…

Continue Reading False negatives -Hard filtering

Best practice for running GATK VQSR on X chromosome

According to GATK best practice, it is recommended that different VQSR models be built for SNPs and INDELs, because the annotations for high-quality SNPs and INDELs are systematically different (if I understand it correctly). Since annotations for good variants on…

Continue Reading Best practice for running GATK VQSR on X chromosome

Malformed walker argument using MarkDuplicatesSpark

I am creating my own NGS pipeline, from Illumina FASTQ file to VCF. This is purely for learning purposes. When I run the following code, everything is OK: java -Xmx4000m "$javatmp" -jar "$picardpath" SortSam INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/sam/1.sam OUTPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam SORT_ORDER=coordinate COMPRESSION_LEVEL=5 java -Xmx4000m "$javatmp" -jar "$picardpath" MarkDuplicates INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam…

Continue Reading Malformed walker argument using MarkDuplicatesSpark

In the NGS pipeline, why read are sorted before marking duplicates?

I am creating my own NGS pipeline (from Illumina FASTQ to VCF file). I am using GATK best practices and the pipeline already created in the clinical lab where I am working. I have seen that the FASTQ is…

Continue Reading In the NGS pipeline, why read are sorted before marking duplicates?

Janis Germline Variant-Calling Workflow (GATK)

This is a genomics pipeline for single-sample germline variant calling, adapted from the GATK Best Practice Workflow. This workflow is a reference pipeline for using the Janis Python framework (pipelines assistant). Alignment: bwa-mem. Variant calling: GATK HaplotypeCaller. Outputs the final variants in VCF format. Resources: This pipeline has been…

Continue Reading Janis Germline Variant-Calling Workflow (GATK)

Should we trust genotypes called in simple tandem repeat regions?

Hello. I am searching genomes (WGS) or exomes (WES) of patients with rare diseases for potential disease-causing variants. The accuracy of each genotype for each patient is vital. I’m using GATK 4 to perform joint-calling of genotypes of the…

Continue Reading Should we trust genotypes called in simple tandem repeat regions?

filter Refseq file with bed? to produce coverage per gene

Hi, I am using GATK DepthOfCoverage -genelist mode to find the coverage per gene. The output contains many 0-coverage entries for genes that are not in the panel. Can I fix it by removing the genes that are not…

Continue Reading filter Refseq file with bed? to produce coverage per gene

DepthOfCoverage Error: no suitable codec found

Hi, I am trying to find the coverage per sample and per gene using GATK DepthOfCoverage. I have downloaded the RefSeq file as GATK suggests, but it gave me this error: A USER ERROR has occurred: Cannot read because no suitable codecs found gatk DepthOfCoverage -R $ref -O…

Continue Reading DepthOfCoverage Error: no suitable codec found

Snakemake pipeline with Gatk GermlineCNVCaller in Case mode

Hi everyone, I am trying to set up a Snakemake pipeline for germline CNV calling with GATK in CASE mode, since I have a background ready to use. My background is fragmented into 20 shards (something like ../cohort-twenty/name_1of20-model), and the…

Continue Reading Snakemake pipeline with Gatk GermlineCNVCaller in Case mode

The provided VCF file is malformed

htsjdk.tribble.TribbleException: The provided VCF file is malformed. I have VCF files that I want to convert to a more readable TSV file using GATK VariantsToTable, and I also want to load the VCF in IGV. However, when trying to do this, I get the same error for both…

Continue Reading The provided VCF file is malformed

How To Split Multiple Samples In Vcf File Generated By Gatk?

There now also is a plugin in bcftools which does the split in a single pass over the multi-sample VCF/BCF file. It does not seem to be very fast, but looks correct and there are options to do the split in custom ways. You do need to install bcftools with…

Continue Reading How To Split Multiple Samples In Vcf File Generated By Gatk?

Picard vs Samtools converting CRAM to FASTQ

I need to convert my CRAM files to FASTQ to complete an analysis. I have been trying to do this via GATK and Picard, but I have repeatedly been getting an “out of memory” error even as I have increased allocated memory…

Continue Reading Picard vs Samtools converting CRAM to FASTQ

ApplyBQSR won’t recognise output argument

I’m trying to recalibrate some BAMs using the following: gatk --java-options "-Xmx8g" ApplyBQSR -I $insampleID.sorted.dups.bam -R $reference --bqsr-recal-file $outsampleID.table -O $outsampleID.recal.bam Every time I try, I get the following error: A USER ERROR has occurred: Argument output was missing: Argument 'output' is required. The following, within the same script, was…

Continue Reading ApplyBQSR won’t recognise output argument

Genome Engineering Research Scientist

Genome Engineering Research Scientist – 94152. Organization: JG-Joint Genome Institute. Lawrence Berkeley National Lab’s (LBNL, www.lbl.gov/) Environmental Genomics and Systems Biology Division (biosciences.lbl.gov/divisions/egsb/) has an opening for a Genome Engineering Research Scientist to join the team. In this exciting role, you will work as part of the Center for Advanced…

Continue Reading Genome Engineering Research Scientist

HaplotypeCaller vs DeepVariant. How to interpret the quality scores?

Variant quality scores with different variant caller: HaplotypeCaller vs DeepVariant. How to interpret the quality scores? Hi, I am trying to compare the variant calling outputs of GATK’s HaplotypeCaller and DeepVariant. Their raw outputs are very different; for example, in a WGS sample, DeepVariant called 947386 variants located on chr1,…

Continue Reading HaplotypeCaller vs DeepVariant. How to interpret the quality scores?

sciClone input vaf file?

Dear all, I want to use sciClone on our exome sequencing data, but one thing I can’t understand is how I can get a varCount equal to 0. I have no idea about this; the following data I just grepped from the sciclone-meta-master manuscript figure3 data…

Continue Reading sciClone input vaf file?

Color hiring Bioinformatics Scientist in Chicago, Illinois, United States

Named by Rock Health as the Best Digital Health Company to Work For, Color is a leading healthcare technology company. Color is building and delivering technology-enabled healthcare to millions of people. Through partnerships with public and private partners including governments, employers and health systems, Color’s infrastructure and software enables…

Continue Reading Color hiring Bioinformatics Scientist in Chicago, Illinois, United States

Color hiring Bioinformatics Engineer in Atlanta, Georgia, United States

Named by Rock Health as the Best Digital Health Company to Work For, Color is a leading healthcare technology company. Color is building and delivering technology-enabled healthcare to millions of people. Through partnerships with public and private partners including governments, employers and health systems, Color’s infrastructure and software enables…

Continue Reading Color hiring Bioinformatics Engineer in Atlanta, Georgia, United States

HaplotypeCaller calling mutations based on one read?

I’m using GATK HaplotypeCaller, via grenepipe, with the default options as specified by grenepipe except for -ploidy 1, as I am working with haploid yeast. I am seeing some mutations called based on one single read only, if I am interpreting the…

Continue Reading HaplotypeCaller calling mutations based on one read?

GATK-Allele frequency

Hi guys, I am running GATK on a BAM file for variant calling. In the output file, I noticed that the allele frequency is computed as 0.5 and 1.00. What may be the reason for this? Is it calculated correctly?

Continue Reading GATK-Allele frequency
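
One possible reading of the allele-frequency question above, offered as an assumption and not as a diagnosis of the poster's data: when a VCF holds a single diploid sample, the allele frequency at a site is the ALT allele count divided by the number of called alleles (AC / AN), so a heterozygous call gives 0.5 and a homozygous-alternate call gives 1.0. A minimal Python sketch, using a hypothetical helper `allele_frequency` and made-up genotype strings:

```python
def allele_frequency(genotypes):
    """Return the ALT allele frequency (AC / AN) for a list of GT strings.

    Assumes biallelic sites: allele "0" is REF and allele "1" is ALT.
    Missing alleles (".") are skipped and do not count toward AN.
    Returns None when no allele was called.
    """
    alt_count = 0  # AC: number of ALT alleles observed
    called = 0     # AN: number of called alleles
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele == "1":
                alt_count += 1
    if called == 0:
        return None
    return alt_count / called


# Hypothetical single-sample sites.
print(allele_frequency(["0/1"]))  # 0.5
print(allele_frequency(["1/1"]))  # 1.0
```

With more samples in the list, intermediate values become possible, which is why a cohort VCF usually shows a wider range of frequencies than a single-sample one.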

Calculate allele frequency from many VCF files in specific locus

Dear all, I have 100 VCF files (100 different samples). I would like to calculate the allele frequency at specific sites. At one specific locus I have three genotypes (GATK best practices workflow): rs-xxxxx: A/A occurring in 30 samples (ref…

Continue Reading Calculate allele frequency from many VCF files in specific locus

Error with GenomeAnalysisTK.jar finding tools

I’m trying to run GATK on my machine as part of a pipeline using Phyluce. As per the instructions here (phyluce.readthedocs.io/en/v1.6.8/installation.html#why-conda), I downloaded GATK 3.7-0, activated a conda environment, and imported the GATK package into conda using the following code: conda activate phyluce-1.7.1, then gatk-register /PATH/TO/GATK-3.7/JAR/GenomeAnalysisTK.jar. Terminal recognizes the command ‘gatk…

Continue Reading Error with GenomeAnalysisTK.jar finding tools

Soft-clipping read ends based on read group

Is it possible to clip (soft-clip preferably) n (for example, 3) nucleotides from both ends of reads in a BAM file, but only for the reads with a certain defined read group? I have merged BAMs for ancient DNA samples and the…

Continue Reading Soft-clipping read ends based on read group

Providence hiring Bioinformatics Scientist 1 in Portland, Oregon, United States

Description: Providence is calling a Bioinformatics Scientist 1 to the Molecular Genomics Lab at Providence Office Park in Portland, OR. This is a full-time (1.0 FTE), day shift position. This position is a hybrid role between working in the lab and working from home. Apply today! Applicants that meet qualifications will…

Continue Reading Providence hiring Bioinformatics Scientist 1 in Portland, Oregon, United States

Jobot hiring Senior Bioinformatics Scientist in Boston, Massachusetts, United States

This Jobot Job is hosted by Emily Olinger. Are you a fit? Easy Apply now by clicking the “Apply” button and sending us your resume. Salary: $100,000 – $150,000 per year. A Bit About Us: We are a leading cloud-based SaaS startup in the biotech industry. Our platforms leads…

Continue Reading Jobot hiring Senior Bioinformatics Scientist in Boston, Massachusetts, United States

Is it ok to replace missing WGS calls with reference notation “0/0”?

I called variants on 200 WGS samples; each got around 4 million variants. However, most were unique, and only 1 million variants overlapped between most individuals. I suppose it is normal behaviour that GATK won’t output info…

Continue Reading Is it ok to replace missing WGS calls with reference notation “0/0”?