Tag: GATK

GATK vs DeepVariant : bioinformatics

Hi everyone, I am currently working on a medium-sized WES cohort study and wanted to know what the bioinformatics community would regard as a cutting-edge workflow. As the big labs usually use GATK, I also started with that. The results for SNPs are OK, but manual inspection (IGV) still uncovers a…

Job – Principal Biostatistician/Bioinformatics job at Kenya Medical Research

Vacancy title: Principal Biostatistician/Bioinformatics [Type: FULL TIME, Industry: Research, Category: Research] Jobs at: Kenya Medical Research – KEMRI. Deadline of this Job: 06 October 2022. Duty Station: Within Kenya, Kisumu, East Africa. Summary: Date Posted: Tuesday, September 20, 2022, Base Salary: Not Disclosed…

Bioinformatics Scientist in Pittsburgh, PA

Description Purpose: The scientist works independently using a robust math toolbox to discover solutions for a diverse portfolio of interesting and challenging problems. The scientist develops, implements, and monitors advanced analytic, medical informatics, and predictive modeling tools for health care programs at the UPMC. The scientist normally works Monday through Friday…

Joint variant calling on DeepVariant GVCFs using GATK GenotypeGVCFs

Hi everyone, I have a bunch of GVCF files generated by DeepVariant, but I want to use GATK’s GenotypeGVCFs for joint variant calling on them (I don’t want to use GLnexus). But GATK requires a genotype likelihood field produced by…

Principal Software Engineer – Bioinformatics Job in Mississauga, Ontario – Hoffmann-La Roche Ltd

Impact Healthcare: Roche Sequencing is not only changing science, but we are changing lives. Our software teams are laying the groundwork for the future by developing powerful bioinformatics algorithms, data analysis tools, and software/systems infrastructures so researchers and clinicians can make better health decisions faster. The path to curing cancer lies…

iPSCs derived from infertile men carrying complex genetic abnormalities can generate primordial germ-like cells

Patients and controls: Patient 1 was 38 years old and consulted for infertility after he and his partner had been trying to conceive for 2 years. The patient was the first child of unrelated parents, and he had four brothers and five sisters whose fertility status could not be determined…

using gatk haplotypecaller for variants extraction

Hi, I have RNA-seq data from COVID patients. I am using hisat2 to align the reads to the reference, so the resulting BAM files, after indexing, are now ready. I would like to use GATK HaplotypeCaller to extract variants from my BAM files. First,…

Bioinformatics Analyst in Minneapolis, MN for University of Minnesota, Twin Cities

Details Posted: 13-Aug-22 Location: Minneapolis, Minnesota Salary: 43944.32 – 123022.07 Categories: Research Support – Laboratory/Non-Laboratory Staff/Administrative Additional Information: 2 openings available. The Research Informatics Solutions (RIS) group within the University of Minnesota Supercomputing Institute (MSI) is hiring two full-time Bioinformatics Analysts to support research at the University of Minnesota. Analysts…

Genomic architecture of adaptive radiation and hybridization in Alpine whitefish

Sampling the radiation To understand the phylogenetic relationships between Alpine whitefish, we carried out whole-genome resequencing on 96 previously collected whitefish (with associated phenotypic measurements including standard length and gill-raker counts; collected in accordance with permits issued by the cantons of Zurich (ZH128/15), Bern (BE68/15), and Lucerne (LU04/14); these fish…

Standalone GATK HaplotypeCaller : bioinformatics

Hello! I’m hoping someone can direct me to resources around acquiring or building standalone gatk tools, specifically HaplotypeCaller. All of my research has led to the monolithic gatk wrapper (either local, spark, or in docker). The big tool is brilliant and I’ve been using it thus far, but it’s pretty…

08 compare visualization results of different annotation software

In the previous two sections, we compared the use of different VCF annotation software and converted the annotated results into MAF format. Because the snpEff annotation results cannot be converted to MAF, we will next compare the results of ANNOVAR, VEP, and GATK Funcotator…

Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant

Subjects Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue9, METABRIC14, TCGA35. Differential allelic expression analysis DNA and total RNA from 64 samples of normal breast…

Seven Bridges, Brazilian Researchers Applying Graph Analysis to Build Diverse Reference Genome

CHICAGO – The latest of Seven Bridges Genomics’ efforts to diversify reference genomes is its largest and perhaps most complex to date, an attempt to address the Brazilian population. The Charlestown, Massachusetts-based bioinformatics company recently joined with the University of São Paulo (USP), the Associação Genomas Brasil (Brazil Genome Association),…

Extract R1 and R2 from sam file generated by bowtie2

Hi everyone, how can I extract R1 and R2 from a SAM file generated by bowtie2?
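
One way to approach the question is through the SAM FLAG field, where bit 0x40 marks the first read of a pair (R1) and bit 0x80 marks the second (R2). The sketch below is a minimal illustration in plain Python, not the answer given in the original thread; the function name `split_mates` and the example records are made up for demonstration, and secondary or supplementary alignments are not treated specially.

```python
FIRST_IN_PAIR = 0x40   # SAM flag bit: read is the first mate (R1)
SECOND_IN_PAIR = 0x80  # SAM flag bit: read is the second mate (R2)


def split_mates(sam_lines):
    """Return (r1_records, r2_records) from an iterable of SAM text lines."""
    r1, r2 = [], []
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & FIRST_IN_PAIR:
            r1.append(line.rstrip("\n"))
        elif flag & SECOND_IN_PAIR:
            r2.append(line.rstrip("\n"))
    return r1, r2


# Hypothetical records: flag 99 has bit 0x40 set, flag 147 has bit 0x80 set.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t99\tchr1\t100\t42\t4M\t=\t200\t104\tACGT\tIIII",
    "read1\t147\tchr1\t200\t42\t4M\t=\t100\t-104\tTTGA\tIIII",
]
r1, r2 = split_mates(example)
print(len(r1), len(r2))
```

On the command line, the same bits can be selected with `samtools view -f 64` (R1) and `samtools view -f 128` (R2).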

Detailed differences between sambamba and samtools

In March, my first post in the new student group, “Do false-positive mutations appear because duplicate marking is insufficient?”, described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…

Parse a file of strings in python separated by newline into a json array

I don’t see where you’re actually reading from the file in the first place. You have to actually read your path_text.txt before you can format it correctly, right? with open('path_text.txt', 'r', encoding='utf-8') as myfile: content = myfile.read().splitlines() This will give you ['/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam', '/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam'] in content. Now if you want to write this…
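
Since the excerpt is cut off before the writing step, here is a minimal sketch of the whole round trip: read newline-separated strings, then dump them as a JSON array. The function name `lines_to_json_array` and the output file name are assumptions for illustration, and skipping blank lines is a choice made here rather than something the original answer states.

```python
import json
from pathlib import Path


def lines_to_json_array(text_path, json_path):
    """Read newline-separated strings and write them out as a JSON array."""
    text = Path(text_path).read_text(encoding="utf-8")
    # splitlines() drops the trailing newline; blank lines are skipped here.
    items = [line for line in text.splitlines() if line.strip()]
    Path(json_path).write_text(json.dumps(items, indent=2), encoding="utf-8")
    return items
```

For example, `lines_to_json_array("path_text.txt", "paths.json")` would write the two BAM paths from the excerpt as a two-element JSON array.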

Hard filtering on GATK HaplotypeCaller giving multiple warnings

I’m using this pipeline for deriving variants from RNA sequencing data: github.com/modupeore/VAP which uses specific versions of various tools, including HaplotypeCaller from GATK (v3.8-0-ge9d806836). The final step is a set of hard filters on the called variants (applied using VariantFilter), but looking at the log files, there are a lot…

Bioinformatics Analyst II – Remote in Danville, PA for Geisinger

Details Posted: 22-Apr-22 Location: Danville, Pennsylvania Type: Full Time Salary: Open Categories: Operations Job Summary: Primary accountability is to leverage the organization’s data assets, namely exome sequencing data (>180,000 individuals) from the MyCode Community Health Initiative, to improve quality and efficiency and to generate knowledge, specifically in the field of bioinformatics within health research…

Bioinformatics Scientist for Whole Genome and Whole Exome Sequencing

The NeuroGenomics and Informatics (NGI) Center, led by Dr. Carlos Cruchaga at Washington University School of Medicine, is recruiting a Bioinformatics Scientist to work on Whole Genome and Whole Exome Sequencing. We are seeking an experienced, self-motivated, self-driven scientist…

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Sequencing data We used publicly available sequencing data from the GIAB consortium45, 1000 Genomes Project high-coverage data46 and Human Genome Structural Variation Consortium (HGSVC)4. All datasets include only samples consented for public dissemination of the full genomes. Statistics and reproducibility For generating the assemblies, we used all 14 samples for…

Bioinformatics Pipeline Development Engineer II at Personalis, Inc

Personalis, Inc. is a leader in advanced cancer genomics for enabling the next generation of precision cancer therapies and diagnostics. The Personalis NeXT Platform® is designed to adapt to the complex and evolving understanding of cancer, providing its biopharmaceutical customers and clinicians with information on all of the approximately 20,000 human genes,…

how to extract unique variants from GVCF

[note: cross-posted on GATK forum – still awaiting a response] I have a GVCF (generated using GATK’s HaplotypeCaller with the -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample…

Variant quality and filters on GATK HaplotypeCaller generated VCFs

Hi, I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for “Germline short variants discovery” for single-sample data to generate a VCF using HaplotypeCaller. The guidelines then point to the use…

rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias

I am working with non-model plant RNA samples which have been deep sequenced and analysed using the STAR aligner under default parameters. Aim: We would like to conduct SNP discovery on these samples. Objective: Our ultimate goal with this genotypic data is to search for variants (both SNPs and indels)…

Color hiring Software Engineer, Bioinformatics in Remote

About Color Color’s mission is to help people lead the healthiest lives that science and medicine can offer. We launched in April 2015 with a simple, affordable genetic test to help people understand their risk for hereditary cancer. In 2017, we added coverage for hereditary heart conditions. Between them, cancer…

vcf – Why does GATK produce both 0/1 and 1/0 genotypes in the same file? Are the two not equivalent?

I have always thought that 1/0 and 0/1 in VCF genotype fields are equivalent. And yet, GATK uses both. For example, these are two variants called in the same sample and the same run of GATK 4.1.4.0: chr7 117120317 . ATTCATTGTTTTGAAAGAAAGATGGAAGAATGAACTGAAG A 748.97 . AC=1;AF=0.5;AN=2;DP=64;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=11.89;SOR=7.223 GT:AD:DP:GQ:PL:SB 1/0:0,36:63:99:2294,1042,933:0,0,0,36 chr7 117120306 ….
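
When comparing genotype strings like these in a script, it can help to put them into one canonical form first. The sketch below assumes, as the post's author does, that allele order carries no meaning in unphased (`/`) genotypes. It leaves phased (`|`) genotypes and genotypes with missing alleles untouched. The function name `normalize_unphased_gt` is hypothetical and not part of GATK or any VCF library.

```python
def normalize_unphased_gt(gt: str) -> str:
    """Sort allele indices of an unphased genotype, e.g. '1/0' -> '0/1'."""
    if "|" in gt:
        # Phased genotype: allele order is meaningful, so keep it as-is.
        return gt
    alleles = gt.split("/")
    if "." in alleles:
        # Missing allele(s): leave untouched rather than guess an ordering.
        return gt
    return "/".join(sorted(alleles, key=int))
```

With this, `normalize_unphased_gt("1/0")` and `normalize_unphased_gt("0/1")` both give `"0/1"`, so the two records compare equal on genotype.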

BTG2 gene predicts poor outcome in PT-DLBCL

Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma.1–3 PT-DLBCL was the most common type of testicular tumor in men aged over 60 and characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms.4–6 PT-DLBCL shows significant extranodal tropism,…

HRJOB7442 Bioinformatics Scientist 2 (Various Locations) in Nether Alderley, Macclesfield (SK10) | Almac Group (Uk) Ltd

Bioinformatics Scientist 2 Hours: 37.5 hours per week Salary: Competitive Ref No: HRJOB7442 Business Unit: Diagnostic Services Location: Craigavon or Manchester Open To: Internal and External Applicants The Company Almac Diagnostic Services is a leading stratified medicine business, specialising in biomarker-driven clinical trials. We are incredibly proud to be involved…

java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…

GATK HaplotypeCaller with interval list

I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. My interval list file (top8snp.interval_list) content is as follows: 12 33029845 33030845 + rs24767598 13 40586682 40587682 + rs24748362 18 24373857 24374857 + rs8856159 21 50381146 50382146 +…

bcftools merged vcf file assigns all variants to one sample

I’ve made one VCF file for each of three samples. I then combined them using bcftools, like so: # Make a list of vcf files to merge cat "${OUT}/results/variants/vcf_list" /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/3a7a-10.vcf.gz /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/MF3.vcf.gz /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/R507H-FB_S355_L001.vcf.gz Then merge the list: bcftools merge -l…

Senior Bioinformatics Software Developer – Bethesda

Medical Science & Computing, (MSC), a Dovel company, is seeking skilled Senior Bioinformatics Software Developers to join our team supporting our client, NCBI at the National Institutes of Health, (NIH) in Bethesda, MD. The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM) at…

variant – Error running gatk HaplotypeCaller with allele specific annotations

I’ve got HaplotypeCaller working nicely in standard mode, like so: # Run HaplotypeCaller gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…

Variant calls of published already assembled genomes

I have a set of short-read sequencing data for the 172 kb Epstein-Barr virus genome. We successfully called our variants using GATK against a reference genome. A publication linked below from a different population compared variants (also from short-read sequencing) to…

Do VQSR for HaplotypeCaller calls – Sarek

Expected Behavior Filter the calls from HaplotypeCaller with Variant Quality Score Recalibration according to GATK best practise (Tools VariantRecalibrator, ApplyRecalibration, see gatkforums.broadinstitute.org/gatk/discussion/39/variant-quality-score-recalibration-vqsr or a more recent version) Current Behavior Variant quality score recalibration currently not included. Asked Jan 26 ’18 at 08:25 malinlarsson 1 Answer: Keep in mind, that you’d…

sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds

I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 on the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…

PathSeqFilterSpark

I have been trying to filter out low-quality bases as part of a variant annotation task, and I have completed all the previous required steps. However, when I tried to filter out low-quality bases after BQSR (GATK), PathSeqFilterSpark did not yield an output file. There was no…

GATK GenotypeGVCFs changes HET to REF_ALT

Dear all, I’ve been using GATK HaplotypeCaller / GenotypeGVCFs (v4.2.3.0) for a while but recently found something strange. There is a position (7063) with 8 reads (3T + 5A) that, even though HaplotypeCaller calls it as a HET (see image, lower track): NC_046966.1 7063 . T A,<NON_REF> 177.64 . BaseQRankSum=0.887;DP=8;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=2.369;RAW_MQandDP=16885,8;ReadPosRankSum=1.345 GT:AD:DP:GQ:PL:SB…

Systems biology analysis of human genomes points to key pathways conferring spina bifida risk

Significance Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to investigate SB predisposition controls for population stratification and avoids…

Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…

Dragen-gatk for trio

Hi everyone, the Dragen-GATK pipeline works great for a single sample. However, I would like to know if anyone has used this pipeline for a trio. If so, how did you do it? It is recommended to do hard filtering based on QUAL, but how…

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. The GATK Best Practices suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the GATK manual and other posts…

Sr Scientist – IVD Development – Houston

NuProbe USA Inc. is looking for a Staff/Senior Scientist to lead the IVD project development program at NuProbe to support both research and in vitro diagnostic (IVD) assays for use in medical research, clinical trials, regulatory submissions, and clinical diagnostic use. NuProbe USA is a rapidly growing company and…

Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest

Genomic data The foundational resource for this study was a dataset of 40,107,925 nuclear SNPs sequenced from a worldwide sample of 532 DBM individuals collected in 114 different sites based on our previous project15. DNA was extracted from each of the 532 individuals using DNeasy Blood and Tissue Kit (Qiagen,…

Genome Bioinformatics Analyst – Pittsburgh

**Description** UPMC Presbyterian is hiring a Genome Bioinformatics Analyst to join the Molecular and Genomic Pathology Laboratory (MGP) team! This role will work a daylight schedule Monday through Friday. No weekends or holidays are required! The Molecular and Genomic Pathology Laboratory (MGP) is a dynamic state-of-the-art clinical laboratory that prides…

how to add reference alleles to VCF?

I’m converting gVCFs to VCF, but the reference alleles are missing. An example below: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177 1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81 1 97915614 . C . . . DP=40…

gatk VariantRecalibrator positional argument error

I’m trying to recalibrate my VCF using gatk VariantRecalibrator, but keep getting an error “Illegal argument value: Positional arguments were provided”. But I don’t know what this means, or how to correct it! Here’s my call: gatk VariantRecalibrator -R "/Volumes/Seagate Expansion Drive/refs/hg38/gatk download/Homo_sapiens_assembly38.fasta" -V "$OUT"/results/variants/"$SN".norm.vcf.gz -AS --resource hapmap,known=false,training=true,truth=true,prior=15.0: "/Volumes/Seagate…

Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)

I am using the latest GATK 4.1.2.0 to do variant calling on insect samples with a reference genome of a closely related species. The heterozygosity is approximately 0.02. I followed the standard pipeline of “HaplotypeCaller -> GenomicsDBImport -> GenotypeGVCFs” to get my unfiltered VCFs. However, although my variant sites have…

No quality in non-variant sites GATK

Hey, I am doing SNP calling with HaplotypeCaller (BP_Resolution), CombineGVCFs with convert-to-base-pair-resolution, and GenotypeGVCFs with include-non-variant-sites in GATK, and when I get my VCF file, the non-variant sites do not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…

What is the single nucleotide polymorphism database ( dbsnp )?

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…

Parallel genomic responses to historical climate change and high elevation in East Asian songbirds

Extreme environments present profound physiological stress. The adaptation of closely related species to these environments is likely to invoke congruent genetic responses resulting in similar physiological and/or morphological adaptations, a process termed “parallel evolution” (1). Existing evidence shows that parallel evolution is more common at the phenotypic level than at…

VCF samtools

Hello, I am having trouble doing variant calling with samtools. I am getting only the header and no variants. If I instead use FreeBayes, I get a lot of variants, and with GATK, I get just a few. What can the problem be? Do…

Somatic Variant Calling

Hi, I need to call somatic variants from a BAM file of a cancer panel. Can anyone please suggest a suitable tool for calling the variants and generating a VCF file? Thank you. Reply: “Suitable” is very context-dependent; are you working…

Making consensus sequence for each haplotype

I’m dealing with paired-end amplicon sequencing data. I’ve produced a GVCF file with haplotype calls using: gatk HaplotypeCaller -R $REF -I "$BAM" -O "$OUT"/results/variants/${SN}_HaplotypeCallerPGT.vcf -ERC GVCF The VCF file it produces contains the PGT flag, and variants are called in the format…

state and usage of compressed file standards better than BAM and FASTQ

Forum (2021): Extra-compressed formats for raw/aligned reads and variant tables have been around for some time, but I think they saw slow adoption. Our current disk space usage is making us have another look at switching to file…

how to do basic statistics for bam files

Hi Biostars team, I have unpaired exome-seq data. I did the quality control and alignment. Now my files are in BAM format and I would like to compute some basic statistics such as fragment size, coverage, mismatches, gaps, duplicates, etc.…

Gatk pipeline wdl on multiples sample input

Hello all, I would like to use GATK pipelines with Cromwell on our HPC. However, for the germline single-sample pipeline, I wanted to know if there is a way to run it directly on multiple samples. I can’t see myself writing more…

Best way to merge multiple VCF files

Hi, I am trying to merge a bunch of VCF files into one VCF of known SNPs. The files are separated by chromosome. I am trying to figure out how to merge all the files but in a way that the chromosome…

Using IndexFeatureFile to index vcf

I have a number of VCFs that I need to index (create .idx files), which I attempted to do with GATK, e.g. java -Xmx5g -jar $gatk_path/GenomeAnalysisTK.jar -T IndexFeatureFile -F N03_INDELS.vcf This returns an invalid argument error even though the VCF file is, as far…

Add or reveal read groups on .sam file aligned by BWA

Hi, I’m trying to use GATK HaplotypeCaller, but every time I run it, it says: A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to…

how to visually compare BAM file differences

I am a bioinformatics novice learning the workflow for calling somatic mutations. I found that the steps applied to BAM files are: sort, markdup, reorder, indel realignment, and BQSR. I want to know how the file differs after I execute each step…

best practice to design and reuse a process/workflow

Let’s say I want to genotype a set of BAMs using GATK. A basic DSL2 nextflow workflow would look like: workflow { take: reference beds bams main: hc = haplotypecaller(reference,bams.combine(beds)) bed2vcf = combinegvcf(hc.groupTuple()) vcf = gathervcfs(bed2vcf.collect()) } process haplotypecaller { input: val(reference) tuple val(bam),val(bed) output: tuple bed,path("sample.g.vcf.gz") script: """ gatk…

What is GenotypeGVCFs?

Hello! This article gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels- says I should use HaplotypeCaller in GVCF mode followed by GenotypeGVCFs, and this article gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels- advises using HaplotypeCaller without GenotypeGVCFs. I tried the former (with one sample), and the result is similar to the result of HaplotypeCaller in non-GVCF mode; however, it differs in some…

Consensus sequence for phased variant calls

I’ve got paired-end sequencing data from a ~500 bp amplicon. I’ve aligned the data and called variants using GATK to phase the variants, as follows. The phasing information is now under the PGT tag. gatk HaplotypeCaller -R $REF -I "$BAM" -O "$DIR"/variants/${SN}_HaplotypeCallerPGT.vcf…

Bioinformatics Engineer Job Opening in St. Louis, MO at Benson Hill

About Benson Hill: Benson Hill empowers innovators to develop more healthy, tasty and sustainable food by unlocking the natural genetic diversity of plants. Benson Hill’s CropOS™ platform combines machine learning and big data with advanced breeding techniques and plant biology to drastically accelerate and simplify the product development process. The CropOS…

Confusion regarding manual inclusion of read group information from fastq files

I have recently received a collection of paired-end FASTQ files (WES) from our collaborators. I am following the GATK Best Practices workflow. I have completed the alignment, sorting, and indexing steps and generated a list of BAM files. However, upon further inspection, I found out that the BAM files do not have…

Continue Reading Confusion regarding manual inclusion of read group information from fastq files

The mtDNA mutation spectrum in the PolG mutator mouse reveals germline and somatic selection | BMC Genomic Data

1. Taylor RW, Turnbull DM. Mitochondrial DNA mutations in human disease [Internet]. Vol. 6, Nature Reviews Genetics. Europe PMC Funders; 2005 [cited 2020 Aug 21]. p. 389–402. Available from: /pmc/articles/PMC1762815/?report=abstract. 2. Kabunga P, Lau AK, Phan K, Puranik R, Liang C, Davis RL, Sue CM, Sy RW Systematic review of…

Continue Reading The mtDNA mutation spectrum in the PolG mutator mouse reveals germline and somatic selection | BMC Genomic Data

LeftAlignIndels error

Hello! I passed a sorted and indexed BAM file to LeftAlignIndels:

    ~/Soft/gatk-4.1.9.0/gatk LeftAlignIndels -I bam_fin/Exome_dups.bam -R /mnt/lapd/Index_hum/dna2/GRCh_2021.fa -O bam_fin/Exome.bam

and got this error:

    java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for file:///mnt/lapd/Vika_data/RNF_raw/exome/bam_fin/Exome.bam. Sort order is coordinate. Offending records are at [1:152985370] and [1:152985347]
        at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:197)
        at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:184)…

Continue Reading LeftAlignIndels error
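
The error message reports two records on contig 1 whose start positions decrease (152985370 followed by 152985347). The sketch below shows what a coordinate-order check of that kind looks like; it is an illustration written for this listing, not htsjdk's actual implementation, and it assumes records are given as simple (contig, start) pairs with the contig order supplied by the caller.

```python
from typing import Iterable, Optional, Sequence, Tuple

Record = Tuple[str, int]  # (contig name, 1-based start position)


def first_out_of_order(
    records: Iterable[Record], contig_order: Sequence[str]
) -> Optional[Tuple[Record, Record]]:
    """Return the first adjacent pair that breaks coordinate order, or None.

    Contigs are ranked by their position in contig_order (as in a sequence
    dictionary), then records are compared by start position.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    prev: Optional[Record] = None
    prev_key: Optional[Tuple[int, int]] = None
    for rec in records:
        key = (rank[rec[0]], rec[1])
        if prev_key is not None and key < prev_key:
            return (prev, rec)
        prev, prev_key = rec, key
    return None


if __name__ == "__main__":
    # The two positions quoted in the error message above.
    offending = [("1", 152985370), ("1", 152985347)]
    print(first_out_of_order(offending, contig_order=["1"]))
```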

Bioinformatics Scientist at Infectious Disease Institute

IDI seeks to hire a Bioinformatics Scientist (BS) for the centre. The BS will be a full-time staff member who is familiar with the application of computational and biotechnology capabilities to biomedical and public health problems such as genetics, clinical and medical research, as well as other data-intensive analyses. By coordinating…

Continue Reading Bioinformatics Scientist at Infectious Disease Institute

GATK CNV Caller – issue with PostprocessGermlineCNVCalls

Hi there, I’m running gatk PostprocessGermlineCNVCalls for a cohort of hg19 aligned WES samples. I’m following the suggested pipeline here: gatk.broadinstitute.org/hc/en-us/articles/360035531152–How-to-Call-common-and-rare-germline-copy-number-variants#5 However, in the final step when I try to process CNV calls for an individual sample, I get the following error, asking me to run FilterIntervals, but I’ve already…

Continue Reading GATK CNV Caller – issue with PostprocessGermlineCNVCalls

how to install picard and GATK

Hi all, I want to install the current versions of Picard and GATK using an Ubuntu terminal on a Windows PC for some genomics analysis. I have the current version of Java but was not able to install Picard. I downloaded the entire…

Continue Reading how to install picard and GATK

Polyploidy found, and not supported by vcftools for a diploid data set.

Hi, I used GATK Mutect2, SelectVariants (retaining only SNPs) and CombineGVCFs to generate a VCF file for a diploid species. When I tried to process the VCF file using vcftools, some of the commands did work; however, when…

Continue Reading Polyploidy found, and not supported by vcftools for a diploid data set.

Manager, Bioinformatics Verification and Validation

Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform is enabling the development of next generation immuno-oncology therapeutics and diagnostics. Summary: You will join a team of bioinformaticians…

Continue Reading Manager, Bioinformatics Verification and Validation

Phasing using Beagle with a map file

I’d like to phase the SNPs in a VCF file and output consensus files for each haplotype, as suggested in this post: www.biostars.org/p/298635/ I’ve managed to install Beagle in a conda environment:

    conda create -n beagle -c conda-forge -c bioconda beagle
    conda activate beagle

When I run beagle using this…

Continue Reading Phasing using Beagle with a map file

Post filtering analysis for exome data

Hello, I am following the GATK pipeline to process an exome data set. I am done with the preprocessing step and have filtered the data set by the hard-filtering method. Now, I am looking for variants shared between the affected individuals. In the VCF file, I get the…

Continue Reading Post filtering analysis for exome data

heterozygous SNV AB>0.15, heterozygous indel<0.20 in UKB-WES

These gVCFs were joint genotyped using GLnexus (www.biorxiv.org/content/10.1101/572347v1) to create a single, unfiltered project-level VCF (pVCF). Genotype depth filters (SNV DP≥7, indel DP≥10) were applied prior to variant site filters requiring at least one variant genotype passing an allele balance filter (heterozygous…

Continue Reading heterozygous SNV AB>0.15, heterozygous indel
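
The filters quoted in this excerpt can be sketched as below. The depth thresholds (SNV DP≥7, indel DP≥10) and the SNV allele-balance cutoff (AB>0.15) come from the excerpt and its title. Treating allele balance as the fraction of reads supporting the alternate allele is an assumption made for this example, and the function names are hypothetical. The indel allele-balance threshold is left to the caller because the excerpt is cut off before stating it clearly.

```python
# Minimum genotype depth per variant type, as quoted in the excerpt.
MIN_DEPTH = {"snv": 7, "indel": 10}

# Heterozygous SNV allele-balance cutoff, taken from the post title.
SNV_MIN_AB = 0.15


def passes_depth_filter(variant_type: str, depth: int) -> bool:
    """True if the genotype depth meets the minimum for its variant type."""
    return depth >= MIN_DEPTH[variant_type]


def allele_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele (assumed definition)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this site")
    return alt_reads / total


def passes_allele_balance(ab: float, min_ab: float) -> bool:
    """True if the allele balance is strictly above the given cutoff."""
    return ab > min_ab


if __name__ == "__main__":
    # A heterozygous SNV with 15 reference and 5 alternate reads.
    ab = allele_balance(15, 5)
    print(passes_depth_filter("snv", 20), passes_allele_balance(ab, SNV_MIN_AB))
```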

Dictionary cannot have size zero

I am new to variant calling and am trying to create realignment targets using GATK RealignerTargetCreator, but I keep getting this error despite having a dictionary file:

    java.lang.IllegalArgumentException: Dictionary cannot have size zero
        at org.broadinstitute.gatk.utils.MRUCachingSAMSequenceDictionary.<init>(MRUCachingSAMSequenceDictionary.java:62)
        at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:78)
        at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:75)
        at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
        at java.lang.ThreadLocal.get(ThreadLocal.java:170)
        at…

Continue Reading Dictionary cannot have size zero

GATK4 -known-sites input Recalibrate Base Quality Scores

I am working with a non-reference plant species that I want to call variants from after aligning to a closely related species reference genome. I am following the GATK4 best practices pipeline, but I would like to know how I should proceed…

Continue Reading GATK4 -known-sites input Recalibrate Base Quality Scores

Interploidy gene flow involving the sexual-asexual cycle facilitates the diversification of gynogenetic triploid Carassius fish

1. Muller, H. J. The relation of recombination to mutational advance. Mutat. Res. Mol. Mech. Mutagen. 1, 2–9 (1964). 2. Maynard Smith, J. The Evolution of Sex (Cambridge University Press, 1978). 3. Avise, J. C. Clonality (Oxford University Press, 2008). 4. Hamilton, W. D.,…

Continue Reading Interploidy gene flow involving the sexual-asexual cycle facilitates the diversification of gynogenetic triploid Carassius fish

Personalis Senior Bioinformatics Pipeline Development Engineer

Senior Bioinformatics Pipeline Development Engineer (Remote option available) at Personalis, Inc, Menlo Park. Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform® is enabling the…

Continue Reading Personalis Senior Bioinformatics Pipeline Development Engineer

identification of ROH using plink

Hello all, I generated a VCF file using GATK (first HaplotypeCaller, then CombineGVCFs, then GenotypeGVCFs, and then hard filtering). After this, I converted the filtered VCF file into PLINK binary files (.bed, .fam, .bim; plink v1.9) using the --make-bed command. However, when I used these…

Continue Reading identification of ROH using plink

Shared variants

Hello, I have exome data sets from 6 individuals, of whom 4 are affected and 2 are unaffected. I have to identify the variants that are shared between the four affected individuals. I did joint genotyping for the 4 affected individuals and filtered the…

Continue Reading Shared variants

GATK HaplotypeCaller works without GVCF option, but errors with GVCF

I’ve extracted chromosome 4 from a whole genome BAM file as follows:

    samtools view -h "$BAM" chr4 > "$EXT/temp/"$PREFIX"_chr4.sam"
    samtools view -bS "$EXT"/temp/$PREFIX"_chr4.sam" > "$EXT"/temp/$PREFIX"_chr4.bam"

Then added read groups, as required by GATK:

    picard AddOrReplaceReadGroups I="$BAM" O="$EXT"/temp/$PREFIX"_chr4_rg.bam" RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20

Index the BAM:

    samtools index "$BAM"

Download the…

Continue Reading GATK HaplotypeCaller works without GVCF option, but errors with GVCF

How to generate the contigs ploidy priors table (yeast) for GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?

Hi! I was asked to determine the ploidy level and to do CNV calling on a yeast sample (reference sequence: S. cerevisiae S288C). In order to perform CNV calling with the GATK…

Continue Reading How to generate the contigs ploidy priors table (yeast) for GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?

PGT only available for some variants in GATK .vcf

I’ve got a VCF file someone else prepared using GATK. I’m interested in the phasing information in the PGT tag, e.g. 0|1. This information seems to be available for some variants, but not for others, e.g. below chr1 16977 ….

Continue Reading PGT only available for some variants in GATK .vcf

Senior Bioinformatics Pipeline Development Engineer at Personalis

Senior Bioinformatics Pipeline Development Engineer (Remote option available) at Personalis, Inc, Menlo Park. Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform® is enabling the…

Continue Reading Senior Bioinformatics Pipeline Development Engineer at Personalis

GATK Mutect2 errors during basic variant calling

I’ve just installed GATK and am trying to do some basic variant calling. However, when I try to run this line

    gatk Mutect2 -R $REF -I "$BAM" -O "$DIR"/gatk/$PREFIX"_bwa_gatk_unfiltered.vcf"

I get the error below. Reading the output, it looks like this is…

Continue Reading GATK Mutect2 errors during basic variant calling

Windows-Bases Software Packages Which Can Analyze Vcf Files

I would like to work with VCF files: select one person, or subset one gene, chromosome or part of a chromosome. I tried VMware with Ubuntu, VCFtools, GATK and tabix, but I ran into a lot of errors. I don’t have…

Continue Reading Windows-Bases Software Packages Which Can Analyze Vcf Files

Phasing with Beagle 5.2 and no reference panel

Hi everyone, I have a question about phasing with Beagle 5.2 without a reference panel. I have seen in answers to a couple of other posts about Beagle that trying to phase with too few samples and no reference panel is not…

Continue Reading Phasing with Beagle 5.2 and no reference panel

False negatives -Hard filtering

Hello, I need some suggestions on filtering the variants in exome data. I combined all the GVCF files into one file, did joint genotyping and created one VCF file. The variants in the file were hard-filtered. As a first step to evaluate the…

Continue Reading False negatives -Hard filtering

Best practice for running GATK VQSR on X chromosome

According to GATK best practice, it is recommended that different VQSR models be built for SNPs and INDELs, because the annotations for high-quality SNPs and INDELs are systematically different (if I understand it correctly). Since annotations for good variants on…

Continue Reading Best practice for running GATK VQSR on X chromosome

Malformed walker argument using MarkDuplicatesSpark

I am creating my own NGS pipeline from Illumina FASTQ file to VCF. This is for pure learning purposes. When I run the following code everything is OK:

    java -Xmx4000m "$javatmp" -jar "$picardpath" SortSam INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/sam/1.sam OUTPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam SORT_ORDER=coordinate COMPRESSION_LEVEL=5
    java -Xmx4000m "$javatmp" -jar "$picardpath" MarkDuplicates INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam…

Continue Reading Malformed walker argument using MarkDuplicatesSpark

In the NGS pipeline, why read are sorted before marking duplicates?

I am creating my own NGS pipeline (from Illumina FASTQ to VCF file). I am using the GATK best practices and the pipeline already created in the clinical lab where I work. I have seen that the FASTQ is…

Continue Reading In the NGS pipeline, why read are sorted before marking duplicates?

Janis Germline Variant-Calling Workflow (GATK)

This is a genomics pipeline for germline variant calling on a single sample, adapted from the GATK Best Practice Workflow. This workflow is a reference pipeline for using the Janis Python framework (pipelines assistant). Alignment: bwa-mem. Variant calling: GATK HaplotypeCaller. Outputs the final variants in the VCF format. Resources: This pipeline has been…

Continue Reading Janis Germline Variant-Calling Workflow (GATK)

Should we trust genotypes called in simple tandem repeat regions?

Hello. I am searching genomes (WGS) or exomes (WES) of patients with rare diseases for potential disease-causing variants. The accuracy of each genotype for each patient is vital. I’m using GATK 4 to perform joint-calling of genotypes of the…

Continue Reading Should we trust genotypes called in simple tandem repeat regions?

filter Refseq file with bed? to produce coverage per gene

Hi, I am using GATK DepthOfCoverage in -genelist mode to find the coverage per gene. The output contains many zero-coverage entries for genes that are not in the panel. Can I fix it by removing the genes that are not…

Continue Reading filter Refseq file with bed? to produce coverage per gene

DepthOfCoverage Error: no suitable codec found

Hi, I am trying to find the coverage per sample and per gene using GATK DepthOfCoverage. I have downloaded the RefSeq file as GATK suggests, but it gave me this error: A USER ERROR has occurred: Cannot read because no suitable codecs found

    gatk DepthOfCoverage -R $ref -O…

Continue Reading DepthOfCoverage Error: no suitable codec found

Snakemake pipeline with Gatk GermlineCNVCaller in Case mode

Hi everyone, I am trying to set up a Snakemake pipeline for germline CNV calling with GATK in CASE mode, since I have a background ready to use. My background is fragmented into 20 shards (something like ../cohort-twenty/name_1of20-model), and the…

Continue Reading Snakemake pipeline with Gatk GermlineCNVCaller in Case mode

The provided VCF file is malformed

I have VCF files that I want to convert to a more readable TSV file using GATK VariantsToTable, and I also want to load the VCF into IGV. However, when trying to do this, I get the same htsjdk.tribble.TribbleException error for both…

Continue Reading The provided VCF file is malformed

How To Split Multiple Samples In Vcf File Generated By Gatk?

There now also is a plugin in bcftools which does the split in a single pass over the multi-sample VCF/BCF file. It does not seem to be very fast, but looks correct and there are options to do the split in custom ways. You do need to install bcftools with…

Continue Reading How To Split Multiple Samples In Vcf File Generated By Gatk?