Tag: GATK

Bioinformatics Engineer Job Opening in St. Louis, MO at Benson Hill

About Benson Hill: Benson Hill empowers innovators to develop more healthy, tasty and sustainable food by unlocking the natural genetic diversity of plants. Benson Hill’s CropOS™ platform combines machine learning and big data with advanced breeding techniques and plant biology to drastically accelerate and simplify the product development process. The CropOS…

Continue Reading Bioinformatics Engineer Job Opening in St. Louis, MO at Benson Hill

Confusion regarding manual inclusion of read group information from fastq files

I have recently received a collection of paired-end FASTQ files (WES) from our collaborators. I am following the GATK best practices workflow. I have completed the alignment, sorting and indexing steps and generated a list of BAM files. However, upon further inspection, I found out that the BAM files do not have…

Continue Reading Confusion regarding manual inclusion of read group information from fastq files
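
As a rough illustration of the read group question above, the sketch below builds an @RG header line from the first header of a FASTQ file. It assumes Illumina (CASAVA 1.8+) style headers of the form instrument:run:flowcell:lane:..., and the function name, sample name, library name and example header are all hypothetical. The ID and PU conventions shown are one common choice, not something the original post specifies.

```python
def read_group_from_header(header, sample, library="lib1"):
    """Build an @RG line from an Illumina-style FASTQ header (assumed format)."""
    # Drop the leading '@' and the comment part after the first space.
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    if len(fields) < 4:
        raise ValueError(f"unexpected FASTQ header: {header!r}")
    # Assumed layout: instrument:run:flowcell:lane:tile:x:y
    flowcell, lane = fields[2], fields[3]
    tags = {
        "ID": f"{flowcell}.{lane}",           # read group identifier
        "SM": sample,                         # sample name (supplied by the user)
        "LB": library,                        # library name (hypothetical default)
        "PL": "ILLUMINA",                     # sequencing platform
        "PU": f"{flowcell}.{lane}.{sample}",  # platform unit
    }
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in tags.items())


# Hypothetical header, for illustration only.
rg_line = read_group_from_header(
    "@A00123:45:HXXXXDSXX:2:1101:1000:2000 1:N:0:ACGT", sample="S1"
)
print(rg_line)
```

The resulting fields map onto the RGID, RGSM, RGLB, RGPL and RGPU arguments of Picard AddOrReplaceReadGroups, or the whole line can be supplied at alignment time instead.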

The mtDNA mutation spectrum in the PolG mutator mouse reveals germline and somatic selection | BMC Genomic Data

1. Taylor RW, Turnbull DM. Mitochondrial DNA mutations in human disease [Internet]. Vol. 6, Nature Reviews Genetics. Europe PMC Funders; 2005 [cited 2020 Aug 21]. p. 389–402. Available from: /pmc/articles/PMC1762815/?report=abstract. 2. Kabunga P, Lau AK, Phan K, Puranik R, Liang C, Davis RL, Sue CM, Sy RW Systematic review of…

Continue Reading The mtDNA mutation spectrum in the PolG mutator mouse reveals germline and somatic selection | BMC Genomic Data

LeftAlignIndels error

Hello! I passed a sorted and indexed bam file to LeftAlignIndels: ~/Soft/gatk-4.1.9.0/gatk LeftAlignIndels -I bam_fin/Exome_dups.bam -R /mnt/lapd/Index_hum/dna2/GRCh_2021.fa -O bam_fin/Exome.bam And got this error: ‘java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for file:///mnt/lapd/Vika_data/RNF_raw/exome/bam_fin/Exome.bam. Sort order is coordinate. Offending records are at [1:152985370] and [1:152985347] at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:197) at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:184)…

Continue Reading LeftAlignIndels error

Bioinformatics Scientist at Infectious Disease Institute

IDI seeks to hire a Bioinformatics Scientist (BS) for the centre. The BS will be a full-time staff member who is familiar with the application of computational and biotechnology capabilities to biomedical and public health problems like genetics, clinical and medical research, as well as other data intensive analyses. By coordinating…

Continue Reading Bioinformatics Scientist at Infectious Disease Institute

GATK CNV Caller – issue with PostprocessGermlineCNVCalls

Hi there, I’m running gatk PostprocessGermlineCNVCalls for a cohort of hg19 aligned WES samples. I’m following the suggested pipeline here: gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants#5 However, in the final step when I try to process CNV calls for an individual sample, I get the following error, asking me to run FilterIntervals, but I’ve already…

Continue Reading GATK CNV Caller – issue with PostprocessGermlineCNVCalls

how to install picard and GATK

Hi all, I want to install the current versions of Picard and GATK using the Ubuntu terminal on a Windows PC for some genomics analysis. I have the current version of Java but was not able to install Picard. Well, I downloaded the entire…

Continue Reading how to install picard and GATK

Polyploidy found, and not supported by vcftools for a diploid data set.

Hi, I used GATK Mutect2, SelectVariants (retaining only SNPs) and CombineGVCFs to generate a vcf file for a diploid species. When I tried to process the vcf file using vcftools, some of the commands did work, however, when…

Continue Reading Polyploidy found, and not supported by vcftools for a diploid data set.

Manager, Bioinformatics Verification and Validation

Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform is enabling the development of next generation immuno-oncology therapeutics and diagnostics. Summary: You will join a team of bioinformaticians…

Continue Reading Manager, Bioinformatics Verification and Validation

Phasing using Beagle with a map file

I’d like to phase the SNPs in a vcf file and output consensus files for each haplotype, as suggested in this post: www.biostars.org/p/298635/ I’ve managed to install beagle in a conda environment: conda create -n beagle -c conda-forge -c bioconda beagle conda activate beagle When I run beagle using this…

Continue Reading Phasing using Beagle with a map file

Post filtering analysis for exome data

Hello, I am following the GATK pipeline to process an exome data set. I am done with the preprocessing step and filtered the dataset by the hard filtering method. Now, I am looking for variants shared between the affected individuals. In the vcf file, I get the…

Continue Reading Post filtering analysis for exome data

heterozygous SNV AB>0.15, heterozygous indel<0.20 in UKB-WES

These gVCFs were joint genotyped using GLnexus (www.biorxiv.org/content/10.1101/572347v1) to create a single, unfiltered project-level VCF (pVCF). Genotype depth filters (SNV DP≥7, indel DP≥10) were applied prior to variant site filters requiring at least one variant genotype passing an allele balance filter (heterozygous…

Continue Reading heterozygous SNV AB>0.15, heterozygous indel<0.20 in UKB-WES

Dictionary cannot have size zero

I am new to variant calling and trying to create realignment targets using GATK RealignerTargetCreator but keep getting this error, despite having a dictionary file: java.lang.IllegalArgumentException: Dictionary cannot have size zero at org.broadinstitute.gatk.utils.MRUCachingSAMSequenceDictionary.<init>(MRUCachingSAMSequenceDictionary.java:62) at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:78) at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:75) at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180) at java.lang.ThreadLocal.get(ThreadLocal.java:170) at…

Continue Reading Dictionary cannot have size zero

GATK4 -known-sites input Recalibrate Base Quality Scores

I am working with a non-reference plant species that I want to call variants from after aligning to a closely related species reference genome. I am following the GATK4 best practices pipeline, but I would like to know how I should proceed…

Continue Reading GATK4 -known-sites input Recalibrate Base Quality Scores

Interploidy gene flow involving the sexual-asexual cycle facilitates the diversification of gynogenetic triploid Carassius fish

1. Muller, H. J. The relation of recombination to mutational advance. Mutat. Res. Mol. Mech. Mutagen. 1, 2–9 (1964). 2. Maynard Smith, J. The Evolution of Sex (Cambridge University Press, 1978). 3. Avise, J. C. Clonality (Oxford University Press, 2008). 4. Hamilton, W. D.,…

Continue Reading Interploidy gene flow involving the sexual-asexual cycle facilitates the diversification of gynogenetic triploid Carassius fish

Personalis Senior Bioinformatics Pipeline Development Engineer

Senior Bioinformatics Pipeline Development Engineer (Remote option available) at Personalis, Inc, Menlo Park. Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform® is enabling the…

Continue Reading Personalis Senior Bioinformatics Pipeline Development Engineer

identification of ROH using plink

Hello all, I generated a vcf file using GATK (first HaplotypeCaller -> CombineGVCFs -> GenotypeGVCFs and then hard filtering). After this, I converted the filtered vcf file into plink binary PED files (.bed, .fam, .bim, plink v1.9) using the --make-bed command. However, when I used these…

Continue Reading identification of ROH using plink

Shared variants

Hello, I have exome data sets from 6 individuals, in which 4 are affected and 2 are not affected. I have to identify the variants which are shared between the four affected individuals. I did the joint call genotyping for the 4 affected individuals and filtered the…

Continue Reading Shared variants

GATK HaplotypeCaller works without GVCF option, but errors with GVCF

I’ve extracted chromosome 4 from a whole genome bam file as follows: samtools view -h "$BAM" chr4 > "$EXT/temp/"$PREFIX"_chr4.sam" samtools view -bS "$EXT"/temp/$PREFIX"_chr4.sam" > "$EXT"/temp/$PREFIX"_chr4.bam" Then added read groups, as required by GATK: picard AddOrReplaceReadGroups I="$BAM" O="$EXT"/temp/$PREFIX"_chr4_rg.bam" RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20 Index the bam: samtools index "$BAM" Download the…

Continue Reading GATK HaplotypeCaller works without GVCF option, but errors with GVCF

How to generate the contigs ploidy priors table (yeast) for GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?

Hi! I was asked to determine the ploidy level and to do CNV calling on a yeast sample (Reference sequence: S. cerevisiae S288C). In order to perform CNV calling with the GATK…

Continue Reading How to generate the contigs ploidy priors table (yeast) for GATK DetermineGermlineContigPloidy --contig-ploidy-priors option?

PGT only available for some variants in GATK .vcf

I’ve got a vcf file someone else prepared using GATK. I’m interested in the phasing information in the PGT tag e.g. 0|1. This information seems to be available for some variants, but not for others e.g. below chr1 16977 ….

Continue Reading PGT only available for some variants in GATK .vcf

Senior Bioinformatics Pipeline Development Engineer at Personalis

Senior Bioinformatics Pipeline Development Engineer (Remote option available) at Personalis, Inc, Menlo Park. Personalis is a rapidly growing cancer genomics company transforming the development of next-generation therapies by providing more comprehensive molecular data about each patient’s cancer and immune response. Our ImmunoID NeXT Platform® is enabling the…

Continue Reading Senior Bioinformatics Pipeline Development Engineer at Personalis

GATK Mutect2 errors during basic variant calling

I’ve just installed GATK and am trying to do some basic variant calling. However when I try and run this line: gatk Mutect2 -R $REF -I "$BAM" -O "$DIR"/gatk/$PREFIX"_bwa_gatk_unfiltered.vcf" I get the error below. Reading the output, it looks like this is…

Continue Reading GATK Mutect2 errors during basic variant calling

Windows-Based Software Packages Which Can Analyze Vcf Files

I would like to work with VCF files. Select one person, subset one gene or chromosome or chromosome part. I tried VMware and Ubuntu and VCFtools and GATK and tabix but I run into a lot of errors. I don’t have…

Continue Reading Windows-Based Software Packages Which Can Analyze Vcf Files

Phasing with Beagle 5.2 and no reference panel

Hi everyone, I have a question about phasing with Beagle 5.2 without a reference panel. I have seen in answers in a couple other posts about Beagle that trying to phase with too few samples and no reference panel is not…

Continue Reading Phasing with Beagle 5.2 and no reference panel

False negatives -Hard filtering

Hello, I need some suggestions on filtering the variants in the exome data. I combined all the GVCF files into one file, did joint call genotyping and created one vcf file. The variants in the file were hard-filtered. As a first step to evaluate the…

Continue Reading False negatives -Hard filtering

Best practice for running GATK VQSR on X chromosome

According to GATK best practice, it is recommended that different VQSR models be built for SNPs and INDELs, because the annotations for high-quality SNPs and INDELs are systematically different (if I understand it correctly). Since annotations for good variants on…

Continue Reading Best practice for running GATK VQSR on X chromosome

Malformed walker argument using MarkDuplicatesSpark

I am creating my own NGS pipeline from illumina-fastq file to vcf. This is for pure learning purposes. When I run the following code everything is ok: java -Xmx4000m "$javatmp" -jar "$picardpath" SortSam INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/sam/1.sam OUTPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam SORT_ORDER=coordinate COMPRESSION_LEVEL=5 java -Xmx4000m "$javatmp" -jar "$picardpath" MarkDuplicates INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam…

Continue Reading Malformed walker argument using MarkDuplicatesSpark

In the NGS pipeline, why are reads sorted before marking duplicates?

I am creating my own NGS pipeline (from Illumina fastq to vcf file). I am using GATK best practices and the pipeline already created in the clinical lab where I am working. I have seen that the fastq is…

Continue Reading In the NGS pipeline, why are reads sorted before marking duplicates?

Janis Germline Variant-Calling Workflow (GATK)

This is a genomics pipeline to do a single germline sample variant-calling, adapted from GATK Best Practice Workflow. This workflow is a reference pipeline for using the Janis Python framework (pipelines assistant). Alignment: bwa-mem. Variant-Calling: GATK HaplotypeCaller. Outputs the final variants in the VCF format. Resources: This pipeline has been…

Continue Reading Janis Germline Variant-Calling Workflow (GATK)

Should we trust genotypes called in simple tandem repeat regions?

Hello. I am searching genomes (WGS) or exomes (WES) of patients with rare diseases for potential disease-causing variants. The accuracy of each genotype for each patient is vital. I’m using GATK 4 to perform joint-calling of genotypes of the…

Continue Reading Should we trust genotypes called in simple tandem repeat regions?

filter Refseq file with bed? to produce coverage per gene

Hi, I am using GATK DepthOfCoverage in -genelist mode to find the coverage per gene. The output contains many 0-coverage entries for genes that are not in the panel. Can I fix it by removing the genes that are not…

Continue Reading filter Refseq file with bed? to produce coverage per gene

DepthOfCoverage Error: no suitable codec found

Hi, I am trying to find the coverage per sample and per gene using GATK DepthOfCoverage. I have downloaded the refseq file as GATK suggests. But it gave me this error: A USER ERROR has occurred: Cannot read because no suitable codecs found gatk DepthOfCoverage -R $ref -O…

Continue Reading DepthOfCoverage Error: no suitable codec found

Snakemake pipeline with Gatk GermlineCNVCaller in Case mode

Hi everyone, I am trying to set up a Snakemake pipeline for germline CNV calling with Gatk in CASE mode since I have a background ready to use. My background is fragmented in 20 shards (something like ../cohort-twenty/name_1of20-model), and the…

Continue Reading Snakemake pipeline with Gatk GermlineCNVCaller in Case mode

The provided VCF file is malformed

htsjdk.tribble.TribbleException: The provided VCF file is malformed. I have VCF files that I want to convert to a more readable TSV file using GATK VariantsToTable, and I also want to load in the VCF in IGV. However, when trying to do this, I get the same error for both…

Continue Reading The provided VCF file is malformed

How To Split Multiple Samples In Vcf File Generated By Gatk?

There now also is a plugin in bcftools which does the split in a single pass over the multi-sample VCF/BCF file. It does not seem to be very fast, but looks correct and there are options to do the split in custom ways. You do need to install bcftools with…

Continue Reading How To Split Multiple Samples In Vcf File Generated By Gatk?

Picard vs Samtools converting CRAM to FASTQ

I need to convert my CRAM files to FASTQ to complete an analysis. I have been trying to do this via GATK and Picard, but I have repeatedly been getting an “out of memory” error even as I have increased allocated memory…

Continue Reading Picard vs Samtools converting CRAM to FASTQ

ApplyBQSR won’t recognise output argument

I’m trying to recalibrate some bams using the following: gatk --java-options "-Xmx8g" ApplyBQSR -I $insampleID.sorted.dups.bam -R $reference --bqsr-recal-file $outsampleID.table -O $outsampleID.recal.bam Every time I try, I get the following error: A USER ERROR has occurred: Argument output was missing: Argument ‘output’ is required. The following, within the same script, was…

Continue Reading ApplyBQSR won’t recognise output argument

Genome Engineering Research Scientist

Genome Engineering Research Scientist – 94152. Organization: JG-Joint Genome Institute. Lawrence Berkeley National Lab’s (LBNL, www.lbl.gov/) Environmental Genomics and Systems Biology Division (biosciences.lbl.gov/divisions/egsb/) has an opening for a Genome Engineering Research Scientist to join the team. In this exciting role, you will work as part of the Center for Advanced…

Continue Reading Genome Engineering Research Scientist

HaplotypeCaller vs DeepVariant. How to interpret the quality scores?

Hi, I am trying to compare variant calling outputs of GATK’s HaplotypeCaller and DeepVariant. Their raw output is very different; for example, in a WGS sample, DeepVariant called 947386 variants located on chr1,…

Continue Reading HaplotypeCaller vs DeepVariant. How to interpret the quality scores?

sciClone input vaf file?

Dear all, I want to use sciClone on our exome sequencing data, but one thing I can’t understand is how I can get a varCount equal to 0. I have no idea about this; the following data I just grepped from the sciclone-meta-master manuscript figure3 data…

Continue Reading sciClone input vaf file?

Color hiring Bioinformatics Scientist in Chicago, Illinois, United States

Named by Rock Health as the Best Digital Health Company to Work For, Color is a leading healthcare technology company. Color is building and delivering technology-enabled healthcare to millions of people. Through partnerships with public and private partners including governments, employers and health systems, Color’s infrastructure and software enables…

Continue Reading Color hiring Bioinformatics Scientist in Chicago, Illinois, United States

Color hiring Bioinformatics Engineer in Atlanta, Georgia, United States

Named by Rock Health as the Best Digital Health Company to Work For, Color is a leading healthcare technology company. Color is building and delivering technology-enabled healthcare to millions of people. Through partnerships with public and private partners including governments, employers and health systems, Color’s infrastructure and software enables…

Continue Reading Color hiring Bioinformatics Engineer in Atlanta, Georgia, United States

HaplotypeCaller calling mutations based on one read?

I’m using GATK HaplotypeCaller, via grenepipe, with the default options as specified by grenepipe except for -ploidy 1 as I am working with haploid yeast. I am seeing some mutations called based on one single read only, if I am interpreting the…

Continue Reading HaplotypeCaller calling mutations based on one read?

GATK-Allele frequency

Hi guys, I am running GATK on a bam file for variant calling. In the output file, I noticed that the allele frequency is computed as 0.5 and 1.00. What may be the reason for this? Is it calculated correctly?

Continue Reading GATK-Allele frequency

Calculate allele frequency from many VCF files in specific locus

Dear all, I have 100 VCF files (100 different samples). I would like to calculate allele frequency in specific sites. In one specific locus I have three genotypes (GATK best practices workflow): rs-xxxxx: A/A occurring in 30 samples (ref…

Continue Reading Calculate allele frequency from many VCF files in specific locus
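
The arithmetic behind the question above can be sketched as follows. Only the "A/A in 30 samples" figure comes from the excerpt; the other two genotype counts are hypothetical values chosen so that the total is 100 diploid samples, and the function name is illustrative.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Compute allele frequencies from a {genotype: number_of_samples} mapping.

    Genotypes are unphased strings such as "A/G"; each sample contributes
    one count per allele in its genotype.
    """
    allele_counts = Counter()
    for genotype, n_samples in genotype_counts.items():
        for allele in genotype.split("/"):
            allele_counts[allele] += n_samples
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}


# 30 A/A samples as in the excerpt; 50 A/G and 20 G/G are hypothetical.
freqs = allele_frequencies({"A/A": 30, "A/G": 50, "G/G": 20})
print(freqs)
```

With these counts, A is seen 110 times and G 90 times out of 200 alleles. Samples with missing calls at the site would need to be excluded from the denominator, which this sketch does not handle.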

Error with GenomeAnalysisTK.jar finding tools

I’m trying to run GATK on my machine as part of a pipeline using Phyluce. As per the instructions here (phyluce.readthedocs.io/en/v1.6.8/installation.html#why-conda), I downloaded GATK 3.7-0, activated a conda environment, and imported the GATK package into conda using the following code: conda activate phyluce-1.7.1 gatk-register /PATH/TO/GATK-3.7/JAR/GenomeAnalysisTK.jar Terminal recognizes the command ‘gatk…

Continue Reading Error with GenomeAnalysisTK.jar finding tools

Soft-clipping read ends based on read group

Is it possible to clip (soft-clip preferably) n (for example, 3) nucleotides from both ends of reads in a bam file, but only for the reads with a certain defined read group? I have merged bams for ancient DNA samples and the…

Continue Reading Soft-clipping read ends based on read group

Providence hiring Bioinformatics Scientist 1 in Portland, Oregon, United States

Description: Providence is calling a Bioinformatics Scientist 1 to the Molecular Genomics Lab at Providence Office Park in Portland, OR. This is a full-time (1.0 FTE), day shift position. This position is a hybrid role between working in the lab and working from home. Apply today! Applicants that meet qualifications will…

Continue Reading Providence hiring Bioinformatics Scientist 1 in Portland, Oregon, United States

Jobot hiring Senior Bioinformatics Scientist in Boston, Massachusetts, United States

This Jobot Job is hosted by Emily Olinger. Are you a fit? Easy Apply now by clicking the “Apply” button and sending us your resume. Salary: $100,000 – $150,000 per year. A Bit About Us: We are a leading cloud based SaaS startup in the biotech industry. Our platforms leads…

Continue Reading Jobot hiring Senior Bioinformatics Scientist in Boston, Massachusetts, United States

Is it ok to replace missing WGS calls with reference notation “0/0”?

I called variants on 200 WGS samples; each got around 4 million variants. However, most were unique and only 1 million variants overlapped between most individuals. I suppose it is normal behaviour that GATK won’t output info…

Continue Reading Is it ok to replace missing WGS calls with reference notation “0/0”?

Mappability calculation based on 150 bp reads after mapping with bwa

Hi, I am trying to apply some filters on whole exome sequencing data. First I did the mapping using bwa and then I followed the proposed pipeline from GATK for calling variants on cohorts of samples using the…

Continue Reading Mappability calculation based on 150 bp reads after mapping with bwa

Does Haplotypecaller of GATK find all the mutations?

Hi, I have some assembled sequences and know that some genes with specific mutations are present in them. However, when I go from fastq to bam format and then apply the HaplotypeCaller tool of GATK, very few of these genes are missing….

Continue Reading Does Haplotypecaller of GATK find all the mutations?

Calculating Allele Balance in GATK4

Hi all, I know GATK3 has an option to compute Allele Balance and populate the ABHet and ABHom fields. I do not see this option in GATK4. I used to run this command in GATK3: java ${JAVAOPTS} -jar /usr/local/genome/GATK-3.6-0/GenomeAnalysisTK.jar -T VariantAnnotator -A AlleleBalance -I AF1.vcf.gz -R…

Continue Reading Calculating Allele Balance in GATK4

troubleshooting benchmarking small variants: hap.py and rtg

Hi! I tried to do what other posts reported and I have a problem that I do not fully understand why … 1) I downloaded the fastq files from Garvan (ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/) with the bed file. I had to convert the bed file to hg38 (my_regions) … as I understand it…

Continue Reading troubleshooting benchmarking small variants: hap.py and rtg

Job: Senior Bioinformatics Scientist – (28180-JOB) at Illumina Singapore Pte Ltd Singapore

Job Description. Basic Function and Scope of the Position: As a Sr. Bioinformatics Scientist, your primary responsibility is to enable the data analysis and processing and successful delivery of data within a cloud system to meet project based KPIs for client(s) in Singapore, as part of large and strategic…

Continue Reading Job: Senior Bioinformatics Scientist – (28180-JOB) at Illumina Singapore Pte Ltd Singapore

Heterozygous Variants On Male X/Y Chromosome (Exome Data)

Hello, I am analyzing the Whole Exome Sequencing (WES) data of a male patient. When looking at the variants on the X and Y chromosomes, I found many heterozygous variants. I think they should all be hemizygous variants. Shouldn’t they? What…

Continue Reading Heterozygous Variants On Male X/Y Chromosome (Exome Data)

How can I obtain genotypes from .bams of RNAseq data?

Hi all, I am hoping to run an allele specific expression analysis on a set of RNAseq samples I have. I need to obtain the genotypes for all samples to determine heterozygosity of each variant which is needed for…

Continue Reading How can I obtain genotypes from .bams of RNAseq data?

pooled-heterozygosity calculation

According to Rubin et al., one method of selection signature identification in a genome-scale study is pooled heterozygosity (Hp) calculation. “Hp = 2ΣnMAJΣnMIN/(ΣnMAJ + ΣnMIN)^2, where nMAJ and nMIN are the numbers of reads corresponding to the most and least abundant allele, respectively, the sum of these…

Continue Reading pooled-heterozygosity calculation
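
The Hp formula quoted above can be written out as a short function. This is a minimal sketch that assumes the sums run over the biallelic SNPs in one genomic window and that each site is given as a pair of allele read counts; the function name and the example counts are hypothetical.

```python
def pooled_heterozygosity(site_counts):
    """Hp = 2 * sum(nMAJ) * sum(nMIN) / (sum(nMAJ) + sum(nMIN))**2.

    site_counts: iterable of (reads_allele_1, reads_allele_2) per SNP.
    At each site the larger count is taken as nMAJ and the smaller as nMIN.
    """
    sites = list(site_counts)
    sum_maj = sum(max(a, b) for a, b in sites)
    sum_min = sum(min(a, b) for a, b in sites)
    total = sum_maj + sum_min
    if total == 0:
        raise ValueError("no reads in window")
    return 2 * sum_maj * sum_min / total ** 2


# Hypothetical read counts for three SNPs in one window.
hp = pooled_heterozygosity([(10, 10), (18, 2), (5, 15)])
print(round(hp, 4))
```

Here the summed counts are 43 (major) and 17 (minor), so Hp = 2 * 43 * 17 / 60^2. A window in which both alleles are equally abundant at every site gives the maximum value of 0.5, and a window fixed for one allele gives 0.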

Best way to fill VCF with ancestral allele info AA for each SNP hg19

Hi all, I am currently working with some full-genome human sequence data (mapped to hg19) and created VCF files (called only SNPs) from my BAM files with gatk. For each SNP in my VCF I…

Continue Reading Best way to fill VCF with ancestral allele info AA for each SNP hg19

Bioconductor – Bioconductor 3.14 Released

October 27, 2021. Bioconductors: We are pleased to announce Bioconductor 3.14, consisting of 2083 software packages, 408 experiment data packages, 904 annotation packages, 29 workflows and 8 books. There are 89 new software packages, 13 new data experiment packages, 10 new annotation packages, 1 new workflow,…

Continue Reading Bioconductor – Bioconductor 3.14 Released

Identify most and least abundant allele (nMAJ and nMIN) for pooled heterozygosity (hp) analysis from vcf file

I am trying to calculate pooled heterozygosity (hp) by identifying nMAJ and nMIN from a vcf file with a sliding window of 150kb. I am confused after reading papers where they calculated using formula…

Continue Reading Identify most and least abundant allele (nMAJ and nMIN) for pooled heterozygosity (hp) analysis from vcf file
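
One way to approach the question above is sketched below. It assumes that per-site reference and alternate read counts have already been extracted from the vcf file (for example from an allele depth field), takes the larger count at each site as nMAJ and the smaller as nMIN, and uses non-overlapping 150 kb windows as a simplification of a sliding window. All names and example values are hypothetical.

```python
from collections import defaultdict


def windowed_hp(sites, window=150_000):
    """Return {window_start: Hp} for sites on a single chromosome.

    sites: iterable of (position, ref_reads, alt_reads), positions 1-based.
    Windows are non-overlapping bins of `window` bases (a simplification).
    """
    sums = defaultdict(lambda: [0, 0])  # window index -> [sum nMAJ, sum nMIN]
    for pos, ref_reads, alt_reads in sites:
        idx = (pos - 1) // window
        sums[idx][0] += max(ref_reads, alt_reads)  # most abundant allele
        sums[idx][1] += min(ref_reads, alt_reads)  # least abundant allele
    result = {}
    for idx, (sum_maj, sum_min) in sorted(sums.items()):
        total = sum_maj + sum_min
        if total:  # skip windows without any reads
            result[idx * window + 1] = 2 * sum_maj * sum_min / total ** 2
    return result


# Hypothetical sites: two fall in the first window, two in the second.
example_sites = [(100, 10, 10), (149_999, 18, 2), (150_001, 5, 15), (200_000, 20, 0)]
hp_by_window = windowed_hp(example_sites)
print(hp_by_window)
```

The keys of the result are 1-based window start coordinates. Overlapping windows could be obtained by adding a step size smaller than the window length, at the cost of assigning each site to several windows.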

gatk GetPileupSummaries and CalculateContamination result in NaN on mouse data

Hello! I ran the gatk toolchain including CalculateContamination in Galaxy on human exome sequencing data, and it worked fine. However, when I try feeding it with murine data (and murine reference files), CalculateContamination gives me this contamination table: sample contamination error mouse1_tumor NaN 1.0 And being fed with this result, FilterMutectCalls…

Continue Reading gatk GetPileupSummaries and CalculateContamination result in NaN on mouse data

Bioinformatics Analyst II – Remote at Geisinger Health System

Job Summary: Primary accountability is to leverage the organization’s data assets, exome sequencing data (>180,000 individuals) from the MyCode Community Health Initiative, to improve quality, efficiency and generate knowledge specifically in the field of bioinformatics within health research. Performs and supervises complex data extraction, transformation, visualization, and summarization to support Research…

Continue Reading Bioinformatics Analyst II – Remote at Geisinger Health System

VQSRTrancheSNP

Does anyone have a guide on how to decide on the VQSRTrancheSNP level to use? I have variants that are in the VQSRTrancheSNP99.80to99.90, VQSRTrancheSNP99.90to100.00, and VQSRTrancheSNP97.80to98.00. I investigated one of the variants flagged with the VQSRTrancheSNP97.80to98.00 filter and it seems to be a real variant. Variant calling was completed…

Continue Reading VQSRTrancheSNP

Lead Data Scientist – Bioinformatics job with Spark Therapeutics

Primary Duties The Bioinformatics Group within the Data Science Organization at Spark Therapeutics is seeking an engaged and passionate Lead Data Scientist with a focus on bioinformatics and computational biology to participate in and support projects involving omics and other high-dimensional data across the Technology & Research Organizations. He/she will…

Continue Reading Lead Data Scientist – Bioinformatics job with Spark Therapeutics

samples cannot be empty when i run gatk-package-4.1.4.1-local.jar HaplotypeCaller

Hello, when I run gatk, this error always occurs: “java.lang.IllegalArgumentException: samples cannot be empty”. Is there a mistake in my input file? Thank you for your help! My bam file is as follows: HWI-EAS418:3:37:1070:1462 83 chr20 46689301 255…

gatk legacy bundles (where to get Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz)

I need the known indels VCF to run GATK BaseRecalibrator, specifically the hg19 (not the b37) version: Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz. However, this file is no longer available at: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz. The Broad Institute documentation says the FTP site was disabled…

ABBA BABA test for two population

I generated a VCF file including 20 samples belonging to two species (A and B); variant calling was done using the GATK best practices pipeline (autosomes only). Now I want to detect the introgressed regions from species A to species B. I searched…

? in cram files

I have an original BAM file that, when compressed to CRAM format, loses its base quality scores, which are replaced with question marks (?) and other symbols. The following questions come up: Why are the original base quality scores changed when compressing from BAM to CRAM? Is…

GATK build

When I type “./gradlew bundle” to build GATK, the following error is shown on the screen: What went wrong: A problem occurred evaluating root project ‘gatk’. Execution of “git lfs pull --include src/main/resources/large” failed with exit code: 1. git-lfs is required to build GATK but may not be…

Junior Bioinformatician at the Lymphoma Genomic Laboratory

Open position: Junior Bioinformatician at the Lymphoma Genomic Laboratory, Institute of Oncology Research (IOR), Bellinzona, Switzerland, www.ior.usi.ch. The Institute of Oncology Research (IOR) in Bellinzona, Switzerland, is a rapidly evolving, leading center for basic and translational research in oncology in Europe. IOR is affiliated…

Sample not found in BAM header

I get the following error when running Mutect2. Any idea what the reason is? A USER ERROR has occurred: Bad input: Sample N-PANCNGS-006 is not in BAM header: [] gatk Mutect2 --native-pair-hmm-threads 30 -R ~/genomes/BWA/Homo_sapiens.GRCh38.dna.primary_assembly.fa -I T-PANCNGS-006.bam -I N-PANCNGS-006.bam -normal N-PANCNGS-006…
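The empty list in the error message suggests that the BAM header lists no samples, which would be the case if it has no @RG lines carrying an SM tag. As a minimal sketch for checking this, the hypothetical helper `samples_in_header` below takes the header as plain text (for example, the output of `samtools view -H`) and collects the SM values. The function name and the way the header text is obtained are assumptions for illustration, not part of the original post.

```python
def samples_in_header(header_text):
    """Return the distinct SM (sample) values found on @RG header lines."""
    samples = []
    for line in header_text.splitlines():
        # Only read-group lines can carry a sample name.
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                name = field[3:]
                if name not in samples:
                    samples.append(name)
    return samples
```

An empty result for a given BAM header would be consistent with the `[]` shown in the error; a non-empty result can be compared against the name passed to `-normal`.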

when should I use VQSR or hard filtering?

I’m always unsure whether to use VQSR or hard filtering when I do site filtering after joint-calling using GATK. One reasonable criterion I can think of is to inspect the structure of the VQSR model (that 2D heatmap…

Children’s Hospital of Philadelphia hiring Bioinformatics Scientist II – DBHI in Philadelphia, Pennsylvania, United States

Location: LOC_HOME-Home/Remote Office Location Req ID: 150400 Shift: Days Employment Status: Regular – Full Time Job Summary The Children’s Hospital of Philadelphia (CHOP) Research Institute and its Department of Biomedical and Health Informatics (DBHi) are seeking a bioinformatics scientist to help advance an enterprise-level data and informatics platform called “Arcus”….

Base recalibration in normal vs. tumor somatic variant calling in WXS data?

Hi there, I have a tumor and a normal BAM file and am preparing to run base recalibration. I was planning on calling variants on the normal and using that, in addition to dbSNP, as input for…

HsMetricCollector (gatk 4.0.5.0 API)

public class HsMetricCollector extends TargetMetricsCollector<HsMetrics> Calculates HS metrics for a given SAM or BAM file. Requires the input of a list of target intervals and a list of bait intervals. Can be invoked either on an entire iterator of SAMRecords…

High positive values in vcftools --het results

Hello all, I am trying to find out the inbreeding coefficients of the individuals in my sample using the vcftools --het option. I am seeing very high positive values (as seen in the figure below). What do very high positive values mean? Is…
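For context, the F value reported by vcftools --het is computed per individual from the observed and expected counts of homozygous sites, F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)). The sketch below reproduces that arithmetic; the function name `inbreeding_f` and its argument names are hypothetical, chosen only for illustration.

```python
def inbreeding_f(obs_hom, exp_hom, n_sites):
    """Inbreeding coefficient F from observed/expected homozygous site counts."""
    denominator = n_sites - exp_hom
    if denominator == 0:
        # F is undefined when every site is expected to be homozygous.
        raise ValueError("n_sites equals exp_hom; F is undefined")
    return (obs_hom - exp_hom) / denominator
```

For example, `inbreeding_f(90, 80.0, 100)` gives 0.5: a positive F means more homozygous sites were observed than expected, and the value approaches 1 as nearly all sites become homozygous.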

GATK multiple files run error /usr/bin/bash: gatk: command not found

Hello, I’m using ls *.sorted_markduplicates.bam | parallel --progress --eta -j 3 ‘gatk BaseRecalibrator -I {} -R ../0.Reference/CH-PICR.fasta -O {.}.recal.bam’ to process multiple BAM files, but an error like this occurred: Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete…

replace low quality bases with “N”

Hello, is there any way to remove low quality score bases from BAM files, leaving the space blank or replaced with a filler character (i.e. Ns)? I have mapped my sequencing reads, and adjusted the base quality scores (using mapDamage, which…
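To make the requested masking concrete, here is a minimal sketch that operates on the sequence and quality strings of a single read rather than on a BAM file directly. The function name `mask_low_quality`, the Phred+33 encoding, and the threshold of 20 are all assumptions for illustration; this is not a recommendation of a specific tool.

```python
def mask_low_quality(seq, qual, min_q=20, offset=33):
    """Replace bases whose Phred quality is below min_q with 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    masked = []
    for base, q_char in zip(seq, qual):
        # Decode the ASCII quality character (Phred+33 assumed by default).
        phred = ord(q_char) - offset
        masked.append(base if phred >= min_q else "N")
    return "".join(masked)
```

With these defaults, `mask_low_quality("ACGT", "II#I")` returns `"ACNT"`, because `#` decodes to a Phred score of 2 while `I` decodes to 40.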

dna seq analysis – Banya

DNA Sequencing Data Analysis Simple Software Tools . Omicscript Pipeline For DNA Seq Data Analysis Array Suite Wiki . DNA Sequence Alignment DNA Contig Assembly Software Sequence . DNA Sequence Alignment…

Did not inflate expected amount error

I am facing an error while running GATK BaseRecalibrator. [1 July 2021 at 5:00:42 PM IST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 35.00 minutes. Runtime.totalMemory()=791674880 htsjdk.samtools.SAMFormatException: Did not inflate expected amount at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:147) at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96) at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550) at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532) at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468) at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458) at…

Comparison of sequencing data processing pipelines and application to underrepresented African human populations | BMC Bioinformatics

Literature survey We reviewed the processing pipelines of 29 HTS studies, 23 of which focus on human populations and six on other mammals (listed in Table 1). Table 1 List of studies included in the literature survey We summarized the information for some processing steps in Table 2 (see Additional…

Bioinformatics Scientist Job Opening in Huntsville, AL at Discovery Life Sciences

Discovery Life Sciences are the Biospecimen and Biomarker Specialists™ serving thousands of scientists across the U.S. and around the world. We’re looking for a standout Bioinformatics Scientist for the HudsonAlpha Discovery sequencing lab in Huntsville, Alabama. The Bioinformatics Scientist contributes to our mission by taking lab-generated data and creating computational…

Single-cell DNA sequencing on Pediatric MDS

Study Description Single-cell DNA sequencing with antibody-oligonucleotide staining was performed using the Mission Bio Tapestri single-cell DNA sequencing platform, per the manufacturer’s instructions. All libraries were sized and quantified using an Agilent Bioanalyzer and pooled for sequencing on an Illumina NovaSeq6000 with 150 base-paired ending multiplexed runs. Fastq files generated…

The genome of Shorea leprosula (Dipterocarpaceae) highlights the ecological relevance of drought in aseasonal tropical rainforests

Sequencing of Shorea leprosula genome Sample collection Leaf samples of S. leprosula were obtained from a reproductively mature (diameter at breast height, 50 cm) diploid tree B1_19 (DNA ID 214) grown in the Dipterocarp Arboretum, Forest Research Institute Malaysia (FRIM). DNA extraction Genomic DNA was extracted from leaf samples using the…

Using PoolSNP to return non-SNP genotypes

I know that PoolSNP is optimized for variant calling, but surely there’s some way to get it to return a VCF with allele frequency counts at all sites rather than just the SNPs, as can be done with GATK. Is there something I…

variants only found on inversion reads in IGV

Hi, I used the GATK germline variant calling pipeline to call short variants on paired-end fastq files. After obtaining the final analysis-ready VCF and applying some extra filters, I inspected the BAM files in IGV for the variants of interest and found…

Crossing design shapes patterns of genetic variation in synthetic recombinant populations of Saccharomyces cerevisiae

Population creation All yeast strains used in this study originated from heterothallic, haploid, barcoded derivatives of the SGRP yeast strain collection30. A subset of 12 of these haploid strains, originally isolated from distinct geographic locations worldwide, were used to create the synthetic populations we describe here (See Supplementary Fig. S1…

Mutational profile of the dystrophin gene

Introduction Duchenne Muscular Dystrophy (DMD-OMIM #310200) and Becker Muscular Dystrophy (BMD-OMIM #300376), are the most common hereditary muscular dystrophies around the world.1 DMD and BMD occur with a frequency of 1/3.300 and 1/6.000 newborn males, respectively.2 These dystrophinopathies are caused by alterations in the DMD gene and have an X-linked…

Is it okay to convert bam files to fastq and getting seqkit results?

I’m sorry if this is a nonsensical question, but I am a rookie and, although I searched for this, I couldn’t find a decent answer. I had the fastq files but they somehow became corrupted. I…

genomic data scientist jobs

Provide strategic planning and perform analysis or simulations independently or in a . 401(k) savings plan match.…, Requires a Ph.D. in Biochemistry, Biotechnology, Molecular/Cell Biology, Plant Biology, or a related field and 0-3 years of relevant postdoctoral or industrial……, In addition, the analyst will help advance the groups collective expertise…

Job vacancy in Global Worldwide: Bioinformatics Analyst II – Remote at Geisinger

Job details Job type: full-time. Full job description Job summary: Primary accountability is to leverage the organization’s data assets exome sequencing data (>180,000 individuals) from MyCode Community Health Initiative to improve quality, efficiency and generate knowledge specifically in the field of bioinformatics within health research. Performs and supervises complex data extraction,…

biopython extract sequence from fasta

My two questions are: What is the simplest way to do this? This unique book shows you how to program with Python, using code examples taken directly from bioinformatics. using python-bloom-filter, just replace the set with seen = BloomFilter(max_elements=10000, error_rate=0.001). This book is suitable for use as a classroom textbook,…

Difference in .bed target region base count and mapped base count for different tools

Hello! I was initially doing a performance comparison of tools for obtaining sample-wide average depth of coverage. I compared the tools listed below: Qualimap, GATK DepthOfCoverage, Mosdepth, and Samtools bedcov. However, the tools…

Identifying Private SNPs between multi sample vcf files.

Dear Community, I hope all is well. I am having difficulty finding the best way to quantify private SNPs between my multi-sample VCF files. For example, I have 110 samples in my VCF file that I generated via CohortCalling using GATK….

Bioinformatics Analyst II – Remote job in Danville at Geisinger

Job Title Bioinformatics Analyst II – Remote Location Work from Home Job Category Information Technology Support Services Schedule Days Work Type Full time Department Research Informatics Department Date posted 09/22/2021 Job ID R-15599 Job Summary Primary accountability is to leverage the organization’s…

Strange speed up in GATK LeftAlignIndels

Hi! I noticed a strange thing. I have been running a DNA-seq pipeline like this: reads -> bwa-mem2 -> picard SortSam -> picard MergeSamFiles -> picard MarkDuplicates -> gatk LeftAlignIndels … gatk LeftAlignIndels has always taken around 4 hours to complete with the…
