Tag: GenomicsDBImport

GATK GenomicsDBImport too slow

Hello, I have 3264 g.VCFs and an interval list for the reference genome that contains 20000 contigs. The interval list looks like the following:

utg19_pilon_pilon:1-42237
utg22_pilon_pilon:1-49947
utg24_pilon_pilon:1-61707
utg30_pilon_pilon:1-459006
utg38_pilon_pilon:1-129173
utg40_pilon_pilon:1-101813
utg58_pilon_pilon:1-143918
utg93_pilon_pilon:1-186249
utg100_pilon_pilon:1-87875
utg104_pilon_pilon:1-49315

I am running the GATK GenomicsDBImport command as follows: gatk --java-options…
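
For readers who want to inspect an interval list like the one quoted above, here is a minimal Python sketch that parses `contig:start-end` strings and sums their lengths. The helper names (`parse_interval`, `total_span`) are hypothetical and are not part of GATK; the two example intervals are taken from the excerpt.

```python
import re

# Matches interval strings such as "utg19_pilon_pilon:1-42237".
INTERVAL_RE = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_interval(text):
    """Split 'contig:start-end' into (contig, start, end), 1-based inclusive."""
    match = INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a contig:start-end interval: {text!r}")
    return match["contig"], int(match["start"]), int(match["end"])


def total_span(intervals):
    """Sum the lengths of parsed intervals (end - start + 1 for each)."""
    return sum(end - start + 1 for _, start, end in intervals)


example = ["utg19_pilon_pilon:1-42237", "utg22_pilon_pilon:1-49947"]
parsed = [parse_interval(line) for line in example]
print(parsed[0])
print(total_span(parsed))
```

Counting contigs and total bases this way only describes the input; it does not by itself say why an import is slow.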

How to input list into GenomicsDBImport with snakemake?

Hello! I’m currently writing a pipeline with Snakemake for exome data. During joint variant calling I need to use GATK’s GenomicsDBImport, but I’m unsure how to input all the samples at once. Here’s the simplified version of the rule I’m using:…
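
The rule itself is cut off in this excerpt, so the following is only a sketch of one common approach, not the poster's code: GenomicsDBImport takes one `-V` argument per input GVCF, and since Snakemake files are Python, the argument list can be built with a small helper. The function name `variant_args`, the sample names, and the `gvcf/` path layout are all hypothetical.

```python
def variant_args(gvcf_paths):
    """Return ['-V', path, '-V', path, ...] for a list of GVCF paths."""
    parts = []
    for path in gvcf_paths:
        parts.extend(["-V", path])
    return parts


# Hypothetical sample names and file layout, for illustration only.
samples = ["sampleA", "sampleB"]
gvcfs = [f"gvcf/{sample}.g.vcf.gz" for sample in samples]
print(" ".join(variant_args(gvcfs)))
```

Inside a rule, a helper like this would typically be called from a `params` function on the rule's input files, so that the shell command receives the joined string.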

Samtools index not working in Snakemake

I am setting up a Snakemake pipeline for sequencing read alignment and variant calling, but the samtools index rule is not triggered and the subsequent HaplotypeCaller rule fails. I think this is because the samtools index rule is not seen as necessary to produce the output of rule all…

GenotypeGVCF too many genotypes from pooled samples

Hello, I am trying to create a VCF file using GenotypeGVCFs in GATK4. I have 60 samples, and each sample is pooled data with a ploidy of 60, owing to the biological system I work in. The data have been processed with HaplotypeCaller; below is an example…

GenomicsDBImport from Mutect2 output

I am trying to follow GATK 4.2.0 best-practice guidelines for Mutect2 PoN creation. I called variants in my samples as recommended with:

gatk Mutect2 \
  -R ${REF} \
  -L ${EXOME_INPUT_INTERVALS} \
  -I ${BAM} \
  --sequence-dictionary ${DICT} \
  --max-mnp-distance 0 \
  -O ${SAMPLE_NAME}.mutect2.vcf

but I see that the tool is unable…

How to extract phased haplotypes from GATK HaplotypeCaller

I would like to extract the physically phased haplotypes from a VCF file generated by GATK’s HaplotypeCaller on Illumina data of some isolates from different yeast (S. cerevisiae) strains. According to this FAQ: In the format field of a PGT (Pre-Implantation Genetic Testing) VCF, you may find a description similar…

Whole-genome sequencing reveals an association between small genomic deletions and an increased risk of developing Parkinson’s disease

Case selection: In this prospective case‒control study, we enrolled PD patients and healthy controls at Asan Medical Center (AMC), Seoul, South Korea, between 2018 and 2020. PD diagnosis was based on the UK PD Society Brain Bank criteria [15]. The batch 1 (n = 210) and batch 2 (n = 100) PD cohorts were recruited from January…

GenomicsDBImport datastore format folder permissions

Bug Report
Affected tool(s) or class(es): GenomicsDBImport / GenotypeGVCFs
Affected version(s): 4.3.0.0
Description: When creating a GenomicsDB datastore, the created folder has its permissions set to 700 (recursively). As a result, when trying to jointly call genotypes using GenotypeGVCFs, one encounters the error: ERROR: Couldn’t create GenomicsDBFeatureReader
Steps to reproduce: Create a datastore using…
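
To illustrate what mode 700 means in practice, here is a small Python sketch that creates a directory with that mode and checks whether group or other users can read it. It assumes a POSIX filesystem; the `datastore` directory is a throwaway stand-in, not a real GenomicsDB workspace, and the helper names are hypothetical.

```python
import os
import stat
import tempfile


def mode_bits(path):
    """Return only the permission bits of a path (e.g. 0o700)."""
    return stat.S_IMODE(os.stat(path).st_mode)


def readable_by_group_or_others(path):
    """True if the group or other read bit is set on the path."""
    return bool(mode_bits(path) & (stat.S_IRGRP | stat.S_IROTH))


def demo():
    """Create a stand-in directory with mode 700 and report its access bits."""
    with tempfile.TemporaryDirectory() as tmp:
        datastore = os.path.join(tmp, "datastore")  # stand-in directory
        os.mkdir(datastore)
        os.chmod(datastore, 0o700)  # owner-only, as described in the report
        return mode_bits(datastore), readable_by_group_or_others(datastore)


print(demo())
```

With mode 700 only the owning user can enter or read the directory, which is consistent with a different user being unable to open the datastore.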

Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

Sample preparation: We ordered the GIAB samples from the Coriell Institute (NA24385, NIST ID HG002; NA24149, NIST ID HG003; and NA24143, NIST ID HG004). DNA concentration was measured by Qubit. The library was constructed according to the Illumina TruSeq DNA PCR-Free Library Prep protocol HT (Illumina Inc., San Diego, CA, USA) for…

GATK’s GenomicsDBImport takes forever…

Hello! I have 90 samples in the form of VCF files; together they are a few terabytes in size. I wish to create a single multi-sample VCF file for downstream analysis. I am trying to use GenomicsDBImport for this, but it just takes too long…

Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary

Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary

Hi everyone, I’m trying to run Mutect2 for WES cancer data. However, since their resource bundle only supports hg19, it seems I cannot proceed (I want to compare it with Strelka2…
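
Errors of this kind usually mean the contig names in the intervals or inputs (for example `chr1`) do not match the names in the reference's sequence dictionary (for example `1`). The Python sketch below checks requested contig names against the `@SQ` lines of a `.dict` file's text. The helper names and the two-contig example dictionary are hypothetical and chosen only for illustration.

```python
def dict_contigs(dict_text):
    """Collect contig names from the SN: fields of @SQ header lines."""
    names = set()
    for line in dict_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def missing_contigs(requested, dict_text):
    """Return the requested contigs that are absent from the dictionary."""
    known = dict_contigs(dict_text)
    return [contig for contig in requested if contig not in known]


# Illustrative dictionary text that names contigs without a "chr" prefix.
EXAMPLE_DICT = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:2\tLN:243199373\n"
)
print(missing_contigs(["chr1", "2"], EXAMPLE_DICT))
```

Any name reported as missing points to a naming mismatch between the inputs and the reference, which has to be resolved by using a consistent reference build and naming scheme.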
