Tag: GenomicsDBImport
GATK GenomicsDBImport too slow
Hello, I have 3264 g.VCFs and an interval list for the reference genome that contains 20000 contigs. The interval list looks like the following: utg19_pilon_pilon:1-42237 utg22_pilon_pilon:1-49947 utg24_pilon_pilon:1-61707 utg30_pilon_pilon:1-459006 utg38_pilon_pilon:1-129173 utg40_pilon_pilon:1-101813 utg58_pilon_pilon:1-143918 utg93_pilon_pilon:1-186249 utg100_pilon_pilon:1-87875 utg104_pilon_pilon:1-49315 I am running the GATK GenomicsDBImport command as follows: gatk --java-options…
How to input list into GenomicsDBImport with snakemake?
Hello! I’m currently writing a pipeline with snakemake for exome data. During joint variant calling I need to use GATK’s GenomicsDBImport, although I’m unsure how to input all the samples at once. Here’s the simplified version of the rule I’m using:…
Samtools index not working in Snakemake
I am setting up a Snakemake pipeline for sequencing read alignment and variant calling. But the samtools index rule is not activated, and the subsequent haplotype caller rule fails. I think it is because the samtools index rule is not perceived as necessary to produce the output of rule all…
GenotypeGVCF too many genotypes from pooled samples
Hello, I am trying to create a VCF file using GenotypeGVCFs in GATK4. I have 60 samples and each sample is pooled data. The ploidy per sample is 60. This is due to the biological system I work in. This data has been processed with HaplotypeCaller; below is an example…
GenomicsDBImport from Mutect2 output
I am trying to follow GATK 4.2.0 best-practice guidelines for Mutect2 PoN creation. I called variants in my samples as recommended with: gatk Mutect2 \ -R ${REF} \ -L ${EXOME_INPUT_INTERVALS} \ -I ${BAM} \ --sequence-dictionary ${DICT} \ --max-mnp-distance 0 \ -O ${SAMPLE_NAME}.mutect2.vcf but I see that the tool is unable…
How to extract phased haplotypes from GATK HaplotypeCaller
I would like to extract the physically phased haplotypes from a VCF file generated by GATK’s HaplotypeCaller on Illumina data of some isolates from different yeast (S. cerevisiae) strains. According to this FAQ: In the format field of a PGT (physical phasing haplotype information) VCF, you may find a description similar…
Whole-genome sequencing reveals an association between small genomic deletions and an increased risk of developing Parkinson’s disease
Case selection: In this prospective case-control study, we enrolled PD patients and healthy controls at Asan Medical Center (AMC), Seoul, South Korea, between 2018 and 2020. PD diagnosis was based on the UK PD Society Brain Bank criteria [15]. Batch 1 (n = 210) and 2 (n = 100) PD cohorts were recruited from January…
GenomicsDBImport datastore format folder permissions
Bug Report. Affected tool(s) or class(es): GenomicsDBImport / GenotypeGVCFs. Affected version(s): 4.3.0.0. Description: When creating a GenomicsDB datastore, the created folder has permissions set to 700 (recursively). As such, when trying to jointly call genotypes using GenotypeGVCFs, one encounters the error: ERROR: Couldn’t create GenomicsDBFeatureReader. Steps to reproduce: Create a datastore using…
Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment
Sample preparation We ordered the GIAB samples from the Coriell Institute (NA24385, NIST ID HG002; NA24149, NIST-ID HG003 and NA24143, NIST-ID HG004). DNA concentration was measured by Qubit. The library was constructed according to Illumina TruSeq DNA PCR Free Library Prep protocol HT (Illumina Inc., San Diego, CA, USA) for…
GATK’s GenomicsDBImport takes forever…
Hello! I have 90 samples in the form of vcf files, together they are a few terabytes in size. I wish to create a single multi-sample vcf file for downstream analysis. I am trying to use GenomicsDBImport for this, but it just takes too long…
Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary
Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary. Hi everyone, I’m trying to run Mutect2 for WES cancer data. However, since their resource bundle only supports hg19, it seems I cannot proceed (I want to compare it with Strelka2…