I’ve extracted chromosome 4 from a whole genome bam file as follows:
samtools view -h "$BAM" chr4 > "$EXT/temp/"$PREFIX"_chr4.sam"
samtools view -bS "$EXT"/temp/$PREFIX"_chr4.sam" > "$EXT"/temp/$PREFIX"_chr4.bam"
Then added read groups, as required by GATK
picard AddOrReplaceReadGroups I="$BAM" O="$EXT"/temp/$PREFIX"_chr4_rg.bam" RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20
Index the bam:
samtools index "$BAM"
Download the reference chromosome 4, index it and create the dictionary for gatk:
curl https://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/chr4.fa.gz | gunzip -c > ~/refs/hg19/chr4.fa
samtools faidx $REF
bwa index $REF
bowtie2-build $REF $REF
picard CreateSequenceDictionary REFERENCE="$REF" OUTPUT=~/refs/hg19/$ACC.dict
Then I run the normal GATK variant calling with this, which works.
gatk HaplotypeCaller -R $REF -I "$BAM" -O "$DIR"/gatk/$PREFIX"_chr4_HaplotypeCaller.vcf"
But when I run it with the GVCF option, I get the error below:
gatk HaplotypeCaller -R $REF -I "$BAM" -O "$DIR"/gatk/$PREFIX"_chr4_HaplotypeCallerPGT.vcf" -ERC GVCF
The important line in the error seems to be:
A USER ERROR has occurred: Contig: chrM not found in reference dictionary.
Please check that you are using a compatible reference for your data.
Reference Contigs: [chr4]
But here is the full error:
$ gatk HaplotypeCaller -R $REF -I "$BAM" -O "$DIR"/gatk/$PREFIX"_chr4_HaplotypeCallerPGT.vcf" -ERC GVCF
Using GATK jar /Users/michaelflower/opt/anaconda3/envs/gatk4/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /Users/michaelflower/opt/anaconda3/envs/gatk4/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar HaplotypeCaller -R /Users/michaelflower/refs/hg19/chr4.fa -I /Volumes/Seagate Expansion Drive/temp/125QiPSC_chr4_rg.bam -O /Users/michaelflower/Documents/ACL/Research/Projects/HTT ASO/2021.11.17 125Q vcf query/gatk/125QiPSC_chr4_HaplotypeCallerPGT.vcf -ERC GVCF
15:51:48.424 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/michaelflower/opt/anaconda3/envs/gatk4/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
Nov 17, 2021 3:51:48 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:51:48.827 INFO HaplotypeCaller - ------------------------------------------------------------
15:51:48.828 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.3.0
15:51:48.828 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
15:51:48.828 INFO HaplotypeCaller - Executing as michaelflower@Michaels-MacBook-Pro-2.local on Mac OS X v10.16 x86_64
15:51:48.828 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_302-b08
15:51:48.828 INFO HaplotypeCaller - Start Date/Time: 17 November 2021 15:51:48 GMT
15:51:48.829 INFO HaplotypeCaller - ------------------------------------------------------------
15:51:48.829 INFO HaplotypeCaller - ------------------------------------------------------------
15:51:48.829 INFO HaplotypeCaller - HTSJDK Version: 2.24.1
15:51:48.829 INFO HaplotypeCaller - Picard Version: 2.25.4
15:51:48.829 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
15:51:48.829 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:51:48.830 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:51:48.830 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:51:48.830 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:51:48.830 INFO HaplotypeCaller - Deflater: IntelDeflater
15:51:48.830 INFO HaplotypeCaller - Inflater: IntelInflater
15:51:48.830 INFO HaplotypeCaller - GCS max retries/reopens: 20
15:51:48.830 INFO HaplotypeCaller - Requester pays: disabled
15:51:48.830 INFO HaplotypeCaller - Initializing engine
15:51:49.333 INFO HaplotypeCaller - Done initializing engine
15:51:49.335 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
15:51:49.341 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
15:51:49.341 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
15:51:49.359 INFO NativeLibraryLoader - Loading libgkl_utils.dylib from jar:file:/Users/michaelflower/opt/anaconda3/envs/gatk4/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_utils.dylib
15:51:49.500 WARN NativeLibraryLoader - Unable to find native library: native/libgkl_pairhmm_omp.dylib
15:51:49.500 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
15:51:49.501 INFO NativeLibraryLoader - Loading libgkl_pairhmm.dylib from jar:file:/Users/michaelflower/opt/anaconda3/envs/gatk4/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm.dylib
15:51:49.712 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
15:51:49.713 WARN IntelPairHmm - Ignoring request for 4 threads; not using OpenMP implementation
15:51:49.713 INFO PairHMM - Using the AVX-accelerated native PairHMM implementation
15:51:49.780 INFO ProgressMeter - Starting traversal
15:51:49.781 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
15:51:49.834 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
15:51:49.834 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
15:51:49.834 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
15:51:49.835 INFO HaplotypeCaller - Shutting down engine
[17 November 2021 15:51:49 GMT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=493879296
***********************************************************************
A USER ERROR has occurred: Contig: chrM not found in reference dictionary.
Please check that you are using a compatible reference for your data.
Reference Contigs: [chr4]
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
(gatk4)
Read more here: Source link