So many variants detected.
Dear All,
I have done variant calling in Germline data that has single sample of each individual and two genes.
I did following steps, but after checking results I found too many variants.
After Haplotypecaller (the step 6) I found 140900 known variants, and the steps 7 and 8 didn’t filter any variants.
What should be causing this result?
1. index the reference fasta (hg19.fa)
bwa index hg19.fa
2. mapping to the reference genome (hg19.fa)
I randomly gave read group information.
bwa mem -t 16 -R '@RGtID:21002tSM:Sample_21002' hg19.fa ../../Trim_final/21002.1.FASTP.fastq.gz ../../Trim_final/21002.2.FASTP.fastq.gz > 21002.bam
3. picard sort bam file by queryname
java -jar ~/applications/picard-tools-2.23.0/picard.jar SortSam I=21002.bam O=21002.bam.PICARD.output.sorted.queryname.bam SORT_ORDER=queryname
4. mark duplicates by MarkDuplicatesSpark
gatk --java-options "-Xmx5g" MarkDuplicatesSpark -I 21002.bam.PICARD.output.sorted.queryname.bam -O 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam --tmp-dir ./Test/Tmps/
5. Base (Quality Score) Recalibration
Tools involved: BaseRecalibrator, Apply Recalibration
##first run BaseRecalibrator and later run ApplyBQSR using a table produced by BaseRecalibrator and a bam file produced at the step4.
###make an index of dbsnp_138.hg19.vcf
gatk IndexFeatureFile -F dbsnp_138.hg19.vcf ## this will generate dbsnp_138.hg19.vcf.idx
##### add read group and platform to bam files;
java -jar ~/applications/picard-tools-2.23.0/picard.jar AddOrReplaceReadGroups I=21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam O=21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam RGID=21002 RGLB=21002lib1 RGPL=ILLUMINA RGPU=21002 RGSM=20
#####BaseRecalibrator
gatk BaseRecalibrator -R hg19.fa -I 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam --known-sites dbsnp_138.hg19.vcf --known-sites ../Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.EDITED.gz -O 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.recal_data.table
#####ApplyBQSR
gatk ApplyBQSR -R hg19.fa -I 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam --bqsr-recal-file 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.recal_data.table -O 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.bam
6. HaplotypeCaller (single sample mode)
gatk HaplotypeCaller --bam-output 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.HaplotypeCaller.OUTPUT.bam --native-pair-hmm-threads 64 -R hg19.fa -I 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.bam -O 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.bam.HaplotypeCaller.output.NO.GVCF.option.vcf.gz -D dbsnp_138.hg19.vcf
7. CNNScoreVariants
gatk CNNScoreVariants -R hg19.fa -V 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.bam.HaplotypeCaller.output.NO.GVCF.option.vcf.gz -O 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.bam.HaplotypeCaller.output.NO.GVCF.option.vcf.gz.CNNScoreVariants.OUTPUT.vcf
8. FilterVariantTranches
gatk --java-options "-Xmx7g" FilterVariantTranches -V 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.bam.HaplotypeCaller.output.NO.GVCF.option.vcf.gz.CNNScoreVariants.OUTPUT.vcf --resource ../Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.EDITED.gz --resource ../dbsnp_138.hg19.vcf --info-key CNN_1D --snp-tranche 99.95 --indel-tranche 99.4 -O 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam.read.groud.bam.ApplyBQSR.OUTPUT.bam.HaplotypeCaller.output.NO.GVCF.option.vcf.gz.CNNScoreVariants.OUTPUT.vcf.FilterVariantTranches.OUTPUT.vcf
• 12 views
Traffic: 2373 users visited in the last hour