So many variants detected.

So many variants detected.


Dear All,

I have done variant calling in Germline data that has single sample of each individual and two genes.

I did following steps, but after checking results I found too many variants.

After Haplotypecaller (the step 6) I found 140900 known variants, and the steps 7 and 8 didn’t filter any variants.

What should be causing this result?

1. index the reference fasta (hg19.fa)

    bwa index hg19.fa

2. mapping to the reference genome (hg19.fa)
   I randomly gave read group information. 

    bwa mem -t 16 -R '@RGtID:21002tSM:Sample_21002' hg19.fa ../../Trim_final/21002.1.FASTP.fastq.gz  ../../Trim_final/21002.2.FASTP.fastq.gz > 21002.bam

3. picard sort bam file by queryname

     java -jar ~/applications/picard-tools-2.23.0/picard.jar SortSam I=21002.bam  O=21002.bam.PICARD.output.sorted.queryname.bam SORT_ORDER=queryname

4. mark duplicates by MarkDuplicatesSpark

    gatk --java-options "-Xmx5g" MarkDuplicatesSpark -I 21002.bam.PICARD.output.sorted.queryname.bam -O 21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam --tmp-dir ./Test/Tmps/

5. Base (Quality Score) Recalibration

    Tools involved: BaseRecalibrator, Apply Recalibration

    ##first run BaseRecalibrator and later run ApplyBQSR using a table produced by BaseRecalibrator and a bam file produced at the step4.

    ###make an index of dbsnp_138.hg19.vcf

    gatk IndexFeatureFile -F  dbsnp_138.hg19.vcf ## this will generate dbsnp_138.hg19.vcf.idx 

    ##### add read group and platform to bam files;

    java -jar ~/applications/picard-tools-2.23.0/picard.jar AddOrReplaceReadGroups I=21002.bam.PICARD.output.sorted.queryname.bam.MarkDuplicatesSpark.OUTPUT.bam RGID=21002  RGLB=21002lib1 RGPL=ILLUMINA RGPU=21002 RGSM=20 


    gatk BaseRecalibrator -R hg19.fa -I  --known-sites dbsnp_138.hg19.vcf  --known-sites ../Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.EDITED.gz -O


    gatk ApplyBQSR -R hg19.fa -I --bqsr-recal-file -O

6. HaplotypeCaller (single sample mode)

gatk HaplotypeCaller --bam-output --native-pair-hmm-threads 64 -R hg19.fa -I -O -D dbsnp_138.hg19.vcf

7. CNNScoreVariants

      gatk CNNScoreVariants -R hg19.fa -V -O

8. FilterVariantTranches

       gatk --java-options "-Xmx7g" FilterVariantTranches -V --resource ../Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.EDITED.gz --resource ../dbsnp_138.hg19.vcf --info-key CNN_1D --snp-tranche 99.95 --indel-tranche 99.4 -O




before adding your answer.

Traffic: 2373 users visited in the last hour

Source link