gatk GetPileupSummaries and CalculateContamination result in NaN on mouse data

Hello!

I ran gatk toolchain including CalculateContamination in galaxy on human exome sequencing data, and it worked fine. However when i try feeding it with murine data (and murine reference files), CalculateContamination gives me this contamination table:

sample  contamination   error
mouse1_tumor    NaN 1.0

And being fed with this result, FilterMutectCalls says:

java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
    at org.broadinstitute.hellbender.utils.MathUtils.log10BinomialProbability(MathUtils.java:934)

I suspect there is a connection between these results.

I tried two vcf files – only for one mouse strain C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz and one for all mice mgp.v5.merged.snps_all.dbSNP142.vcf.gz . I processed them to be biallelic like this:

bcftools plugin fill-tags C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf -Oz -o with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz -- -t AF
bcftools view -Oz -m2 -M2 with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf > biallelic.with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz
/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk IndexFeatureFile -F biallelic.with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz

With one strain file GetPileupSummaries gives no data at all:

#<METADATA>SAMPLE=mouse1_normal                 
contig  position    ref_count   alt_count   other_alt_count allele_frequency

And with panmurine vcf file it gives about 19,538,774 lines:

#SAMPLE=mouse1_tumor
#<METADATA>SAMPLE=mouse1_tumor                  
contig  position    ref_count   alt_count   other_alt_count allele_frequency
1   3001278 1   0   0   0.111111
1   3001309 1   0   0   0.0277778
1   3001310 1   0   0   0.0277778
..........
........
............

However regardless of the vcf file the CalculateContamination contamination table contains only NaN and 1.0 . But segmentation tables are different – with strain specific file it contains no data, and with panmurine file it looks similar to normal result (but not quite):

#SAMPLE=mouse1_tumor
#<METADATA>SAMPLE=mouse1_tumor
contig  start   end minor_allele_fraction
11  3133205 120186319   0.11578047329359388
12  9024112 117728420   0.11578047329359388
13  5864694 119389285   0.14709163137980344
14  8368568 122944938   0.11578047329359388
15  4153960 103503606   0.11578047329359388
16  5009839 96161963    0.11578047329359388
17  5928381 94835879    0.11578047329359388
18  3015925 89073490    0.11578047329359388
19  3996832 61225661    0.11578047329359388
1   4611421 193194496   0.12379274849627589
2   4925592 181917652   0.11578047329359388
3   8936403 154765979   0.11578047329359388
4   3212345 156357539   0.1547816247108552
5   3236897 149625240   0.11578047329359388
6   4754265 149353876   0.11578047329359388
7   3841411 144450962   0.11578047329359388
X   7946841 151850725   0.11578047329359388
8   3621134 127592996   0.11578047329359388
9   7131733 123664738   0.11578047329359388
10  5035173 128723359   0.11578047329359388

and takes about 2.5 hours to execute, which however does not help getting right contamination and FilterMutectCalls results.

Murine commands are like this:

'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'GetPileupSummaries' '-I' 'input.bam' '-V' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102/snp/biallelic.with_af.mgp.v5.merged.snps_all.dbSNP142.vcf.gz' '-L' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102/snp/biallelic.with_af.mgp.v5.merged.snps_all.dbSNP142.vcf.gz' -O '/home/transgeneprep/galaxy/database/objects/4/f/e/dataset_4fe4af55-a216-498d-bcc5-a26cbbd39d0d.dat' '--disable-sequence-dictionary-validation'
'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'CalculateContamination' '-I' '/home/transgeneprep/galaxy/database/objects/0/4/f/dataset_04f08621-9adb-49c5-bec0-ecb21d3f87a2.dat' '-matched' '/home/transgeneprep/galaxy/database/objects/4/f/e/dataset_4fe4af55-a216-498d-bcc5-a26cbbd39d0d.dat' '--tumor-segmentation' '/home/transgeneprep/galaxy/database/objects/a/1/f/dataset_a1f596be-bd1c-4ee3-8a81-06ef6577e267.dat' '-O' '/home/transgeneprep/galaxy/database/objects/e/2/b/dataset_e2bdecb6-2c1b-454a-b81b-3d3f2d9d5269.dat'
'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'FilterMutectCalls' '-R' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102//bwa/Mus_musculus.GRCm38.dna.primary_assembly.fa' '-V' 'input.vcf' '--contamination-table' '/home/transgeneprep/galaxy/database/objects/e/2/b/dataset_e2bdecb6-2c1b-454a-b81b-3d3f2d9d5269.dat' '--tumor-segmentation' '/home/transgeneprep/galaxy/database/objects/a/1/f/dataset_a1f596be-bd1c-4ee3-8a81-06ef6577e267.dat' '--orientation-bias-artifact-priors' '/home/transgeneprep/galaxy/database/objects/0/7/f/dataset_07f1c9fd-bd59-4c69-8ff5-75aad7abf42a.dat' '-O' '/home/transgeneprep/galaxy/database/objects/3/0/5/dataset_305ecc93-51f2-4168-9488-303d05dd9731.dat' 

Hominine commands are like this:

'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'GetPileupSummaries' '-I' 'input.bam' '-V' '/home/transgen/galaxy/tools/melanoma_tools/genome/ALL.wgs.biallelic.exons.af.vcf.gz' '-L' '/home/transgen/galaxy/tools/melanoma_tools/genome/ALL.wgs.biallelic.exons.af.vcf.gz' -O '/home/transgen/galaxy/database/objects/c/9/c/dataset_c9c268ae-784b-469a-92f4-96bf2c0d8a5c.dat' '--disable-sequence-dictionary-validation'
'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'CalculateContamination' '-I' '/home/transgen/galaxy/database/objects/c/1/d/dataset_c1d59b22-ab10-450c-adbf-2e26dcf59c29.dat' '-matched' '/home/transgen/galaxy/database/objects/c/9/c/dataset_c9c268ae-784b-469a-92f4-96bf2c0d8a5c.dat' '--tumor-segmentation' '/home/transgen/galaxy/database/objects/3/e/d/dataset_3edec44a-5833-45ca-a269-d00daf80a7b3.dat' '-O' '/home/transgen/galaxy/database/objects/4/1/0/dataset_41054645-14bf-460b-91a1-4a32196e1454.dat'
'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'FilterMutectCalls' '-R' '/home/transgen/galaxy/tools/melanoma_tools/genome/hg38.analysisSet.fa' '-V' 'input.vcf' '--contamination-table' '/home/transgen/galaxy/database/objects/4/1/0/dataset_41054645-14bf-460b-91a1-4a32196e1454.dat' '--tumor-segmentation' '/home/transgen/galaxy/database/objects/3/e/d/dataset_3edec44a-5833-45ca-a269-d00daf80a7b3.dat' '--orientation-bias-artifact-priors' '/home/transgen/galaxy/database/objects/e/f/7/dataset_ef7eba53-add9-49e3-b95c-86a57e08b028.dat' '-O' '/home/transgen/galaxy/database/objects/2/7/c/dataset_27ce06ec-f75b-4d70-8f22-259e27ed881d.dat'

What am i doing wrong?

Thanks in advance.

Read more here: Source link