VEP/CADD error: "ERROR: Assembly is GRCh38 but CADD file does not contain GRCh38 in header"

Dear Biostars,

I am having a confusing issue with the CADD plugin for VEP. When I run VEP on my whole trio VCF, all the plugins work fine. However, when I run VEP on the individual (pivoted) per-sample files, CADD no longer works and I get the following error: ERROR: Assembly is GRCh38 but CADD file does not contain GRCh38 in header.

The following code works:

Load modules and set variables:

module load Perl/5.34.0-GCCcore-11.2.0
module load tabix/0.2.6-GCCcore-10.2.0
module load Bio-DB-HTS/3.01-GCC-11.2.0
module load DBD-mysql/4.050-GCC-11.2.0
module load OpenSSL/1.1.1d-GCCcore-8.3.0

dir=/path/to/vep/
dir_cache=/path/to/vep/
fasta="Homo_sapiens_assembly38.fasta"

export PERL5LIB=$PERL5LIB:/mnt/storage/nobackup/proj/rtmngs/Pipelines/Software/vep/ensembl-vep/Plugins
${dir}/vep --cache --dir $dir \
--dir_cache $dir_cache \
--offline \
--fasta $fasta \
--species homo_sapiens \
--input_file trio_cohort.vcf.gz   \
--output_file trio_VEP_annotated.vcf  \
--format vcf \
--force_overwrite  \
--vcf \
--no_check_variants_order \
--check_existing \
--freq_pop gnomAD \
--assembly GRCh38 \
--stats_file trio_vep_stat.html \
--warning_file trio_vep_warning.txt \
--hgvs \
--variant_class \
--keep_csq \
--af_gnomad \
--polyphen p \
--sift p \
--symbol \
--total_length \
--max_af \
--plugin LoFtool \
--plugin REVEL,/pluginpath/new_tabbed_revel_grch38.tsv.gz \
--plugin Mastermind,/pluginpath/mastermind/mastermind_cited_variants_reference-2022.07.22-grch38.vcf.gz,0,0,1 \
--plugin DisGeNET,file=/pluginpath/disgenet/all_variant_disease_pmid_associations_final.tsv.gz,disease=1 \
--plugin CADD,/pluginpath/whole_genome_SNVs.tsv.gz,/pluginpath/gnomad.genomes.r3.0.indel.tsv.gz \
--fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,MAX_AF,gnomAD_AF,AF,CADD_PHRED,CADD_RAW,LoFtool,REVEL,Mastermind_URL,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease" \
--pick \
--pick_order rank,canonical,tsl \
--buffer_size 20000 \
--fork 4

The following code does not work. Here I run VEP on the proband, mother and father VCF files individually:

for sample in *_unique.vcf
do
    ${dir}/vep --cache --dir $dir \
    --dir_cache $dir_cache \
    --offline \
    --no_stats \
    --fasta $fasta \
    --species homo_sapiens \
    --input_file ${sample} \
    --output_file ${sample}_coding.vcf  \
    --format vcf \
    --vcf \
    --no_check_variants_order \
    --hgvs \
    --variant_class \
    --keep_csq \
    --af_gnomad \
    --polyphen p \
    --sift p \
    --symbol \
    --total_length \
    --max_af \
    --check_existing \
    --freq_pop gnomAD \
    --assembly GRCh38 \
    --plugin LoFtool \
    --plugin REVEL,/pluginpath/new_tabbed_revel_grch38.tsv.gz \
    --plugin Mastermind,/pluginpath/mastermind_cited_variants_reference-2022.07.22-grch38.vcf.gz,0,0,1 \
    --plugin DisGeNET,file=//pluginpath//all_variant_disease_pmid_associations_final.tsv.gz,disease=1 \
    --plugin CADD,/pluginpath//whole_genome_SNVs.tsv.gz,/pluginpath//gnomad.genomes.r3.0.indel.tsv.gz \
    --fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,MAX_AF,gnomAD_AF,AF,CADD_PHRED,CADD_RAW,LoFtool,REVEL,Mastermind_URL,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease" \
    --pick_order rank,canonical,tsl \
    --buffer_size 20000 \
    --fork 4 
done
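
With a command this long, stray whitespace after a line-continuation backslash silently ends the command early, so later options never reach VEP. The sketch below is one way I could check a script for that. The function name find_broken_continuations and the script name in the usage comment are hypothetical, and this is only a sanity check, not a confirmed cause of the CADD error.

```shell
#!/usr/bin/env bash
# Sketch: list lines where a continuation backslash is followed by trailing
# whitespace. On such lines the shell does not join the next line to the command.
find_broken_continuations() {
    # Prints "line_number:line" for each offending line.
    # Exit status is non-zero when no offending line is found.
    grep -nE '\\[[:space:]]+$' -- "$1"
}

# Usage (run_vep_per_sample.sh is a hypothetical script name):
# find_broken_continuations run_vep_per_sample.sh
```

Any line this reports should have the whitespace after the backslash removed before rerunning.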

Any ideas on why CADD works fine in the first VEP run on the trio, but not on the individual files?
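
Since the error is about the CADD file header, one thing I can check is whether the files passed to the plugin mention GRCh38 near the top. This is a minimal sketch, assuming the assembly name appears in the first few lines of the CADD file. That is my reading of the error message, not something I have confirmed in the plugin source. The helper name cadd_header_mentions is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether the leading lines of a bgzipped CADD file mention
# a given assembly name (e.g. GRCh38).
# Assumption: the assembly name appears within the first five lines.
cadd_header_mentions() {
    local file=$1 assembly=$2 header
    # bgzip output is gzip-compatible, so gzip -dc can read it. "|| true"
    # ignores the broken-pipe status caused by stopping after five lines.
    header=$(gzip -dc -- "$file" 2>/dev/null | head -n 5 || true)
    case $header in
        *"$assembly"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage with the CADD paths from the per-sample loop:
# for f in /pluginpath//whole_genome_SNVs.tsv.gz /pluginpath//gnomad.genomes.r3.0.indel.tsv.gz; do
#     if cadd_header_mentions "$f" GRCh38; then
#         echo "$f: GRCh38 found in header"
#     else
#         echo "$f: GRCh38 NOT found in header"
#     fi
# done
```

If either file fails this check, the per-sample run may be using a different CADD file from the trio run, or one built for another assembly.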

Cheers,
Krutik
