How to handle VCFs from the same sample produced with different aligners and variant callers?

Hi,
I’m using whole-exome sequencing (WES) for somatic variant calling.
During the process, I tried to follow the approach described here: pubmed.ncbi.nlm.nih.gov/28420412/

Basically my workflow is as follows:

  1. FASTQ preprocessing: Using 2 aligners (BWA-MEM, Bowtie2)
  2. BAM calibration
  3. Variant calling: Using 3 callers (Mutect2, Strelka2, Lancet)
  4. Variant filtering: I keep just the variants marked as ‘PASS’

Questions

As you can see, there are at least 6 VCFs per sample (2 aligners × 3 callers). I wonder which of these strategies I should use to combine them:

  1. Merging VCFs across aligners and then intersecting across variant callers (keeping variants found by 2/3 or 3/3 callers)
  2. Merging VCFs across aligners and then merging across variant callers
  3. Intersecting VCFs across aligners and then merging across variant callers
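
To make the three options concrete, here is a minimal Python sketch of what each one would keep. It assumes every VCF has already been reduced to a set of (chrom, pos, ref, alt) keys from PASS records; the `calls` dictionary and its variants are toy data I made up for illustration, not real results.

```python
from collections import Counter

# Hypothetical toy data: calls[aligner][caller] is a set of
# (chrom, pos, ref, alt) keys taken from PASS records.
calls = {
    "bwa": {
        "mutect2": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")},
        "strelka2": {("chr1", 100, "A", "T")},
        "lancet": {("chr1", 100, "A", "T"), ("chr2", 50, "C", "G")},
    },
    "bowtie2": {
        "mutect2": {("chr1", 100, "A", "T")},
        "strelka2": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")},
        "lancet": {("chr2", 50, "C", "G")},
    },
}
CALLERS = ["mutect2", "strelka2", "lancet"]


def per_caller(combine):
    """Combine the aligner-specific sets of each caller with `combine`."""
    return {c: combine(*(calls[a][c] for a in calls)) for c in CALLERS}


def at_least(sets, k):
    """Return the variants present in at least k of the given sets."""
    counts = Counter(v for s in sets for v in s)
    return {v for v, n in counts.items() if n >= k}


merged = per_caller(set.union)               # union across aligners
intersected = per_caller(set.intersection)   # intersection across aligners

option1 = at_least(merged.values(), 2)       # merge aligners, then 2/3 callers
option2 = set.union(*merged.values())        # merge aligners, then merge callers
option3 = set.union(*intersected.values())   # intersect aligners, then merge callers

for name, result in [("1", option1), ("2", option2), ("3", option3)]:
    print(f"option {name}: {sorted(result)}")
```

On this toy input the three options give three different call sets, which is exactly why I am unsure which one to pick. Exact key matching like this only works if all callers represent a variant the same way.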

I have already used common tools to handle similar situations. For example, Strelka2 gave me two separate files for SNVs and indels, so I combined them with bcftools concat. Lancet gave me SNVs and indels in a single VCF but split by chromosome, so I merged those with Picard MergeVcfs. Additionally, I used bcftools isec to check which variants are detected by more than one variant caller.
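
For reference, the commands look roughly like the ones built by the Python sketch below. The file names, the `picard` wrapper executable, and the helper function names are assumptions for illustration; the functions only assemble the argument lists and print them, so nothing is executed.

```python
import shlex


def concat_cmd(snvs, indels, out):
    """bcftools concat for the two Strelka2 files of one sample.

    -a allows overlapping positions; inputs must be bgzipped and indexed.
    """
    return ["bcftools", "concat", "-a", "-O", "z", "-o", out, snvs, indels]


def merge_vcfs_cmd(per_chrom_vcfs, out):
    """Picard MergeVcfs for per-chromosome Lancet VCFs (legacy I=/O= syntax)."""
    return ["picard", "MergeVcfs", *[f"I={v}" for v in per_chrom_vcfs], f"O={out}"]


def isec_cmd(vcfs, out_dir, min_files=2):
    """bcftools isec keeping records present in at least `min_files` inputs."""
    return ["bcftools", "isec", f"-n+{min_files}", "-p", out_dir, *vcfs]


# Hypothetical file names, printed as shell-ready command lines.
print(shlex.join(concat_cmd("somatic.snvs.vcf.gz", "somatic.indels.vcf.gz",
                            "strelka2.vcf.gz")))
print(shlex.join(merge_vcfs_cmd(["lancet.chr1.vcf", "lancet.chr2.vcf"],
                                "lancet.vcf")))
print(shlex.join(isec_cmd(["mutect2.vcf.gz", "strelka2.vcf.gz", "lancet.vcf.gz"],
                          "isec_out")))
```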

Anyway, I’m afraid that the same variant will be counted as several different variants during variant annotation, like the problems detailed in Figure 2 of this article: pubmed.ncbi.nlm.nih.gov/30858580/

# Or is it better to annotate the 6 VCF files from each of my samples and then filter somehow afterwards?
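
Part of the double-counting risk comes from callers padding REF/ALT differently. As a minimal sketch of the idea, the hypothetical helper below trims shared bases from the two alleles (suffix first, then prefix) so that equivalent records get the same key. It does not left-align indels, since that requires the reference sequence; as far as I understand, `bcftools norm -f ref.fa -m-any` (with `ref.fa` standing in for the reference FASTA) handles both left-alignment and splitting of multiallelic records.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT: suffix first, then prefix.

    Each allele always keeps at least one base, and POS is shifted
    when leading bases are removed. No left-alignment is performed.
    """
    # Drop identical trailing bases.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop identical leading bases and move the position forward.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


# A padded SNV and its minimal form end up with the same key.
print(trim_alleles(100, "CAG", "CTG"))
print(trim_alleles(101, "A", "T"))
# A deletion keeps its anchor base.
print(trim_alleles(100, "CTT", "CT"))
```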

Additional: this is from the NYGC Exome analysis pipeline v6:

Next, the calls are merged by variant type (SNVs, Multi Nucleotide
Variants (MNVs) and Indels). MuTect2 and Lancet call MNVs, however
Strelka2 does not and it also does not provide any phasing
information. So to merge such variants across callers, we first split
the MNVs called by MuTect2 and Lancet to SNVs, and then merge the SNV
callsets across the different callers. If the caller support for each
SNV in an MNV is the same, we merge them back to MNVs. Otherwise those
are represented as individual SNVs in the final callset. Lancet is the
only tool that calls deletion-insertion (delins or COMPLEX) events.
Other tools may represent the same event as separate indel and/or SNV
variants. Such events are rare, especially in the exonic regions and
difficult to merge. We therefore do not merge COMPLEX calls with SNVs
and Indels calls from other callers.
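
My reading of the MNV step above, as a minimal Python sketch: split every MNV into SNVs, record which callers support each SNV, and restore an MNV only when all of its SNVs have identical caller support. The `calls` data and the function name are made up for illustration, and this is my interpretation of the description, not the NYGC code.

```python
def split_mnv(chrom, pos, ref, alt):
    """Split an equal-length substitution into per-base SNVs."""
    return [(chrom, pos + i, r, a)
            for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


# Hypothetical toy data: caller -> list of (chrom, pos, ref, alt).
calls = {
    "mutect2": [("chr1", 100, "AC", "TG")],
    "lancet": [("chr1", 100, "AC", "TG"), ("chr1", 300, "GG", "TT")],
    "strelka2": [("chr1", 100, "A", "T"), ("chr1", 101, "C", "G"),
                 ("chr1", 300, "G", "T")],
}

mnvs = set()    # original multi-base calls
support = {}    # SNV -> set of callers supporting it
for caller, variants in calls.items():
    for variant in variants:
        parts = split_mnv(*variant)
        if len(parts) > 1:
            mnvs.add(variant)
        for snv in parts:
            support.setdefault(snv, set()).add(caller)

# Restore an MNV only if every component SNV has the same caller support.
final = dict(support)
for mnv in sorted(mnvs):
    parts = split_mnv(*mnv)
    if all(p in final for p in parts) and \
            len({frozenset(final[p]) for p in parts}) == 1:
        callers = final[parts[0]]
        for p in parts:
            del final[p]
        final[mnv] = callers

for variant, callers in sorted(final.items()):
    print(variant, sorted(callers))
```

Here the chr1:100 MNV is restored because all three callers support both of its bases, while the chr1:300 MNV stays split because Strelka2 supports only its first base.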

# Is there some way to do this but include the aligner factor in my workflow?

Thanks for taking the time to read this.
I would really appreciate any kind of help.

Aldhair
