Oxford Nanopore Variant Calling Pipeline Calls Very Few of Variants of NA12878

Hello everyone,
First, I am so sorry for this long and very amateur question. I am trying to build a pipeline for SNP calling for Oxford Nanopore MinION based long reads. I need to test the pipeline but apparently the number of test data is really low. I only have Na12878 data from this address:
github.com/nanopore-wgs-consortium/NA12878/blob/master/Genome.md

I downloaded the FAST5 data coded as “FAB43577” (it is said that data has 427,215 reads and 2,776,702,333 bases).
I used Guppy V5.0.1 as basecaller with the command:

guppy_basecaller -i /home/huk/Desktop/nanopore_data/na12878_fast5/data2/UCSC/FAB43577-3574887596_Multi -s /home/huk/Desktop/nanopore_data/na12878_fast5/data2/guppy_out -c dna_r9.4.1_450bps_fast.cfg --trim_barcodes --trim_strategy dna --num_callers 1 --cpu_threads_per_caller 12

Then I merged all FASTQ files inside the “pass” folder of Guppy results with “cat” command and obtained single FASTQ.

minimap2 -ax map-ont -t 12 /home/huk/Desktop/references/hg38/hg38.mmi /home/huk/Desktop/nanopore_data/na12878_fast5/data2/guppy_out/pass/all_data.fastq --MD > /home/huk/Desktop/nanopore_data/na12878_fast5/data2/minimap_output/mapped_12878_2.md.sam

I transformed the SAM file to BAM file with samtools. I indexed and sorted the file as well. Then I used longshot for variant calling only on chr20 via the command:

longshot --bam /home/huk/Desktop/nanopore_data/na12878_fast5/data2/minimap_output/mapped_12878_2_sorted_md.bam --ref /home/huk/Desktop/references/hg38/hg38.fa -F -r chr20 --out /home/huk/Desktop/nanopore_data/na12878_fast5/data2/vcf_output/longshot_result.vcf

My final VCF have 827 (without filtering) variants. I downloaded the high confidence VCF file of NA12878 from this link ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv3.3.2/GRCh38/supplementaryFiles/

In this VCF, chr20 have about 67957 SNPs. I compared the variants and only 8 of them are common in both VCFs.

I also used nanopolish index and nanopolish variants for variant calling but the final VCF is completely empty (only headers and comments of standard VCF).

I am not sure why I have very low number of variants. If anyone can give me a hint or tell me what I am doing wrong I would be really grateful. I am completely stuck here. If you know another test data (if there is) for variant calling of Oxford Nanopore MinION, it would be awesome too.

Thank you in advance.

Read more here: Source link