different result using minimap2 and pbmm2

Hi all!

I am analysing CSS Pacbio data and each sample came from different run, in particular I have three files for each sample.
I tested both pbmm2 and minimap2 to align my long reads, after getting the consensus sequences.

This is the command I used to run mnimap2:

minimap2 -a -x map-pb -t 8 genome.fa $file1 $file2 $file3 > align.bam
samtools sort -@ 16 -m 3G align.bam > align.sort.bam

The size of align.bam is ~538 G, the size of align.sort.bam is ~100 G. I checked the number of reads in align.bam and align.sort.bam using samtools flagstat, and I got the same output:

29865175 + 0 in total (QC-passed reads + QC-failed reads)
5101289 + 0 secondary
6865323 + 0 supplementary
0 + 0 duplicates
26824957 + 0 mapped (89.82% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

The difference between total and secondary+supplementary is equal to the total number of reads:
29865175 - 5101289 - 6865323=17898563

After that, I ran pbmm2 using the following command:

echo $file1 > 'myfile.fofn'
echo $ile2 >> 'myfile.fofn'
echo $file3 >> 'myfile.fofn'

pbmm2 align genome.fa myfile.fofn  alignPbmm2.bam --preset CCS -j 16
samtools sort  -@ 16 -m 3G alignPbmm2.bam> alignPbmm2.sort.bam

The size of alignPbmm2.bam is ~460 G, the size of alignPbmm2.sort.bam is ~390 G. I checked the number of reads in those two files and I got the following file from samtools flagstat:

42157979 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
27576659 + 0 supplementary
0 + 0 duplicates
42157979 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Why do I get such different results? How can I understand which is the most reliable?

Thank you in advance

Read more here: Source link