sam2tsv listing incorrect reference sequence & positions

Duplicate of: github.com/lindenb/jvarkit/issues/190

Hi, can anyone help me resolve an issue I’m having with sam2tsv?
It is a nifty piece of software, but the positions it reports for the reference sequence do not line up with the reference bases it prints.

Here’s what sam2tsv tells me:

[Image: output of sam2tsv]

The nucleotide string marked in red, CTGGCCGAGCTAG, is the read reporting the mutation T>A (line #469).
But the reference sequence listed by sam2tsv (green box) doesn’t match the read sequence at all. It is a correct stretch of sequence from the reference FASTA, but it is right-shifted by 16 bases. With other reference files the shift is different, e.g. 20.

In fact, if I search for the sequence ATGGAGACCCGCT in my reference, it spans positions 356-368. In contrast, sam2tsv lists these residues in the range 339-352 (as seen in the green box). Off by exactly

[Image: searching reference]

To summarize:
1) The position listed in the green box (reference) corresponds to the sequence in the red box (read).
2) The reference sequence listed in the green box is off by 16 bases.
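
For anyone who wants to reproduce the check described above, here is a minimal sketch in plain Python (standard library only). It locates a query sequence in the first record of a FASTA file and reports how far its true 1-based start is from the position printed by sam2tsv. The function names (`read_fasta_sequence`, `find_span`, `offset_from_reported`) are made up for this illustration and are not part of jvarkit; the sketch also assumes the reference contains a single record and only looks at the first exact, forward-strand match.

```python
def read_fasta_sequence(path):
    """Return the upper-cased sequence of the first record in a FASTA file."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the second record; only the first is used
                seen_header = True
                continue
            if line:
                parts.append(line.upper())
    return "".join(parts)


def find_span(reference, query):
    """Return the 1-based inclusive (start, end) of the first exact match, or None."""
    idx = reference.find(query.upper())
    if idx < 0:
        return None
    return idx + 1, idx + len(query)


def offset_from_reported(reference, query, reported_start):
    """Return (true start - reported start), or None if the query is not found."""
    span = find_span(reference, query)
    if span is None:
        return None
    return span[0] - reported_start
```

To use it, load the reference with `read_fasta_sequence`, then call `offset_from_reported` with the bases shown in the reference column of sam2tsv and the position sam2tsv prints for the first of them. A constant non-zero result across several reads would indicate a systematic shift rather than a problem with one alignment.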

Information:
PacBio HiFi CCS reads aligned with pbmm2 to a custom reference (cDNA).

Files:

Reference.fa.

relevant sam2tsv output

sam containing the read in question

Command used:

java -jar '~/Git/Others_Cloned/jvarkit/dist/sam2tsv.jar' -R ../../../reference/pBG-ERBB2/ERBB2.fa 1.ERBB2_Library.bam
