How to select uniquely and concordantly reads from hisat2 alingment for raw read count

Ok, let’s go through it. The -f 0x2 bitwise flag selection keeps proper pairs, which are read pairs mapped in correct orientation (I suppose illumina reads facing each other) and within insert size (-X and -I) -> concordantly.

Now, if you want the uniquely mapped, you should basically exclude all those reads that have a secondary alignment. From the HISAT2 manual:

ZS:i:    <N>     Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than [AS:i].

Basically, if the line contains the ZS: field, it has a secondary alignment, i.e. it is not unique. Filter those out, for example using awk:

samtools view -h -f 0x2 file.bam | awk 'substr($1, 0, 1)=="@" || $0 !~ /ZS:/' | samtools view -h -b > filtered_file.bam

The awk part selects only lines that either start with @ (like the header) or don’t contain the ZS: tag, like the reads with secondary alignments (i.e. non-uniquely mapping).

Read more here: Source link