How To Filter Mapped Reads With Samtools

Hi, after mapping you get a BAM file (the binary, compressed form of SAM), which contains both the mapped and the unmapped reads.

To get the unmapped reads from a bam file use:

samtools view -f 4 file.bam > unmapped.sam

The output will be in SAM format.

To get the output in BAM format, use:

samtools view -b -f 4 file.bam > unmapped.bam

To get only the mapped reads, use the -F option, which works like grep's -v: it skips alignments that have the given flag bit set.

samtools view -b -F 4 file.bam > mapped.bam
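To make the -f / -F logic concrete, here is a small shell sketch of the bit test applied to each alignment's FLAG. The function names passes_f and passes_F are made up for illustration; they are not part of samtools, and this only mimics the behaviour described in the manual.

```shell
# Hypothetical helpers (not part of samtools) mimicking the FLAG test.

# -f INT keeps an alignment only if ALL bits of INT are set in FLAG.
passes_f() {  # usage: passes_f FLAG INT
    [ $(( $1 & $2 )) -eq "$2" ]
}

# -F INT keeps an alignment only if NONE of the bits of INT are set in FLAG.
passes_F() {  # usage: passes_F FLAG INT
    [ $(( $1 & $2 )) -eq 0 ]
}

# FLAG 77 = 1 + 4 + 8 + 64: paired, unmapped, mate unmapped, first in pair
passes_f 77 4 && echo "77 is kept by -f 4 (unmapped)"

# FLAG 99 = 1 + 2 + 32 + 64: paired, proper pair, mate reverse, first in pair
passes_F 99 4 && echo "99 is kept by -F 4 (mapped)"
```

So a read with FLAG 77 ends up in unmapped.bam, and a read with FLAG 99 ends up in mapped.bam.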

From the manual: -f accepts different integer codes, depending on which reads you want to select:

-f INT Only output alignments with all bits in INT present in the FLAG field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]

Each bit in the FLAG field is defined as:

Flag        Chr     Description
0x0001      p       the read is paired in sequencing
0x0002      P       the read is mapped in a proper pair
0x0004      u       the query sequence itself is unmapped
0x0008      U       the mate is unmapped
0x0010      r       strand of the query (1 for reverse)
0x0020      R       strand of the mate
0x0040      1       the read is the first read in a pair
0x0080      2       the read is the second read in a pair
0x0100      s       the alignment is not primary
0x0200      f       the read fails platform/vendor quality checks
0x0400      d       the read is either a PCR or an optical duplicate
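If you want to see which bits a given FLAG value contains, a rough shell sketch like the following can help. decode_flag is a hypothetical helper written for this post, not a samtools command; it walks through the bits in the order of the table and prints the one-letter codes of those that are set.

```shell
# Hypothetical helper (not part of samtools): list the bits set in a FLAG,
# using the one-letter codes from the table above.
decode_flag() {  # usage: decode_flag FLAG  (decimal or 0x-prefixed hex)
    flag=$(( $1 ))
    out=""
    bit=1
    # Codes are listed in bit order: 0x1, 0x2, 0x4, ... 0x400
    for chr in p P u U r R 1 2 s f d; do
        if [ $(( flag & bit )) -ne 0 ]; then
            out="$out$chr"
        fi
        bit=$(( bit * 2 ))
    done
    echo "$out"
}

decode_flag 4      # prints: u     (read unmapped)
decode_flag 99     # prints: pPR1  (1 + 2 + 32 + 64)
decode_flag 0x10   # prints: r     (reverse strand)
```

Because the bits add up, you can combine them in one value: for example 12 (4 + 8) with -f selects reads where both the read and its mate are unmapped.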
  

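For example, to keep only uniquely mapped reads (reads with a single best mapping position), I use the -q option. This relies on the aligner giving multi-mapped reads a MAPQ of 0, as BWA does: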

-q INT Skip alignments with MAPQ smaller than INT [0]

samtools view -bq 1 file.bam > unique.bam
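What -q does can be sketched on plain SAM text with awk, since MAPQ is the fifth tab-separated column of each record. filter_mapq is a hypothetical helper for illustration only; in practice samtools view -q does this for you and also works on BAM.

```shell
# Hypothetical helper (not part of samtools): keep header lines and
# alignments whose MAPQ (column 5 of a SAM record) is at least MINQ.
filter_mapq() {  # usage: filter_mapq MINQ < in.sam > out.sam
    awk -v minq="$1" -F '\t' '/^@/ || $5 >= minq'
}

# Two toy records: r1 has MAPQ 0 and is dropped, r2 has MAPQ 37 and is kept.
printf 'r1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\nr2\t0\tchr1\t200\t37\t4M\t*\t0\t0\tACGT\tIIII\n' \
    | filter_mapq 1 | cut -f 1   # prints: r2
```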

HTH
