Output of samtools view: what does the third column actually represent?

samtools view outputs alignments from SAM and BAM files in SAM format. You can find a description of the SAM format here: samtools.github.io/hts-specs/SAMv1.pdf

Section 1.4 describes the meaning of each of the mandatory columns. It includes the following table:

Col  Field  Type    Regexp/Range                  Brief description
1    QNAME  String  [!-?A-~]{1,254}               Query template NAME
2    FLAG   Int     [0, 2^16 - 1]                 bitwise FLAG
3    RNAME  String  *|[:rname:^*=][:rname:]*      Reference sequence NAME
4    POS    Int     [0, 2^31 - 1]                 1-based leftmost mapping POSition
5    MAPQ   Int     [0, 2^8 - 1]                  MAPping Quality
6    CIGAR  String  *|([0-9]+[MIDNSHPX=])+        CIGAR string
7    RNEXT  String  *|=|[:rname:^*=][:rname:]*    Reference name of the mate/next read
8    PNEXT  Int     [0, 2^31 - 1]                 Position of the mate/next read
9    TLEN   Int     [-2^31 + 1, 2^31 - 1]         observed Template LENgth
10   SEQ    String  *|[A-Za-z=.]+                 segment SEQuence
11   QUAL   String  [!-~]+                        ASCII of Phred-scaled base QUALity+33

Columns 12 onward contain optional informational tags about the read. Like the mandatory columns, they are tab-separated, one tag per column, each in the form TAG:TYPE:VALUE.
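Because every column, including each optional tag, is separated by a tab, a line of samtools view output can be split with a plain string split. Below is a minimal Python sketch, assuming you read the text output line by line (for example piped from samtools view). The helper name parse_sam_line is hypothetical and it does no validation beyond the column count; for real work a dedicated library such as pysam is the safer choice.

```python
# Names of the 11 mandatory SAM columns, in order (SAMv1 spec, section 1.4).
MANDATORY_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]
# Columns whose type is Int in the spec table.
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into its mandatory fields and optional tags."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 11:
        raise ValueError(f"expected at least 11 tab-separated columns, got {len(columns)}")
    record = {}
    for name, value in zip(MANDATORY_FIELDS, columns[:11]):
        record[name] = int(value) if name in INT_FIELDS else value
    # Columns 12 onward: one TAG:TYPE:VALUE per column. Split at most twice,
    # because the VALUE itself may contain colons.
    tags = {}
    for field in columns[11:]:
        tag, type_code, value = field.split(":", 2)
        tags[tag] = (type_code, value)  # value is left as a string
    record["TAGS"] = tags
    return record
```

For example, parsing "read1\t99\tchr1\t100\t60\t5M\t=\t200\t105\tACGTA\tIIIII\tNM:i:0" gives RNAME "chr1" and TAGS {"NM": ("i", "0")}.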

We can see that the first column is, as you guessed, the name of the read (the query name). The second column is the FLAG: a bitwise flag that encodes information about the status of the alignment, for example whether the read is mapped, whether its mate is mapped, and whether it is read 1 or read 2.
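To see which properties a given FLAG value encodes, test each bit with a bitwise AND. This is a minimal Python sketch. The bit values and their meanings follow the FLAG table in the SAM specification, while the short names (paired, read1, ...) and the helper decode_flag are my own illustrative labels.

```python
from typing import List

# Bit values defined for the FLAG column in the SAMv1 spec.
FLAG_BITS = {
    0x1: "paired",          # template has multiple segments
    0x2: "proper_pair",     # each segment properly aligned according to the aligner
    0x4: "unmapped",        # this segment is unmapped
    0x8: "mate_unmapped",   # next segment in the template is unmapped
    0x10: "reverse",        # SEQ is reverse complemented
    0x20: "mate_reverse",   # SEQ of the next segment is reverse complemented
    0x40: "read1",          # first segment in the template
    0x80: "read2",          # last segment in the template
    0x100: "secondary",     # secondary alignment
    0x200: "qc_fail",       # did not pass quality controls
    0x400: "duplicate",     # PCR or optical duplicate
    0x800: "supplementary", # supplementary alignment
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

For example, decode_flag(99) returns ["paired", "proper_pair", "mate_reverse", "read1"], and decode_flag(4) returns ["unmapped"].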

Finally, the third column (RNAME) is the reference sequence name, i.e. the name of the contig the read is aligned to. This is the column you want if you need to know which contig a read is aligned to. And you are correct that a read can align to more than one contig, depending on how the aligner is configured.
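To answer "which contig(s) did each read align to?", collect column 3 per read name. This is a minimal Python sketch over SAM text lines. The helper contigs_per_read is hypothetical. It skips header lines (starting with @) and unmapped records (FLAG bit 0x4). Two caveats apply. First, both mates of a pair share the same QNAME, so their contigs are pooled together here. Second, whether additional alignments appear as separate lines at all depends on the aligner. When they do, they carry the secondary (0x100) or supplementary (0x800) bit, and some aligners instead list alternative hits in an optional tag.

```python
from collections import defaultdict


def contigs_per_read(sam_lines):
    """Map each QNAME to the set of contigs (RNAME) it has mapped records on."""
    hits = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines (@HD, @SQ, ...)
            continue
        cols = line.rstrip("\n").split("\t")
        qname, flag, rname = cols[0], int(cols[1]), cols[2]
        if flag & 0x4 or rname == "*":  # unmapped segment: no contig to report
            continue
        hits[qname].add(rname)  # note: mates of a pair share a QNAME
    return dict(hits)
```

Fed a read r1 with a primary record on chr1 and a secondary record (FLAG 256) on chr2, plus an unmapped read r2, it returns {"r1": {"chr1", "chr2"}}.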

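The 7th column (RNEXT), which you note sometimes does and sometimes does not contain contig information, tells us the contig to which the mate of this read aligns. It only contains a contig name if the mate aligns to a different contig. If the mate aligns to the same contig, this column contains =. If no information about the mate is available, it contains *. Be aware that many aligners place an unmapped mate at the position of its mapped partner (the spec's recommended practice), so this column can show = even when the mate is unaligned. FLAG bit 0x8 tells you whether the mate is actually mapped.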
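Working out the mate's contig from column 7 therefore means substituting RNAME whenever RNEXT is =. This is a minimal Python sketch. The helper mate_contig is hypothetical, and it does not look at FLAG, so combine it with the mate-unmapped bit (0x8) if you need to know whether the mate really aligned.

```python
from typing import Optional


def mate_contig(rname: str, rnext: str) -> Optional[str]:
    """Resolve the contig of the mate/next read from the RNAME and RNEXT columns."""
    if rnext == "=":
        return rname  # '=' means the mate is on the same reference as this read
    if rnext == "*":
        return None   # '*' means the information is unavailable
    return rnext      # otherwise RNEXT names the mate's reference directly
```

For example, mate_contig("chr1", "=") returns "chr1", while mate_contig("chr1", "chr5") returns "chr5".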
