rna seq – Why is there antisense sequence in RNAseq data

I’m looking at RNAseq data from CCLE. The data is paired-end.

Take the cell line Hs578T and the gene HRAS as an example.

The cell line carries a G12D mutation (c.35G>A), so the change in the CDS is:

ggc ggtgtgggca agagtgcgct g - Wildtype CDS
gAc ggtgtgggca agagtgcgct g - Mutant CDS
 ^

When I grep for the mutant CDS gAcggtgtgggcaagagtgcgctg, I do not get a match in my .bam file. But when I grep for the reverse complement (i.e., the antisense sequence), I get matches coming from both mates.
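For anyone who wants to reproduce the search, here is a minimal Python sketch of how the reverse complement of the mutant CDS can be derived. The helper name `reverse_complement` is my own for illustration and does not come from any library.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


# Mutant CDS fragment from the post, upper-cased for searching.
mutant_cds = "gAcggtgtgggcaagagtgcgctg".upper()

# This is the string that actually matched in the alignments.
query = reverse_complement(mutant_cds)
print(query)
```

The printed string is what I would pipe into grep against the text (SAM) view of the alignments.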

My question is: why are there reads in the .bam file that correspond to the antisense strand of the DNA? Shouldn’t all the mRNA reads match the CDS?


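I got the answer:
The punch line is that the sequence of an aligned read in a .bam file is ALWAYS reported on the ‘+’ strand of the reference, no matter which mate the read came from or which strand the gene sits on. HRAS sits on the ‘-’ strand, so its CDS shows up in the file as the reverse complement.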
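To recover a read as it came off the sequencer, you can check bit 0x10 of the SAM FLAG field, which marks records whose SEQ was stored reverse-complemented. Below is a rough Python sketch that works on one line of SAM text. The function name `original_read_sequence` and the example record (read name, coordinates, mapping quality) are made up for illustration; only the flag bit and the column order follow the SAM format.

```python
REVERSE_FLAG = 0x10  # SAM FLAG bit: SEQ is stored reverse-complemented

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def original_read_sequence(sam_line: str) -> str:
    """Return the read in its original orientation, given one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: FLAG
    seq = fields[9]        # column 10: SEQ, as stored in the file
    if flag & REVERSE_FLAG:
        # Stored on the '+' strand, so flip it back.
        return seq.translate(_COMPLEMENT)[::-1]
    return seq


# Hypothetical record: flag 83 = paired + proper pair + reverse strand + first mate.
record = "\t".join([
    "read1", "83", "chr11", "100", "60", "24M", "=", "50", "-74",
    "CAGCGCACTCTTGCCCACACCGTC", "*",
])

print(original_read_sequence(record))
```

With the reverse-strand bit set, the function hands back the mutant CDS orientation, which is why grepping the stored SEQ for the CDS itself finds nothing.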
