Check if BAM is derived from paired-end or single-end reads?


I’m automating a pipeline, and currently the user specifies as a parameter whether the input BAM is from paired-end or single-end reads. I want to check this automatically. How can I do this?



Use the SAM flag: samtools view -c -f 1 in.bam counts the reads that have flag bit 1 (“read paired”) set. A non-zero count means your BAM contains paired reads.

You may also use samtools view -H. It prints the header of the BAM, which typically includes the command that was used for the alignment (the @PG line). Paired-end data are usually split into two FASTQ files, one for the forward reads and one for the reverse reads. Hence, if the alignment command lists two FASTQ files as input, it was a paired-end alignment.

An example where BWA mem was used:

samtools view -H my.bam

@PG ID:bwa  PN:bwa  VN:0.7.12-r1044 CL:/opt/bwa-0.7.12/bwa mem -M -t 24 /Genomes/hg19/hg19.fasta in_1.fastq in_2.fastq
-----------------------------------------------------------------------------------------------------^---------^

where the general BWA mem paired-end command is (similar in Bowtie etc.):

./bwa mem -M -t *threads* indexedRefGenome.fasta in_1.fastq in_2.fastq > out.sam
-----------------------------------------------------^---------^
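The header check can be scripted. The sketch below is only a heuristic, and the function name count_fastq_inputs is made up for illustration. It assumes the aligner recorded its command line in a CL: field of a @PG line and that the input files end in .fastq or .fq (optionally .gz). It reports the first such @PG line only, and it will misjudge interleaved paired-end input given as a single file.

```shell
# Hypothetical helper: reads SAM header text on stdin and prints how many
# FASTQ-like arguments appear in the CL: field of the first @PG line that
# has one. Prints 0 if no such line exists.
count_fastq_inputs() {
    awk -F'\t' '
        /^@PG/ && !found {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^CL:/) {
                    # Drop the "CL:" prefix and split the command on whitespace.
                    n = split(substr($i, 4), args, " ")
                    count = 0
                    for (j = 1; j <= n; j++)
                        if (args[j] ~ /\.(fastq|fq)(\.gz)?$/) count++
                    print count
                    found = 1
                    break
                }
            }
        }
        END { if (!found) print 0 }
    '
}
```

Usage would look like: samtools view -H my.bam | count_fastq_inputs. A result of 2 suggests a paired-end alignment, 1 suggests single-end, and 0 means the header did not say.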


Both methods suggested by Pierre and Alexander will work great for 99.99% of cases, but things get a little messy if you want to detect single-end reads in a mixed paired and single BAM file, or singletons in a previously paired-end BAM file.

Detecting singles in a mixed file is easy: count the reads without the paired flag using samtools view -c -F 1 in.bam
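For a pipeline, the two counts (-f 1 for paired, -F 1 for unpaired) can be combined into one label. This is a minimal sketch, not a definitive implementation: the function names classify_layout and bam_layout are made up for illustration, and it assumes samtools is on the PATH. It scans the file twice, which may be slow on large BAMs.

```shell
# Hypothetical helper: turn two record counts into a layout label.
#   $1 = records with flag bit 1 ("read paired") set
#   $2 = records without that bit
classify_layout() {
    paired=$1
    unpaired=$2
    if [ "$paired" -gt 0 ] && [ "$unpaired" -gt 0 ]; then
        echo mixed
    elif [ "$paired" -gt 0 ]; then
        echo paired
    elif [ "$unpaired" -gt 0 ]; then
        echo single
    else
        echo empty
    fi
}

# Hypothetical wrapper: run both samtools counts on the BAM given as $1.
bam_layout() {
    paired=$(samtools view -c -f 1 "$1") || return 1
    unpaired=$(samtools view -c -F 1 "$1") || return 1
    classify_layout "$paired" "$unpaired"
}
```

Calling bam_layout in.bam would then print paired, single, mixed, or empty, which the pipeline can use in place of the user-supplied parameter.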

Detecting singletons is a bit trickier though… since a filtering step might have just removed one read and left its mate totally unaffected, so the surviving mate typically still carries the paired flag.

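Perhaps you could use the read names, since both mates of a pair share one name and a name that appears only once would mark a singleton (although people have a habit of putting junk like /1 and /2 on the end of the read names).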

Or maybe one of Pierre’s tools can detect orphaned singletons 🙂
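The read-name idea can be sketched with awk. This is an illustration under stated assumptions, and the function name count_orphans is made up. It reads SAM records on stdin, considers only records with the paired flag set, skips secondary and supplementary alignments (flag bits 256 and 2048), strips a trailing /1 or /2 from names, and counts names seen exactly once. It keeps every read name in memory, so it may not suit very large files.

```shell
# Hypothetical helper: count paired-flagged reads whose mate is absent.
# Input: SAM text on stdin (header lines are ignored).
count_orphans() {
    awk -F'\t' '
        /^@/ { next }                       # skip header lines
        {
            flag = $2
            paired        = flag % 2                # bit 1
            secondary     = int(flag / 256) % 2     # bit 256
            supplementary = int(flag / 2048) % 2    # bit 2048
            if (paired && !secondary && !supplementary) {
                name = $1
                sub(/\/[12]$/, "", name)    # drop a trailing /1 or /2
                seen[name]++
            }
        }
        END {
            n = 0
            for (name in seen) if (seen[name] == 1) n++
            print n
        }
    '
}
```

Usage would look like: samtools view in.bam | count_orphans. A non-zero result suggests the file contains singletons left over from a paired-end run.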

Alexander’s method would solve this very quickly and easily if all BAM files coming in are totally under your pipeline’s control (you mapped/filtered them yourself at some point). But external “mystery BAMs” may have issues.
