How can I find reads for specific elements in a bam file?


I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands, like:

samtools view -c file.bam

But I’m looking for a way to search for specific elements in a bed file. I have:

import argparse
import pysam
import pandas as pd

bamfile = "/mnt/d/Axiotl/ABC/Input/bam/ENCFF384ZZM.bam"
bed_file = "/mnt/d/R/abc_3a_scaffold_hg19.bed"
output = "/mnt/d/Axiotl/ABC/Output/bam" 
genome_sizes = "/mnt/d/Axiotl/ABC/Working_Copy/ABC-Enhancer-Gene-Prediction-master/reference/chr_sizes"

def count_bam(bamfile, bed_file, output, genome_sizes, use_fast_count=True, verbose=True):
    reads = pysam.AlignmentFile(bamfile)
    read_chrs = set(reads.references)
    bed_regions = pd.read_table(bed_file, header=None)
    bed_regions = bed_regions[bed_regions.columns[0:3]]
    bed_regions.columns = "chr", "start", "end"
    counts = [(reads.count(row.chr, row.start, row.end) if (row.chr in read_chrs) else 0) for _, row in bed_regions.iterrows()]
    bed_regions['count'] = counts
    bed_regions.to_csv(output, header=None, index=None, sep="t")

if  __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--bamfile', help="input bam file")
    parser.add_argument('--bed_file',  help="input bed file")
    parser.add_argument('--genome_sizes', help="genome sizes")
    parser.add_argument('--output', help="output directory")
    args = parser.parse_args()
        count_bam(bamfile, bed_file, output, genome_sizes)
    except KeyboardInterrupt:

But I keep getting the error:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements

I’ve tried absolutely everything I can think of. Any help would be greatly appreciated!

Read more here: Source link