How should I deal with my RNAseq result with a high % of multiple-aligned reads?



Here is one of my results from STAR alignment.
The STAR log reports Uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74%.
This does not sound that bad, because more than 90% of my reads were aligned.
However, when I process the BAM file with featureCounts, the rate of successfully assigned alignments is only 10.1%.

Here are more details about the commands and programs that I used.

  1. FASTQ downloaded from SRA: SRP050036 using fasterq-dump (only SRR1657054 was used for this test)
  2. Ensembl GRCh38 reference index created with STAR

    FASTA: ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

    GTF: ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

    STAR --runThreadN 64 --runMode genomeGenerate --genomeDir [output folder name] \
    --genomeFastaFiles [Homo_sapiens.GRCh38.dna.toplevel.fa path] \
    --sjdbGTFfile [Homo_sapiens.GRCh38.104.gtf path]

  3. STAR aligned (version: 2.7.9a)

    STAR --runThreadN 32 --genomeDir [ENSEMBL reference file path] \
    --readFilesIn [SRR1657054.fastq path] \
    --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \
    --outFileNamePrefix [output file name]

  4. Extract count matrix using featureCounts

    featureCounts -T 10 -s 0 -a [Homo_sapiens.GRCh38.104.gtf path] \
    -o [output file name] [SRR1657054Aligned.sortedByCoord.out.bam path]

    (I also tried -s 1 and -s 2, but didn’t get a better result)

Q1) The % of uniquely mapped reads is 32.37%, so I expected the % of successfully assigned alignments to be close to 32.37%,
but it is only 10.1%. Why is this happening? (I used the same annotation GTF file.)

Q2) How should I deal with the high % of reads mapped to multiple loci?
According to the featureCounts manual,

Multi-mapping reads will also be counted. For a multi-mapping read,
all its reported alignments will be counted. The ‘NH’ tag in BAM/SAM
input is used to detect multi-mapping reads.

So if I use the -M option, one read may be counted more than once? What should I do about it?



% of Uniquely mapped reads is 32.37%, and I thought % of successfully
assigned alignments will be close to 32.37%, but only 10.1%. Why is
this happening?

Reads that map to multiple loci create multiple BAM entries (one per reported alignment of the same read). As a consequence, the proportion of uniquely mapped reads cannot be compared directly with the proportion of successfully assigned alignments, because the two percentages do not share the same denominator: the first is computed over reads, the second over alignments. In your case, the 60% of reads that are multi-mapping account for at least about 80% of your alignments (more than that if they align more than twice on average). It is therefore possible that the majority of your 30% uniquely mapped reads are contained in your 10% of successfully assigned alignments.
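
To make the denominator argument concrete, here is a rough back-of-the-envelope sketch using the percentages quoted in the question. It rests on two assumptions that are not confirmed by the thread: every multi-mapped read contributes the same number of BAM records (`alignments_per_multi`, set to the lower bound of 2), and all successfully assigned alignments come from uniquely mapped reads. The function name `alignment_shares` is just an illustrative helper.

```shell
#!/usr/bin/env bash
# Percentages taken from the STAR and featureCounts output quoted in the question.
unique_pct=32.37      # STAR: uniquely mapped reads (% of input reads)
multi_pct=60.74       # STAR: reads mapped to multiple loci (% of input reads)
assigned_pct=10.1     # featureCounts: successfully assigned alignments (% of alignments)

# ASSUMPTION: each multi-mapped read produces exactly this many BAM records.
# 2 is the lower bound; the real average is probably higher.
alignments_per_multi=2

alignment_shares() {
  # Args: unique %, multi %, assigned %, alignments per multi-mapped read
  awk -v u="$1" -v m="$2" -v a="$3" -v k="$4" 'BEGIN {
    total = u + m * k   # alignments produced per 100 input reads
    printf "multi-mapper share of alignments: %.1f%%\n", 100 * m * k / total
    printf "unique share of alignments: %.1f%%\n", 100 * u / total
    # ASSUMPTION: every assigned alignment belongs to a uniquely mapped read.
    printf "assigned alignments as share of unique reads: %.1f%%\n", a * total / u
  }'
}

alignment_shares "$unique_pct" "$multi_pct" "$assigned_pct" "$alignments_per_multi"
```

Under these assumptions, uniquely mapped reads can make up only about a fifth of all alignments, so an assignment rate of 10.1% of alignments already corresponds to roughly half of the uniquely mapped reads. Raising `alignments_per_multi` to 3 pushes that share to about 67%.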

