How can I be sure that raw read counts are well processed from fastq files?


Hi. I'm new to bioinformatics and am trying to process fastq files to get a raw read count matrix.

I downloaded fastq files from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452

  1. I used fasterq-dump to download the fastq files for the SRR runs

  2. Aligned the reads to the Ensembl reference, using
    Homo_sapiens.GRCh38.104.chr.gtf (annotation) & Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa (genome),
    without any trimming

  3. Extracted the raw count matrix from the BAM files using featureCounts
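For reference, the table that comes out of step 3 can be loaded with a few lines of Python. This is only a minimal sketch: it assumes the default featureCounts layout (one leading comment line, then the columns Geneid, Chr, Start, End, Strand, Length followed by one column per BAM file) and integer counts. The function name, sample names and gene IDs below are made up for illustration.

```python
import csv

def read_featurecounts(text):
    """Parse featureCounts output text into {sample: {gene_id: count}}."""
    # Drop empty lines and the leading '#' comment line that records the command.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # The first six columns are Geneid, Chr, Start, End, Strand, Length;
    # every remaining column holds the counts for one BAM file.
    samples = header[6:]
    counts = {sample: {} for sample in samples}
    for row in reader:
        gene_id = row[0]
        for sample, value in zip(samples, row[6:]):
            # Assumes integer counts (no fractional counting options were used).
            counts[sample][gene_id] = int(value)
    return counts

# Tiny made-up example, not real data.
example = (
    "# Program:featureCounts; Command: (hypothetical)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsampleA.bam\tsampleB.bam\n"
    "ENSG_FAKE_1\t1\t100\t500\t+\t401\t10\t0\n"
    "ENSG_FAKE_2\t2\t200\t900\t-\t701\t90\t50\n"
)
counts = read_featurecounts(example)
print(counts["sampleA.bam"])
```

With an Ensembl GTF the Geneid column normally holds Ensembl gene IDs rather than gene symbols such as A1BG, so the IDs have to be mapped before comparing against a table keyed by symbol.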

To check whether my counts were processed correctly, I normalized my read count matrix to CPM,
since a normalized data matrix is available from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452.
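The CPM step is roughly the following. It is a minimal sketch that assumes the library size is simply the sum of the counts in one sample's column; packages such as edgeR can apply additional scaling factors, so their values may differ slightly. The gene names and numbers are placeholders.

```python
def cpm(counts):
    """Counts per million for one sample: count / library_size * 1e6."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no counted reads")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}

# Placeholder counts for a single sample (not real data).
sample = {"geneA": 10, "geneB": 90, "geneC": 0}
normalized = cpm(sample)
print(normalized["geneA"])
```

By construction the CPM values of one sample add up to one million, which is a quick sanity check on the result.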

I compared my data with the normalized count data from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452,
but the results differ much more than I expected.

I expected the results to be a little different, since I used other tools to get my result, but
look at some of the values:

[image]

The left one is my data and the right one is the normalized data from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452.
If you look at the A1BG gene, for example, there is a huge difference between the two datasets.
What can I do to fix this problem? It does not seem reasonable to have to use the same tools as the original study every time I try to extract raw counts from fastq files.
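One way to compare the two tables as a whole, instead of gene by gene, is a rank correlation over the genes they share. The sketch below is only an illustration under assumptions: I do not know how the GEO matrix was normalized, so it uses Spearman correlation, which depends only on the ordering of genes and not on the scale. All numbers are invented and the function names are my own.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_on_shared_genes(mine, published):
    """Spearman correlation restricted to genes present in both tables."""
    shared = sorted(set(mine) & set(published))
    x = [mine[g] for g in shared]
    y = [published[g] for g in shared]
    return len(shared), pearson(ranks(x), ranks(y))

# Invented values for one sample, keyed by gene symbol (not real data).
mine = {"A1BG": 5.0, "TP53": 120.0, "ACTB": 9000.0, "GAPDH": 7000.0, "ONLY_MINE": 1.0}
published = {"A1BG": 40.0, "TP53": 80.0, "ACTB": 5000.0, "GAPDH": 6000.0}
n_shared, rho = spearman_on_shared_genes(mine, published)
print(n_shared, round(rho, 3))
```

A high correlation with a few outlying genes would point at gene-specific issues such as annotation or multi-mapping differences, while a low correlation overall would point at something more basic, such as mismatched samples or gene identifiers.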


fastq


RNAseq


raw-count
