How can I be sure that raw read counts were processed correctly from FASTQ files?
Hi. I’m new to bioinformatics and am trying to process FASTQ files to get a raw read count matrix.
I downloaded the FASTQ files from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452
- I used fasterq-dump to download the FASTQ files for the SRR accessions.
- I aligned the FASTQ files, without any trimming, using the Ensembl files Homo_sapiens.GRCh38.104.chr.gtf and Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa.
- I extracted the raw count matrix from the BAM files using featureCounts.
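For reference, this is roughly how the featureCounts table can be loaded into a count matrix. It is a minimal sketch, assuming the default tab-delimited output (a comment line starting with `#`, then a header with Geneid, Chr, Start, End, Strand, Length, followed by one column per BAM file) and integer counts. The function name `read_featurecounts` and the toy table are made up for illustration.

```python
import csv
import io


def read_featurecounts(handle):
    """Parse featureCounts output into (sample_names, {gene_id: [counts]}).

    Assumes the default layout: '#' comment line(s), a header row, and
    six annotation columns before the per-sample count columns.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    samples = header[6:]  # columns after Geneid..Length are the BAM files
    counts = {row[0]: [int(v) for v in row[6:]] for row in rows if row}
    return samples, counts


# Toy table (hypothetical values) standing in for a real featureCounts file.
toy = (
    "# comment line written by featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam\n"
    "ENSG_A\t1\t1\t100\t+\t100\t5\t7\n"
    "ENSG_B\t2\t1\t200\t-\t200\t0\t3\n"
)
samples, counts = read_featurecounts(io.StringIO(toy))
print(samples, counts)
```

With a real file, the same function would be called on `open(path)` instead of the `io.StringIO` object.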
To check whether my results were processed correctly, I normalized my read count matrix to CPM (counts per million),
since a normalized data matrix is available from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452.
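The CPM step is simple enough to write down explicitly. The sketch below assumes plain CPM (each count divided by the sample's total count, times one million) with no further scaling. The function name `cpm` and the toy counts are hypothetical.

```python
def cpm(counts):
    """Convert raw counts to counts per million, sample by sample.

    counts: dict mapping gene -> list of raw counts (one per sample).
    Returns a dict with the same keys and CPM values.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total assigned reads in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            row[j] * 1e6 / lib_sizes[j] if lib_sizes[j] else 0.0
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


# Toy example with made-up counts for two samples.
toy_counts = {"A1BG": [10, 0], "GENE2": [90, 50]}
toy_cpm = cpm(toy_counts)
print(toy_cpm)
```

Each sample's CPM values sum to one million, which is a quick sanity check on the normalization.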
I compared my data with the normalized count data from the same GEO record,
but the results differ much more than I expected.
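Rather than comparing individual values by eye, one option is a rank correlation per sample over the genes present in both tables. The sketch below computes Spearman's rho in plain Python. It assumes both tables use the same gene identifiers (for example, gene symbols in both), and it does not assume the GEO matrix was normalized the same way as mine, since I don't know which method the submitters used. All function names and toy values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def compare_sample(mine, geo):
    """mine, geo: dict gene -> value for one sample.

    Returns (number of shared genes, Spearman rho over those genes).
    """
    shared = sorted(set(mine) & set(geo))
    x = [mine[g] for g in shared]
    y = [geo[g] for g in shared]
    return len(shared), pearson(ranks(x), ranks(y))


# Toy values only; real input would be one column from each matrix.
mine = {"A1BG": 1.0, "G2": 2.0, "G3": 3.0, "ONLY_MINE": 10.0}
geo = {"A1BG": 5.0, "G2": 7.0, "G3": 9.0, "ONLY_GEO": 1.0}
print(compare_sample(mine, geo))
```

A high rho with a few outlying genes would point to gene-specific issues (annotation or multi-mapping differences), while a low rho across the board would suggest a broader problem such as mismatched samples or identifiers.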
I expected the results to differ a little, since I used different tools to get my result, but
when you look at some of the results, the differences are large
(the left one is my data and the right one is the normalized data from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452).
For the A1BG gene, for example, there is a huge difference between the two datasets.
What can I do to fix this problem? It does not seem reasonable to have to use the same tools as the original study every time I extract raw counts from FASTQ files.