I’m not sure the bulk RNA-seq read counts I extracted from FASTQ files are correct


Hi. I’m new to bioinformatics and I’m trying to extract read counts from FASTQ files.

I aligned the reads with STAR, using GENCODE annotation files.

(I didn’t trim my reads because I heard that trimming is optional.)

Then I used featureCounts to get my read count matrix.

However, my counts differ from the original read counts provided by the paper’s authors.

They used Bowtie2 and TopHat to obtain their read counts.

Now I’m confused, because there seems to be no standard extraction method for bulk RNA-seq.

People use many different tools for trimming, alignment, and generating the count matrix.

Which data should I trust? Or how can I be sure my data are reliable?





Without knowing the exact context, I would assume there is nothing wrong with your data. As in many (all?) other fields and analyses, using different tools or approaches will give you different results. That does not mean one result is more correct than the other. Yes, the read counts themselves will differ, but it depends on the “level” at which you compare the outputs. It is quite possible that in the end, e.g. after differential gene expression analysis, you will get the same (or roughly the same) list of DEGs. If so, then yes, the read counts differ, but the biological conclusion remains largely consistent.
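
If you want a quick sanity check at the count level before going all the way to DEGs, one option is to correlate your counts with the published ones for a given sample. The snippet below is only a rough sketch of that idea, not a standard procedure: the function names are my own, the counts and ID versions are made-up toy values, and I am assuming both tables use Ensembl-style gene IDs (GENCODE IDs carry a version suffix such as `.15`, which has to be stripped before matching).

```python
import math


def strip_version(gene_id):
    """Drop a trailing version suffix, e.g. 'ENSG00000000003.15' -> 'ENSG00000000003'."""
    return gene_id.split(".")[0]


def log_count_correlation(counts_a, counts_b):
    """Pearson correlation of log2(count + 1) over the genes found in both tables.

    counts_a and counts_b are dicts mapping gene ID -> raw read count for ONE sample.
    Returns (correlation, number_of_shared_genes).
    """
    a = {strip_version(g): c for g, c in counts_a.items()}
    b = {strip_version(g): c for g, c in counts_b.items()}
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes to compute a correlation")

    # Log-transform so a handful of very highly expressed genes do not dominate.
    x = [math.log2(a[g] + 1) for g in shared]
    y = [math.log2(b[g] + 1) for g in shared]

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("counts are constant in one table; correlation is undefined")
    return cov / math.sqrt(var_x * var_y), len(shared)


# Toy values for illustration only (not real data).
mine = {
    "ENSG00000000003.15": 120,
    "ENSG00000000005.6": 0,
    "ENSG00000000419.14": 870,
    "ENSG00000000457.14": 45,
}
paper = {
    "ENSG00000000003": 101,
    "ENSG00000000005": 2,
    "ENSG00000000419": 910,
    "ENSG00000000457": 39,
}

r, n = log_count_correlation(mine, paper)
print(f"{n} shared genes, Pearson r on log2(count + 1) = {r:.3f}")
```

A high correlation on the shared genes would suggest the two pipelines broadly agree even though the exact numbers differ. A low one is more likely to point to something systematic (for example a sample mix-up, a different annotation, or a strandedness setting) than to ordinary tool-to-tool variation, but that is a judgment call rather than a hard rule.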

