Gene FPKM values from Cufflinks output

Hi,

I am new to RNA-seq data analysis. I need to identify differentially expressed genes (DEGs) between human and chimpanzee in one tissue type. I have comparable RNA-seq data (reads in FASTQ format) for the two species. Each species has 2 biological replicates, each with 3 technical replicates, so there are 6 runs per species.
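
To make the design concrete, here is a minimal Python sketch that enumerates the runs. The labels follow the `Human_B1_T1` naming I use for my output directories; the `Chimp` prefix and the helper name `run_labels` are assumptions made only for illustration.

```python
from itertools import product

SPECIES = ["Human", "Chimp"]  # "Chimp" is an assumed label, not a real directory name
BIO_REPS = [1, 2]             # 2 biological replicates per species
TECH_REPS = [1, 2, 3]         # 3 technical replicates per biological replicate


def run_labels():
    """Return (species, run label) pairs for every sequencing run in the design."""
    return [
        (species, f"{species}_B{bio}_T{tech}")
        for species, bio, tech in product(SPECIES, BIO_REPS, TECH_REPS)
    ]


runs = run_labels()
for species, label in runs:
    print(species, label)
```

This gives 12 labels in total, 6 per species, which is the set of runs I refer to below.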

I understand that DEG identification with the Cufflinks package (Cuffdiff) is meant for two conditions that share the same reference genome. To identify DEGs between different species, I have to use edgeR or DESeq.

I intend to obtain FPKM values for all genes in all 12 runs (6 runs per species) and then use this FPKM dataset to identify DEGs with an R package (edgeR or DESeq). Is this approach okay?

Second, my main question is about the FPKM values I am getting in the Cufflinks output. To run Cufflinks, I am following the step-by-step protocol in the Cufflinks protocol paper (www.nature.com/articles/nprot.2012.016).

First I ran TopHat with the following command:

tophat -p 8 -G hg38.ncbiRefSeq.gtf -o Human_B1_T1 hg38 SRRxxx_1.fastq SRRxxx_2.fastq

Then I ran Cufflinks as below:

cufflinks -p 8 -o Clout_Human_B1_T1 Human_B1_T1/accepted_hits.bam
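
Since the same two commands have to be repeated for every run, here is a minimal Python sketch that assembles them from a run label. It uses only the options shown in the commands above; `build_commands` is a hypothetical helper name, and `SRRxxx` is left as the same placeholder. It only prints the commands and does not run anything.

```python
import shlex


def build_commands(run, fastq_1, fastq_2,
                   gtf="hg38.ncbiRefSeq.gtf", index="hg38", threads=8):
    """Return the TopHat and Cufflinks argument lists for one run."""
    tophat = ["tophat", "-p", str(threads), "-G", gtf,
              "-o", run, index, fastq_1, fastq_2]
    # Mirrors the command above: only -p and -o are passed to Cufflinks.
    cufflinks = ["cufflinks", "-p", str(threads),
                 "-o", f"Clout_{run}", f"{run}/accepted_hits.bam"]
    return tophat, cufflinks


tophat_cmd, cufflinks_cmd = build_commands(
    "Human_B1_T1", "SRRxxx_1.fastq", "SRRxxx_2.fastq"
)
print(shlex.join(tophat_cmd))
print(shlex.join(cufflinks_cmd))
```

For the run `Human_B1_T1` this reproduces the two command lines exactly as I typed them.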

The ‘genes.fpkm_tracking’ file in the Cufflinks output starts with the following lines:

tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
CUFF.1 - - CUFF.1 - - chr1:151793-152723 - - 1.57259 0.924969 2.22021 OK
CUFF.2 - - CUFF.2 - - chr1:153030-158982 - - 0.924186 0 6.23538e+06 OK
CUFF.3 - - CUFF.3 - - chr1:633736-634228 - - 12.1477 9.07784 15.2175 OK
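
To show which columns I mean, here is a minimal Python sketch that reads these rows and pulls out the ID, name, locus and FPKM fields. It assumes the real file is tab-delimited and marks empty fields with "-"; the rows are copied from the output above, and `read_fpkm_tracking` is a hypothetical helper name.

```python
import csv
import io

HEADER = ["tracking_id", "class_code", "nearest_ref_id", "gene_id",
          "gene_short_name", "tss_id", "locus", "length", "coverage",
          "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "FPKM_status"]
# Rows copied from my genes.fpkm_tracking; tab separation is an assumption.
ROWS = [
    ["CUFF.1", "-", "-", "CUFF.1", "-", "-", "chr1:151793-152723",
     "-", "-", "1.57259", "0.924969", "2.22021", "OK"],
    ["CUFF.2", "-", "-", "CUFF.2", "-", "-", "chr1:153030-158982",
     "-", "-", "0.924186", "0", "6.23538e+06", "OK"],
    ["CUFF.3", "-", "-", "CUFF.3", "-", "-", "chr1:633736-634228",
     "-", "-", "12.1477", "9.07784", "15.2175", "OK"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def read_fpkm_tracking(handle):
    """Return one dict per row with the gene ID, gene name, locus and FPKM."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "gene_id": rec["gene_id"],
            "gene_short_name": rec["gene_short_name"],
            "locus": rec["locus"],
            "fpkm": float(rec["FPKM"]),
        })
    return rows


genes = read_fpkm_tracking(io.StringIO(SAMPLE))
for g in genes:
    print(g["gene_id"], g["gene_short_name"], g["locus"], g["fpkm"])
```

For every row, `gene_id` is a `CUFF.N` identifier and `gene_short_name` is just the "-" placeholder, which is exactly the problem I describe below.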

Could someone please tell me what CUFF.1, CUFF.2 (and so on) mean? The same identifiers appear in the 4th (gene_id) column as well as in the 1st (tracking_id) column. How can I get FPKM values along with gene names? There are no gene names in this file.

I found this post (biostar.usegalaxy.org/p/17760/), which seems relevant, but I couldn’t find a clear answer there.

TIA

PS: For the hg38 gene annotation (GTF) file, I used ‘hg38.ncbiRefSeq.gtf’, downloaded from the UCSC portal.
