How to use ESTIMATE to infer tumor purity and stromal score from RNA-seq data?
Has anyone used ESTIMATE (bioinformatics.mdanderson.org/main/ESTIMATE:Overview) to infer tumor purity and stromal score from RNA-seq data? I am not clear on how to use this tool or what input file format it expects. The documentation lists only a few steps, and I could not figure out how to load my own data to run the program. Thanks very much for your help.
The ESTIMATE algorithm (Yoshihara et al. 2013 Nature Communications) consists of two steps. In the first step, enrichment scores are calculated using single-sample GSEA (Barbie et al. 2009 Nature). Note that although immune cells are essentially part of the stroma, Yoshihara et al. calculated two separate enrichment scores: one based on immune-related genes, which they referred to as the “immune” score, and one based on non-immune genes, which they referred to as the “stromal” score. The final ESTIMATE score is the sum of the immune and stromal enrichment scores. In the second step, the ESTIMATE enrichment score is converted to tumor purity using the following formula:
Tumor purity = cos(0.6049872018 + 0.0001467884 × ESTIMATE score)
where “Tumor purity” represents ABSOLUTE-based tumor purity (ABSOLUTE is another algorithm that computes tumor purity based on somatic DNA copy number alterations), and “ESTIMATE score” represents the ESTIMATE enrichment score obtained from TCGA Affymetrix data, as explained above. The key point is that this calibration formula was derived using only Affymetrix data, and therefore cannot be used to convert an RNA-seq-based ESTIMATE score to tumor purity. That being said, you may still apply the single-sample GSEA algorithm to properly normalized RNA-seq data to obtain ESTIMATE enrichment scores, and incorporate them as a covariate in your downstream analysis to account for tumor purity.
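To make the two steps concrete, here is a rough Python sketch. It is not the code from the ESTIMATE package: `ssgsea_like_score` is a simplified, hypothetical version of the single-sample GSEA idea (rank-weighted running sum with an assumed exponent of 0.25), and the gene names and values are made up for illustration. Its output is not on the same scale as scores from the real tool, so it should not be fed into the purity formula. `estimate_to_purity` simply evaluates the formula above and is only meaningful for Affymetrix-derived ESTIMATE scores.

```python
import math

# Coefficients quoted in the answer above (calibrated on TCGA Affymetrix data only).
INTERCEPT = 0.6049872018
SLOPE = 0.0001467884


def ssgsea_like_score(expression, gene_set, alpha=0.25):
    """Simplified ssGSEA-style enrichment score for a single sample.

    expression: dict mapping gene name -> normalized expression value.
    gene_set:   iterable of gene names (e.g. a stromal or immune signature).
    alpha:      rank weighting exponent (0.25 is an assumption of this sketch).
    """
    # Order genes from highest to lowest expression.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    members = set(gene_set) & set(ordered)
    if not members or len(members) == n:
        raise ValueError("gene set must overlap the data without covering all genes")

    # Weight each gene by its rank (top gene has rank n, bottom gene rank 1).
    weights = {gene: (n - i) ** alpha for i, gene in enumerate(ordered)}
    total_in = sum(weights[g] for g in members)
    step_out = 1.0 / (n - len(members))

    # Walk down the ranked list and accumulate the difference between the
    # weighted cumulative distribution of set genes and that of all other genes.
    cum_in = cum_out = score = 0.0
    for gene in ordered:
        if gene in members:
            cum_in += weights[gene] / total_in
        else:
            cum_out += step_out
        score += cum_in - cum_out
    return score


def estimate_to_purity(estimate_score):
    """Evaluate the calibration formula; valid only for Affymetrix-based scores."""
    return math.cos(INTERCEPT + SLOPE * estimate_score)


if __name__ == "__main__":
    # Toy sample with made-up genes and values.
    sample = {"G1": 10.0, "G2": 8.0, "G3": 5.0, "G4": 1.0}
    stromal = ssgsea_like_score(sample, ["G1", "G2"])
    immune = ssgsea_like_score(sample, ["G3", "G4"])
    # The combined score is the sum of the two, usable as a covariate.
    print("stromal:", stromal, "immune:", immune, "combined:", stromal + immune)
    # Purity conversion, shown here for a hypothetical Affymetrix-based score.
    print("purity at score 1000:", estimate_to_purity(1000.0))
```

For RNA-seq data, the practical takeaway is to stop after the first function: keep the stromal, immune, and combined scores per sample and use them as covariates, rather than converting them to a purity value.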