Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer

Cell-type annotation

scRNA-seq data were filtered to discard low-quality cells and doublets (Supplementary Fig. 1, Extended Data Fig. 1 and Methods). Supervised clustering (Reference Component Analysis v2 (RCA2)) at low resolution grouped cells into 11 major cell types (Extended Data Fig. 1). To identify epithelial cell subtypes, we initially analyzed the largest cohort, ‘CRC-SG1’ (Singapore, 208,367 cells, 15,920 epithelial), with multiple tumor sectors per patient, serving as biological replicates (Fig. 1 and Extended Data Fig. 1). We performed de novo clustering on CRC-SG1 epithelial cells using DUBStepR10 for feature selection, and then re-clustered the cells using differentially expressed genes (DEGs) between the initial clusters (Supplementary Fig. 1 and Methods). One cluster comprised all cells from normal samples and a minority of cells from tumor samples (23.4%). These represent normal cells in tumor sectors. Malignant cells from tumor samples formed patient-specific clusters, each of which included cells from different sectors of the same tumor (biological replicates) (Fig. 2a and Extended Data Fig. 2a). All other major cell types showed minimal patient specificity (Extended Data Fig. 2b). Thus, patient-specific transcriptomic clusters formed by tumor epithelial cells likely represent biological differences between patients, rather than batch effects.

Fig. 2: The discovery of iCMS subtype in scRNA-seq.
figure 2

a, Reduced-dimensionality (UMAP) visualization of epithelial single cells (n = 15,920) in transcriptome space: 14 patients from CRC-SG1, colored by patient ID. b, Same dataset, PCA visualization of 14 patient-specific epithelial pseudo-bulk transcriptomes. c, UMAP visualization of 49,155 epithelial cells from five cohorts in transcriptomic space colored by iCMS subtype. d, Heatmap of 63 patient-specific pseudo-bulk-inferred CNV scores. Columns were sorted based on their chromosomal position while rows were clustered using hierarchical clustering. e, UMAP visualization of 49,155 epithelial cells from five cohorts in CNV space with each cell represented by its vector of inferred copy number scores in genomic bins. f, Heatmap of expression of 715 DEGs in patient-specific pseudo-bulk transcriptomes. Only the 61 patients with consistent iCMS classification were used. Each gene is zero-centered and scaled to unit variance. g, UMAP visualization of 46,006 epithelial cells from 61 patients in regulon space (SCENIC analysis). h, Patient-specific pseudo-bulk heatmap of 90 differentially expressed regulons: 61 patients, colored by scaled regulon activity score (AUC score). i, UMAP plot of epithelial cells in transcriptomic, copy number and regulon space, colored by MSI status. j, Dendrogram showing the distance between epithelial subtypes in transcriptomic space. The number of DEGs was defined as the pairwise distance and the matrix of pairwise distances was used for tree construction. AUC, Area Under Curve; UMAP, Uniform Manifold Approximation and Projection.

Principal component analysis (PCA) demarcated two distinct epithelial subgroups (Fig. 2b). We used 848 DEGs (Supplementary Fig. 2a) between these two groups to cluster 33,235 epithelial single-cell transcriptomes from the four other cohorts (111 samples, 49 patients) (Supplementary Fig. 2b and Methods). In all four cohorts, we again observed coclustering of normal cells and numerous patient-specific malignant cell clusters that formed two distinct groups in PCA (Supplementary Fig. 2b,c). We then combined the five cohorts (189 samples, 63 patients) and used the 848 DEGs to cluster epithelial cells, and again recovered two major tumor subtypes (iCMS2 and iCMS3; Fig. 2c).

We noticed that genes upregulated in iCMS2 epithelial cells relative to iCMS3 and normal-like epithelium in CRC-SG1 were enriched in specific chromosomal arm gains, including 8q, 13q and 20q (Supplementary Fig. 2d). We therefore used inferCNV11 to infer copy number variants (CNVs) from epithelial cell transcriptomes of all five cohorts and observed that, despite substantial patient-to-patient variability within iCMS2, 7pq, 8q, 13q and 20pq were frequently gained and 1p, 4pq, 8p, 14q, 15q, 17p and 18pq were frequently deleted (Fig. 2d). In contrast, iCMS3 tumors were diploid or showed infrequent and inconsistent copy number alterations. Moreover, patients were clustered by their pseudo-bulk copy number profiles, and we recapitulated the iCMS2–iCMS3 dichotomy observed in epithelial single-cell transcriptomes, with only 2 of 63 patients (3%) showing discordant grouping (Fig. 2d). The separation of i2 and i3 tumors was observed even when epithelial single cells from the five cohorts were visualized in inferred copy number space (Fig. 2e), suggesting copy number alterations contribute to the observed dichotomy in CRC epithelial transcriptomes. We performed differential expression analysis on epithelial pseudo-bulk transcriptomes from the 61 consistent patients to define an intrinsic epithelial cancer signature comprising 715 genes (Fig. 2f, Supplementary Fig. 2e and Methods).

To evaluate the relationship of the two single-cell-defined intrinsic epithelial groups to CMS groups identified by bulk gene expression, we first identified marker genes for each CMS subtype that also showed epithelial-specific expression in our five scRNA-seq datasets. We then used these four gene sets to construct four epithelial CMS metagene expression scores for each malignant cell. Upregulation of the CMS2 epithelial metagene was the most prominent feature of iCMS2 cells. Similarly, upregulation of the CMS3 epithelial metagene distinguished iCMS3 cells (Supplementary Fig. 3). Thus, our single-cell iCMS classification may represent the core epithelial intrinsic components of bulk CMS.

Next, we used SCENIC12 to infer single-cell activity scores for the regulons of 347 transcription factors and used these scores to cluster epithelial cells from the 61 patients. The same two intrinsic epithelial subtypes again self-emerged, with 90 differential regulons including TCF/LEF (T cell factor/lymphoid enhancer factor family), MYC and homeobox transcription factors, suggesting a pervasive gene regulatory program underpinning the biology of these two epithelial subtypes (Fig. 2g,h and Supplementary Fig. 4).

To characterize the distribution of malignant cells as distinct transcriptomic states (versus a continuum), we calculated their i2 and i3 metagene expression scores by averaging the 715 iCMS marker genes (Fig. 2f) within the same iCMS group (Supplementary Notes). The metagene score distribution was bimodal, with one mode corresponding to i2-like transcriptomes and the other i3-like (Extended Data Fig. 3a,b and Supplementary Notes), supporting distinct i2 and i3 epithelial cell states. By cluster label, >95% of malignant cells belonged to a single cluster (either i2 or i3) (Extended Data Fig. 3c). By i2 and i3 metagene score, >80% of cells were either preferentially i2 or preferentially i3 (score difference > 0.1) in 54 of 63 tumors (86%) (Supplementary Fig. 5). Thus, in most tumors, the large majority of cells belong to a single iCMS type, with hybrid tumors being infrequent.

Transcriptomic distances

Across transcriptomic, CNV and regulon spaces (Fig. 2i), a group of MSS epithelial cells comingled with MSI-H cells within the i3 cluster, suggesting biological programs in i3_MSS are more similar to MSI-H than to i2_MSS. Using the number of pseudo-bulk DEGs as a distance measure, we constructed a dendrogram to represent the relationships between these malignant cell groups. Consistently, i3_MSS has much greater similarity to i3-MSI-H than to i2_MSS (Fig. 2j).

Epithelial subtypes in an independent scRNA-seq dataset

We re-analyzed single-cell transcriptomes from a recent study of 62 patients with CRC13, which had focused on cell-type differences between MSI-H and MSS tumors. Using the above-described quality control (QC) cutoffs, we identified 56,551 high-quality epithelial cells and clustered them using the 715 iCMS marker genes, again identifying three clusters: normal, i2 and i3 (Supplementary Fig. 6a). Once again, cells from i3_MSS and MSI-H tumors intermingled within the i3 cluster, while cells from i2_MSS tumors formed a distinct i2 cluster. Similarly, i2 and i3 metagene scores showed clear bimodality. In >90% of patients, >90% of malignant cells belonged to a single subtype (Supplementary Fig. 6b). These results corroborate two intrinsic transcriptomic subtypes, iCMS2 and iCMS3, with i3-MSS and MSI-H malignant cells being highly similar.

Classification of bulk transcriptomes

We used the 715 iCMS marker genes to classify 3,614 tumor bulk transcriptomes from 15 primary tumor datasets (The Cancer Genome Atlas (TCGA), SG-Bulk and 13 CMS cohort datasets) and similarly observed two groups with either high i2 or high i3 signatures (Fig. 3a and Extended Data Fig. 4a). Using nearest template prediction, 47% of the tumors were classified as i2 and 42% as i3 at Q value < 0.05. Most tumors were robustly classified by nearest template prediction even at Q value < 0.005. i2 and i3 tumors were identified at relatively similar proportions across multiple datasets and i2/i3 tumors from different datasets comingled (Fig. 3b and Extended Data Fig. 4a). This indicates that the intrinsic epithelial signatures can robustly identify the epithelial subtypes from bulk tumor transcriptomes.

Fig. 3: iCMS classification of bulk transcriptomes.
figure 3

a, Proportion of 3,614 patients classified as iCMS2, iCMS3 or indeterminate based on their bulk tumor transcriptome. The box on the right lists the parameters that will be correlated with iCMS, including: CMS, CRIS, CIMP, TMB and copy number variation, overall survival (OS), survival after relapse (SAR) and RFS. b, Heatmap of 715 iCMS marker genes used to classify the 455 TCGA and SG-Bulk tumor transcriptomes. Gene expression values were log-transformed, zero-centered and scaled to unit variance. Upper annotation bars show clinical, mutational and copy number gain/loss categorized as amplified (≥4 copies), gain (2.5–4 copies), diploid (1.5–2.5 copies) and loss (<1.5 copies), as well as TMB (MSI-H patients highlighted in brown). Right annotation bar shows the average scaled expression of each gene across four major cell types, based on scRNA-seq data from the CRC-SG1 cohort. Lower annotation track: FDR Q value of iCMS classification. c, Breakdown of iCMS2 and iCMS3 samples by anatomical side (top), MSI status (middle) and CMS (bottom). Statistics are based on all bulk tumor datasets, including only those for which the relevant annotations are available. d, Bulk tumor datasets: alluvial plot demonstrating the relationship between IMF classification and anatomical side, MSI status, CMS subtype and iCMS. e, Heatmap showing the coexpression pattern of 2,873 bulk tumor transcriptomes from 14 clinical cohorts. Rows are genes; columns are patients; ordering is by unsupervised hierarchical clustering. Gene expression values are normalized as in b. CMS, iCMS and CRIS labels are indicated above the map, together with selected clinical parameters. Annotation bars for four major tumor cell types are as in b. f, Kaplan–Meier plot of RFS of patients classified by CMS and iCMS. The table below the graph indicates the number of patients at risk for all groups at various time points, followed by the number of events and median survival (in months) with their confidence intervals. g, Summary table of survival analysis conducted in this study. P values are Cox proportional hazard models (as implemented by R survival package). FDR, False Discovery Rate.

Relationship with CMS and clinico-molecular characteristics

We examined clinical and molecular features of iCMS subtypes (Fig. 3a–c and Extended Data Fig. 4a–c). As before, we found that almost all MSI-H tumors were classified as iCMS3, together with the subgroup of iCMS3_MSS tumors. DNA methylation was a feature of the MSI-H group, although CpG island methylation phenotype (CIMP) status did not neatly substratify the MSS groups (Extended Data Fig. 4c). Right-sided tumors were mainly i3 (66%), and left-sided tumors mainly i2 (68%). Mucinous cancers of both MSI-H and MSS subgroups were mainly i3 (93%).

CMS1 (97%) and CMS3 (98%) tumors were mainly i3, while CMS2 (96%) tumors were mainly i2. However, CMS4 tumors can be either i2 or i3 (with an equal proportion), suggesting that fibrosis is decoupled from intrinsic epithelial structure (Fig. 3c,d).

We performed hierarchical clustering on 2,873 bulk tumor transcriptomes from 14 clinical cohorts. For each gene, we related the bulk expression to its expression in our single-cell dataset, stratified by major cell-type cohorts (Fig. 3e). We observed that tumors grouped together based on iCMS, MSI status and bulk CMS. At the highest level, iCMS2 and iCMS3 tumors separated, presumably due to distinct epithelial transcriptomes. Within iCMS3, MSI-H tumors mostly segregated as a subcluster characterized by high expression of immune-specific genes and minimal expression of fibroblast-specific genes. Within each of the major iCMS2 and iCMS3 groups, we observed a subset of fibrotic (CMS4) tumors characterized by increased expression of fibroblast, endothelial and some immune-specific genes. This suggests that iCMS, MSI status and CMS jointly inform the molecular classification of CRC. Tumors were not organized by CRIS–epithelial subgroups (Fig. 3e and Extended Data Fig. 4b).

Survival analysis

Across 1,762 tumors with survival data (Fig. 3f,g), CMS4 showed poor Relapse Free Survival (RFS) (hazard ratio (HR) = 1.78, P = 1.4 × 10−10), consistent with the literature1. Poor RFS was a particular feature of the CMS4/iCMS3 subgroup (Fig. 3f) (CMS4/iCMS3 versus all others: HR = 2.12, P = 7.6 × 10−13; CMS4/iCMS3 versus CMS4/iCMS2: HR = 1.63, P = 0.001). This effect also extended to inferior overall survival (Extended Data Fig. 4d; CMS4/iCMS3 versus all others: HR = 2.08, P = 3.8 × 10−9; CMS4/iCMS3 versus CMS4/iCMS2: HR = 1.68, P = 0.004). Survival after relapse was worse for i3 cancers relative to i2 (HR = 1.72, P = 8.3 × 10−6), as was overall survival (HR = 1.23, P = 0.04) (Extended Data Fig. 4e).

IMF classification

Microsatellite instability marks a subset of i3 cancers. Fibrotic CMS4 cancers were stratified by epithelial subtype (i2 versus i3) into two subgroups with distinct microenvironment composition and distinct likelihood of survival. Putting these together, we propose a refinement of the four-group bulk CMS classification, based on the three biological layers of intrinsic epithelial status (I), microsatellite status (M) and presence of fibrosis (F), termed ‘IMF’. IMF stratifies tumors into five commonly occurring classes: iCMS2_MSS_NF, iCMS2_MSS_F, iCMS3_MSS_NF, iCMS3_MSS_F and iCMS3_MSI.

Genomic features and functional associations

We examined the copy number architecture of iCMS3_MSI, iCMS3_MSS and iCMS2_MSS tumors based on 659 tumors from the TCGA and SG-Bulk cohorts (Fig. 4a). i2 tumors were driven by copy number changes in specific chromosomes commonly altered in CRC14,15. Gains in 7pq, 8q, 13q and 20pq and losses of 1p, 4pq, 8p, 14q, 15q, 17p and 18pq characterized i2 tumors. i3 tumors were either MSI-H and diploid or MSS tumors with fewer copy number changes than i2 (Fig. 4a and Extended Data Fig. 5a). TP53 mutations were more prevalent in i2 MSS tumors than in i3, perhaps contributing to overall genomic instability in the former group (Fig. 4a,c). This pattern of CNV enrichment is consistent with our inferCNV11 analysis of single cells from i2 and i3 tumors (Fig. 2d).

Fig. 4: Relationship of iCMS to genomic features.
figure 4

a, Copy number variation by chromosome arm in 659 patients from TCGA and SG-Bulk cohorts. Samples are ordered as i2_MSS, i3_MSS, i3_MSI. p53 mutation status is shown on the right for each sample. b, TMB in iCMS2_MSS (n = 389), iCMS3_MSS (n = 195) and iCMS3_MSI (n = 116) samples from TCGA and SG-Bulk data. Pairwise P values: two-sided Wilcoxon rank-sum test; overall P value: Kruskal–Wallis test. c, Scatterplot of proportion of TCGA and SG-Bulk samples with mutations in 333 CRC-associated genes, in iCMS2 (n = 344) versus iCMS3 (n = 281) (top) and iCMS2_MSS (n = 338) versus iCMS3_MSS (n = 181) (bottom). Dot size corresponds to Q value by two-sided Fisher’s exact test with Benjamini–Hochberg correction. Only genes with Q value < 0.05 and proportion mutated > 0.2 are labeled. d, Violin plot showing the expression fold-change of 715 iCMS marker genes in i2 relative to i3, categorized by copy number status. CNV Up, DEGs on chromosomal arms with frequent increase in copy number in i2; CNV Down, DEGs on arms with frequent loss of copy number in i2; red font, DEGs whose expression fold-change is discordant with the copy number change; blue font, concordant. e, GSEA results of MSigDB hallmark pathways in iCMS2 versus iCMS3. X axis, normalized enrichment score in iCMS2 relative to iCMS3. FC, fold-change.

Tumor mutation burden (TMB) was higher among i3_MSI tumors (median: 1,353) and similar between i3 and i2 MSS tumors (median: 98 and 105 for i2_MSS and i3_MSS, respectively) (Fig. 4b). Higher TMB entailed more mutated genes in MSI-H cancers (Extended Data Fig. 5b). Even with similar TMB, within the MSS group, i3 cancers were enriched for KRAS and PIK3CA mutations while i2 tumors were enriched in APC and TP53 mutations (Fig. 4c).

Some of the expression differences between i2 and i3 epithelial cells may be directly attributable to differences in DNA copy number. Of the 715 DEGs, 382 coincided with the above-mentioned chromosomal arms commonly amplified or deleted in i2 tumors. For the majority (91%) of these 382 genes, expression fold-change direction was concordant with the difference in average copy number between i2 and i3, suggesting that their differential expression could be a direct consequence of DNA gain or loss (Fig. 4d). Genes upregulated in i2 tumors were enriched for MYC and E2F targets, perhaps reflecting i2-specific amplification of the chromosomal arms in which MYC (8q) and E2F1 (20q) reside (Fig. 4a,e and Supplementary Fig. 7). Consistently, the MYC regulon defined by SCENIC showed higher expression in i2 epithelial cells (Fig. 2h). Genes upregulated in i3 cells were associated with Epithelial Mesenchymal Transition (EMT), inflammatory pathways and metabolic derangements (Fig. 4e, Extended Data Fig. 9, Supplementary Fig. 10 and Supplementary Tables 110).

DNA methylation

Most MSI-H tumors showed a global CIMP. At the other extreme, none of the iCMS2_MSS tumors were hypermethylated. The CIMP status of iCMS3_MSS tumors was variable across patients, with a minority displaying DNA hypermethylation. Overall, we did not detect consistent epigenetic differences between iCMS2 and iCMS3 (Extended Data Fig. 5d).

Cancer pathways

We analyzed signaling pathways commonly dysregulated in CRC: WNT16,17, MAPK18 and TGF-beta19 (Fig. 5a–c and Supplementary Fig. 11). Genes within the WNT pathway tended to be upregulated in i2 bulk tumors, presumably due to their upregulation in i2 epithelial cells (Fig. 4e), which could in turn be attributable to increased activity of transcription factors mediating WNT signaling (TCF7, ASCL2)20 (Fig. 2h). Intriguingly, the i2-upregulated set included genes such as NOTUM, AXIN2 and NKD1 that suppress WNT signaling via negative feedback during normal tissue homeostasis21 (Fig. 5a and Extended Data Fig. 6a). Upregulation of these negative feedback regulators is presumably a consequence of WNT hyperactivity in i2 tumors22,23. Finally, protein beta-catenin abundance was significantly higher in i2 compared with i3 in the TCGA reverse-phase protein arrays data (Extended Data Fig. 6b).

Fig. 5: Relationship of iCMS and IMF to common cancer pathways.
figure 5

ac, Heatmaps of mutation landscape (top), methylation (middle; a only) and bulk expression (bottom) of selected genes in the WNT (a), MAPK (b) and TGF-beta (c) pathways, across TCGA samples (n = 209). In the mutation Oncoprint, colors depict the type of mutation; a barplot of the cumulative frequency of each mutation is shown to the right, and the total frequency of mutations in each gene is shown to the left. The methylation heatmap is colored by beta-value, the gene expression heatmap is colored by scaled expression and the right annotation bar shows the average scaled expression of each gene across four major cell types (epithelial, immune, fibroblast, endothelial) from CRC-SG1 scRNA-seq data. In a, beta-catenin protein expression by reverse-phase protein arrays (RPPA) is displayed below the gene expression heatmap. d, Proportion of BRAF mutation classes in iCMS3_MSI (n = 48), iCMS3_MSS (n = 14) and iCMS2_MSS (n = 4) samples with BRAF mutations, in TCGA and SG-Bulk. e, Proportion of mutations in KRAS exons in iCMS3_MSI (n = 31), iCMS3_MSS (n = 88) and iCMS2_MSS (n = 87) samples with KRAS mutations, in TCGA and SG-Bulk. Number of samples in each group is labeled.

We next examined somatic mutations in the WNT pathway (Extended Data Fig. 6c–g). While i3 tumors displayed diverse WNT mutations, i2 tumors were primarily characterized by inactivating APC mutations. In particular, APC mutations were significantly more proximal in i2_MSS than i3_MSS (P = 8 × 10−07; Extended Data Fig. 6c). More proximal APC mutations result in shorter, truncated proteins, associated with higher beta-catenin signaling24,25. The variant allele frequency of APC mutations was higher in i2_MSS than in i3_MSS (Extended Data Fig. 6e). In contrast, i3 tumors were enriched for other WNT mutations, including ligand-independent CTNNB1 mutations, as well as ligand-dependent RNF43 and ZNRF3 mutations targeting the R-spondin 1 (RSPO1)-associated negative feedback loop21, especially in iCMS3_MSI (Extended Data Fig. 6f,g).

We next evaluated alterations associated with MAPK pathway upregulation in cancer. i3 cancers had more frequent KRAS, PIK3CA and BRAF mutations (Figs. 4c and 5b), including mutations known to be associated with more prominent MAPK pathway upregulation26,27. BRAF V600 class 1 mutations were only observed in i3 cancers and KRAS exon 3 mutations were enriched amongst i3 cancers (Fig. 5d,e and Extended Data Fig. 7a,b). i3 cancers showed higher expression of downstream MAPK components (DUSP4 and ETV5) and i2 cancers had overexpression of EGFR ligands, AREG and EREG (Fig. 5b and Extended Data Fig. 7c). In i3 tumor cells, published gene signatures related to MAPK activity and KRAS and BRAF activating mutations28,29,30,31,32 were more highly expressed (Supplementary Fig. 12).

Upregulation of TGF-beta signaling was more prominent in i3 cancers. Genes in the TGF-beta signaling pathway, including SMAD4, were more frequently mutated in i3 cancers (Fig. 5c and Extended Data Fig. 7d). Expression of SMAD2/3/4 is increased in i3 cancers (Extended Data Fig. 7e) but gene signatures related to TGF-beta activity1,33,34,35,36,37 were not consistently different in i3 and i2 tumor cells (Supplementary Fig. 11).

Composition of tumor microenvironment

To compare cell-type abundance across tumor types, we identified marker genes for nine major fibroblast, immune and endothelial cell types (Extended Data Fig. 2b) from CRC-SG1 single-cell data, and calculated the metagene expression score for each cell type by averaging its markers in bulk transcriptomes (Supplementary Notes). Consistent with previous reports, we observed higher T/natural killer (NK) cell scores in MSI-H tumors, and elevated fibroblast, endothelial and monocyte/classical dendritic cell (McDC) scores in fibrotic (CMS4) tumors (Fig. 6a and Extended Data Fig. 8a)38. However, within fibrotic tumors, we observed higher T/NK, McDC and neutrophil metagene scores in iCMS3_MSS_F than in iCMS2_MSS_F (Fig. 6a and Extended Data Fig. 8a). EPIC cell-type deconvolution39 produced similar results (Extended Data Fig. 8b). We used matched exome sequencing data to estimate sample tumor purity. While fibrotic tumors had lower tumor purity, iCMS3_MSS_F still had decreased tumor purity compared with iCMS2_MSS_F (Extended Data Fig. 8c).

Fig. 6: Epithelial cell interactions with microenvironment.
figure 6

a, Heatmap of the average scaled gene expression of cell-type-specific signatures of the nine major cell types in 577 bulk samples from TCGA and SG-Bulk datasets. b,c, UMAP of T cells (b) (n = 76,812 cells) and fibroblasts (c) (n = 31,451 cells) from 14 patients in CRC-SG1 dataset, colored by subtypes, used in signaling analyses. d, NicheNet analysis using i2 up gene set (left) and i3 up gene set (right). The heatmap depicts the regulatory potential scores (purple) for the top 200 target genes of each of the top 20 ligands ranked by Pearson correlation (orange) after filtering at a quantile cutoff of 0.33 for the regulatory potential score. The dotplot on the right depicts the average scaled patient-wise pseudo-bulk expression of each of the top-ranked ligands in each cell type across patients in the CRC-SG1 cohort. Dot size corresponds to the percentage of cells expressing the ligand in each cell type. e, Metascores for top three inflammation-related pathways identified by GSEA, in 577 bulk samples from TCGA and SG-Bulk, split by IMF: i2_MSS (n = 240), i2_fibrotic (n = 82), i3_MSS (n = 92), i3_fibrotic (n = 58), i3_MSI (n = 105). f, CXCL13 and cytotoxicity gene program scores (from Pelka et al. 13) in 462 bulk samples from TCGA, split by IMF: i2_MSS (n = 189), i2_fibrotic (n = 71), i3_MSS (n = 74), i3_fibrotic (n = 48), i3_MSI (n = 80). In e and f, P values are by two-sided Wilcoxon rank-sum test without adjustment of multiple comparison. DE, Differentially expressed; SM, Smooth Muscle.

Next, we performed bulk tumor DEG analysis of iCMS3_MSS_F compared with iCMS2_MSS_F. Genes upregulated in iCMS3_MSS_F were specific to McDC and endothelial cells in our single-cell data (Extended Data Fig. 8d). Thus, in the fibrotic context, iCMS3 tumors are associated with increased fibroblast, myeloid and endothelial cell signatures and decreased tumor purity, suggesting that, when fibrosis develops, iCMS3 tumors have a larger fibrotic reaction than iCMS2 tumors. In addition, in the nonfibrotic context, iCMS3 tumors show increased immune cell signatures (T/NK, plasma-B cell, McDC and granulocyte) compared with iCMS2 (Fig. 6a and Extended Data Fig. 8a).

Cell signaling interactions

We examined tumor epithelial cell signaling with immune cells, endothelial cells and fibroblasts8 (Extended Data Fig. 2 and Fig. 6c) using NATMI40 and Nichenet41 (Fig. 6d and Supplementary Fig. 8a). Signaling pathway target genes with the highest regulatory potential scores and top ligands prioritized by NicheNet included key regulons previously identified in our regulon analysis, such as CEBPB, ARID3A and MYC in i2. For i3, top ligands included FGF2 and IL1B, reported to promote invasiveness in CRC42,43. Using NATMI, we ranked signaling interactions and inspected top ligand–receptor combinations (Supplementary Fig. 8b and Supplementary Table 11). Differential interactions stronger in i2 included EREG-to-EGFR signaling from epithelial tumor cells to epithelial tumor cells (autocrine) and fibroblasts (paracrine) (Supplementary Fig. 8c). Multiple immune–epithelial interactions involving T/NK cells and McDCs with tumor epithelial cells were predicted in i3 cancers (for example, IL1B in McDCs to IL1R2 in i3 tumor cells) (Supplementary Fig. 8c).

Immune response

Our signaling analyses suggested pro-inflammatory interactions in i3 tumors. The NFKB1 regulon, associated with inflammation, was upregulated in i3 (Fig. 2h). We observed an increase in immune cell signatures, including T/NK cells, McDCs and neutrophils, in iCMS3_MSS tumors, in both the fibrotic and nonfibrotic settings (Fig. 6a).

Epithelial gene set enrichment analysis (GSEA) identified multiple immune pathways amongst the top pathways upregulated in i3 cells, including ‘INTERFERON GAMMA RESPONSE’, ‘INFLAMMATORY RESPONSE’ and ‘INTERFERON ALPHA RESPONSE’. We calculated metagene scores using the GSEA leading edge genes for these three top inflammation-related pathways in our bulk dataset (Fig. 6e). MSI-H and fibrotic (CMS4) tumors had higher expression of inflammation-related pathways38. Notably, in both the fibrotic and nonfibrotic settings, i3 tumors had higher expression of inflammation-related pathways than i2 tumors.

A recent single-cell CRC study13 identified T cell program activity scores that were different between MSI-H and MSS CRCs. We hypothesized that iCMS would provide a further substratification of increased immune activity within MSS tumors. In TCGA bulk transcriptomes, we quantified the expression of two T cell programs of anti-tumor reactivity and effector function (‘CXCL13 T cell’ and ‘Cytotoxicity’ programs) that this study had identified as differentially active between MSI-H and MSS CRCs (Fig. 6f). In both fibrotic and nonfibrotic contexts, iCMS3_MSS tumors were associated with a more inflammatory, immune-activated environment than iCMS2_MSS tumors, with levels in the iCMS3_MSS_F group similar to iCMS3_MSI tumors. Our single-cell and bulk analyses point to iCMS3_MSS as a unique subset of MSS CRCs, with similarities to MSI-H tumors, increased immune activation, and higher signatures of T and myeloid cell infiltration and anti-tumor cytotoxicity.

Pre-invasive and cell lineage gene sets

Recently, a pre-cancer atlas study identified two cell types44, one attributable to adenomatous polyps and one to sessile serrated lesions (SSLs). In our single-cell data, most adenomatous polyp markers showed higher expression in iCMS2 patients, whereas SSL markers were upregulated in iCMS3 (Fig. 7a,b and Extended Data Fig. 10a). The previous study noted that SSLs were highly enriched for genes not normally expressed in the colon, and suggested that gastric metaplasia may underlie their etiology44. We therefore hypothesized that iCMS3 tumors may also show the same trend. Indeed, TissueEnrich45 analysis indicated that the genes upregulated in iCMS3 epithelial cells (iCMS3 Up) were significantly enriched for stomach-specific expression, and also for expression in other foregut tissues such as esophagus and duodenum (Fig. 7c). Similarly, genes related to gastric metaplasia were upregulated in iCMS3 epithelial cells (Fig. 7d and Extended Data Fig. 10a), suggesting that this process might be active within i3 cancers. Genes specific to normal colon were preferentially downregulated in both i2 and i3 tumors, suggesting loss of differentiation in oncogenesis (Fig. 7c and Extended Data Fig. 10a). Furthermore, inspecting the expression of the leading edge genes in our GSEA, we also observed upregulation of crypt top genes in i3 cancers and crypt bottom genes in i2 cancers (Fig. 7e and Extended Data Fig. 10a). These results suggest that i3 tumors show gastric metaplasia and may arise from malignant transformation of SSLs, which in turn could originate from cells resembling crypt top. Conversely, i2 tumors may arise from cells resembling crypt bottom, progressing via adenomatous polyps before becoming full-blown cancers.

Fig. 7: Association of iCMS markers with polyp subtypes and normal tissues.
figure 7

a,b, Heatmaps of tubular adenoma (AD) (a) and SSL (b) marker genes obtained from Chen et al. 44, colored by the average of scaled (z-transformed) expression values of epithelial single cells from five-cohort scRNA-seq data (patients = 61). c, Barplots quantify enrichment of tissue-specific genes in each of the four DEG sets, calculated using the TissueEnrich package (iCMS2 Up: 308; iCMS2 Down: 279; iCMS3 Up: 74; iCMS3 down: 54; total: 715). Red line, P = 0.1. The heatmaps show expression levels of the seven iCMS3-Up DEGs defined as stomach-specific in the TissueEnrich database. Left, scaled expression in diverse tissues. Right, scaled epithelial pseudo-bulk expression in 61 patients. d, Heatmap of gastric metaplasia signature genes, similar to a and b. e, Heatmap of GSEA leading edge genes within crypt top and crypt bottom gene sets, showing scaled epithelial pseudo-bulk expression levels across 61 patients from five scRNA-seq cohorts.

Drug response of iCMS classes

We classified iCMS status of cell lines from the CTRPv2 dataset46 and observed that i2 and i3 cell lines showed differential sensitivity to numerous drugs (Supplementary Fig. 9a,b). However, the difference in sensitivity to standard-of-care chemotherapeutics such as Fluorouracil (5-FU), oxaliplatin and 7-ethyl-10-hydroxycamptothecin (SN-38) was not significant. We then evaluated two sets of genes whose expression was correlated with drug response (drug response signatures)47,48,49,50. For Folinic acid, fluorouracil and oxaliplatin (FOLFOX), 5-FU, avastin, cetuximab, afatinib and AZD8931, gene sets positively correlated with drug sensitivity were upregulated in iCMS2 cells and genes correlated with drug resistance were downregulated. Similarly, iCMS3 cells showed patterns of up- and downregulation suggesting responsiveness to FOLFIRI, gefitinib and vandefanib (Extended Data Fig. 10b). This prompts investigation of potentially differential drug responses based on iCMS status in retrospective analyses of tumor samples from completed clinical trials.

Read more here: Source link