Single-cell analyses define a continuum of cell state and composition changes in the malignant transformation of polyps to colorectal cancer

Mapping molecular changes across malignant transformation

We generated single-cell data for 81 samples collected from eight FAP and seven non-FAP donors (Fig. 1a and Supplementary Tables 1 and 2). For each tissue, we performed matched scATAC-seq and snRNA-seq (10x Genomics). We obtained high-quality single-cell chromatin accessibility profiles for 447,829 cells from 80 samples, with a mean transcription start site (TSS) enrichment of ~8 for most samples (Extended Data Fig. 1a). After removing low-quality snRNA-seq cells and samples, we obtained single-cell transcriptomes for 201,884 cells from 70 samples (Extended Data Fig. 1b). Whenever there was sufficient tissue, we generated microscopic pathology data (Extended Data Fig. 2a and Supplementary Table 2) and found the majority of polyps were tubular adenomas, the most common polyp type identified in colonoscopies.

Fig. 1: Single-cell atlas of expression and chromatin accessibility in CRC development.
figure 1

a, Summary of the samples in this study. The bar chart shows the number of normal/unaffected colon tissues (gray), adenomas (purple) and CRCs (red) assayed for each patient. Locations of samples assayed from a single patient are indicated on the colon on the upper right. These data include deep profiling of four patients with FAP from whom we assayed 8–11 polyps, 0–1 carcinomas and 4–5 matched normal (unaffected) tissues. From non-FAP donors, we collected data on normal colon (9 samples from 2 donors), polyps (1 sample from 1 donor) and CRC tissues (4 samples from 4 patients). b,c, UMAP representations of all snRNA-seq (b) and scATAC-seq (c) cells colored by whether the cells were isolated from normal/unaffected colon tissues, adenomas or CRCs. d,g, UMAP representations and annotations of immune (d) and stromal (g) cells. e,h, Fraction of each immune (e) and stromal (h) cell type isolated from normal (green), unaffected (blue), polyp (purple) and CRC (red) samples. The color gradations within each color represent the contributions of each single sample (for example, each shade of red is a single CRC). f, CODEX images of eight polyps and two CRCs where cells are labeled with dark blue, CD3 is labeled in green and PD1 is labeled in light blue. All samples tested are shown in f. CODEX imaging of individual specimens was not reproduced. Representative sections of images of the entire specimen are shown in the figure. DC, dendritic cell; Fib., fibroblast; GC, germinal center; ILC, innate lymphoid cell; Myofib., myofibroblast/smooth muscle; NK, natural killer.

When all snRNA-seq cells (Fig. 1b) and scATAC-seq cells (Fig. 1c) are projected into low-dimensional subspaces, stromal and immune cells generally cluster by cell type whereas epithelial cells largely separate into distinct clusters comprising cells derived from polyps, unaffected tissues or CRCs. As a result, we annotated immune and stromal cells by subclustering cells from all samples, and analyzed epithelial cells separately.

T cells and myeloid cells are enriched in polyps and CRC

The immune compartment comprised B cells, T cells, monocytes, macrophages, dendritic cells and mast cells (Fig. 1d). We examined expression of known marker genes (Extended Data Fig. 1c) to annotate snRNA-seq data, and examined chromatin activity scores—a measure of accessibility within and around a given gene body—associated with marker genes to annotate the scATAC cells (Extended Data Fig. 1d). We identified a cluster of exhausted T cells in the scATAC data that exhibited high gene scores of T cell exhaustion marker genes and accessibility at exhausted T cell motifs, and was labeled as exhausted T cells by a published dataset (Extended Data Fig. 3a–g and Methods)15.

The cell types identified were present in nearly all samples, although some cell types were enriched or depleted in specific disease states (Fig. 1e and Extended Data Figs. 2b,c, 3h and 4). Significant differences in cell-type abundance were identified with both Wilcoxon testing and a generalized linear model-based method called Milo16, which produced consistent results. For example, regulatory T cells (Tregs) were enriched in polyps relative to unaffected tissue, while naive B, memory B and germinal center cells were enriched in unaffected tissues relative to polyps (Extended Data Fig. 4a,b). Enrichment of myeloid cells and specific types of T cells and depletion of B cells was recently reported in a group of 22 mismatch repair-proficient and 13 mismatch repair-deficient CRCs17, and we observe similar shifts in the tumor immune composition in precancerous polyps.

The enrichment of (1) Tregs in both polyps and CRC and (2) exhausted T cells in CRC suggests mechanisms of immune evasion in the precancerous and cancerous states18. T cell exhaustion, which occurs in response to chronic antigen stimulation and is characterized by reduced cytokine production and increased expression of inhibitory receptors, is thought to be a primary mechanism of immune evasion by cancers19,20. To further support the observation of T cell exhaustion only occurring in CRC, we performed CODEX imaging of CD3 and PD1 and found low or undetectable PD1 expression in eight polyps but found PD1 expression in both CRC samples tested (Fig. 1f).

Within the stromal compartment, we identified glial cells, adipose cells and multiple types of endothelial cells and fibroblasts (Fig. 1g). Fibroblast subtypes include crypt fibroblasts (WNT2B or RSPO3 high), villus fibroblasts (WNT5B high) and myofibroblasts (ACTA2 and TAGLN high) (Extended Data Figs. 1f,g and 5a)21,22. Consistent with previous results, we observe high expression of BMP signaling genes in villus fibroblasts (Extended Data Fig. 5a). In agreement with recent reports that crypt fibroblasts secrete semaphorins to support epithelial growth, we observe one fibroblast cluster with high expression of semaphorins (Extended Data Fig. 5a)23. This cluster of fibroblasts exhibited the highest expression of RSPO3, a factor that supports the intestinal stem cell niche24. We also observe a cluster of cancer-associated fibroblasts (CAFs) consisting almost exclusively of cells from CRCs, and a scATAC cluster of fibroblasts enriched for cells from polyps and CRCs with accessibility around some of the same genes as CAFs, which we term pre-cancer-associated fibroblasts (preCAFs) (Fig. 1h and Extended Data Figs. 2d,e and 4). These observations suggest that phenotypically distinct fibroblasts exist in polyps and tumors, and thus may play a role in tumorigenesis in precancerous lesions.

We next integrated our scATAC-seq and snRNA-seq datasets to enable analyses of regulatory elements and TFs potentially driving gene expression. We aligned the datasets with canonical correlation analysis (CCA) and assigned RNA-seq profiles to each scATAC-seq cell (integrated expression)25. We then labeled scATAC cells with the nearest snRNA-seq cells, which closely agreed with manual immune (Extended Data Fig. 1i) and stromal (Extended Data Fig. 5b) annotations. Finally, we identified peaks highly correlated to gene expression of proximal genes in our datasets, which resulted in 52,443 stromal peak-to-gene links (Extended Data Fig. 5c,d).

scATAC reveals preCAF population

CAFs promote cancer development and progression through diverse mechanisms including matrix remodeling, signaling interactions with cancer cells and perturbation of immune surveillance26,27,28. We observe a CAF cluster with high expression of known CAF marker genes FAP and TWIST1 (Extended Data Fig. 5a)29,30. Among the most significant snRNA-seq markers for CAFs were FAP, VCAN and COL1A2, which are involved in extracellular matrix remodeling and upregulated in multiple cancers30,31,32 (Fig. 2a). Specific expression of these genes by CAFs suggests fibroblasts participate in unique extracellular matrix remodeling in cancerous tissues that does not occur in normal colon or precancerous polyps.

Fig. 2: Epigenetic regulators of preCAFs and CAFs.
figure 2

a, Dot plot representation of significant (MAST test) marker genes for CAFs. b, Genomic tracks for accessibility around WNT2 and RUNX1 for different stromal cell types. Peaks called in the scATAC data and peaks-to-gene links are indicated below the tracks. For example, a regulatory element ~50 kb away from the WNT2 TSS that is most accessible in CAFs whose accessibility is highly correlated to gene expression of WNT2 is indicated below the tracks. Marker peaks (Wilcoxon FDR ≤ 0.1 and log2FC ≥ 1.0) for each fibroblast subtype are indicated below the tracks. c, Marker peaks (Wilcoxon FDR ≤ 0.1 and log2FC ≥ 0.5) for each stromal cell type. Significance is determined by comparing each cell type with a background of all other cell types. d, Hypergeometric enrichment of TF motifs in stromal cell marker peaks. e, Plot of maximum difference between chromVAR deviation z-score, depicting TF motif activity, against correlation of chromVAR deviation and corresponding TF expression. TFs with maximum differences in chromVAR deviation z-score in the top quartile of all TFs and a correlation of greater than 0.5 are indicated in red. f, RNA expression (top) and chromVAR deviation z-scores (bottom) for selected TFs. The RNA expression plotted is the expression in the nearest RNA cell following integration of the snRNA-seq and scATAC-seq data. Corresponding violin plots and boxplots quantifying integrated gene expression and chromVar deviation z-scores for cells in each cell type are shown at the right. Boxplots represent the median, 25th percentile and 75th percentile of the data, and whiskers represent the highest and lowest values within 1.5 times the interquartile range of the boxplot. Cell types with significantly higher (Wilcoxon test, FDR ≤ 0.01 and log2FC ≥ 1) integrated RNA expression when compared with all other cell types are indicated with an asterisk. Assoc., associated; C. Fib, crypt fibroblast; Endo., endothelial; Norm., normalized.

While CAFs are known to promote CRC progression, we next explored the role of fibroblasts in precancerous lesions. Because the preCAF cluster was enriched for cells from polyps, we examined accessibility around marker genes for CAFs and found many of these genes more accessible in preCAFs than other fibroblast subtypes. For example, CAFs secrete WNT2 to promote cell proliferation and angiogenesis in CRC33,34. CAFs and preCAFs exhibit the greatest accessibility at the WNT2 TSS (Fig. 2b), suggesting that chromatin changes promote expression of WNT2 in CAFs and preCAFs. We also observed that preCAFs demonstrated higher integrated expression of multiple CAF marker genes than other fibroblast subtypes (Extended Data Fig. 5e). We computed global CAF accessibility scores for all fibroblast subtypes (Methods) and found that preCAFs had the highest median CAF scores other than CAFs (Extended Data Fig. 5f). Further, accessibility in CAFs was most correlated with preCAFs; however, the correlation with one crypt fibroblast subtype was only slightly lower (Extended Data Fig. 5g). Together, this highlights the similarities between CAFs and preCAFs and suggests that preCAFs may perform similar functions to CAFs.

RUNX1 is associated with widespread accessibility in CAFs

We found that CAF marker peaks were enriched for JUN/FOS and CEBP motifs and preCAF marker peaks were enriched for JUN/FOS and FOX motifs (Fig. 2c,d and Methods). To nominate TFs driving changes in chromatin accessibility in different stromal cell types, we identified TFs with the highest correlation between their gene expression and the chromatin accessibility activity level of its DNA motif (Fig. 2e, x axis). Amongst the most correlated TFs were RUNX1, RUNX2 and CEBPB. We next plotted the expression and motif activities of these TFs on the Uniform Manifold Approximation and Projection (UMAP) representation of the stromal cells and in violin plots grouped by each cell type (Fig. 2f), and noted that chromatin activity levels for RUNX1 and RUNX2, which have similar motifs, are highest in CAFs and preCAFs. However, RUNX1 is primarily expressed in CAFs and preCAFs, while RUNX2 has much lower expression in CAFs, suggesting that RUNX1 is a stronger driver of accessibility at RUNX motifs than is RUNX2 in CAFs.

Consistent with the expression of these genes, we observed the greatest accessibility around the RUNX1 TSS in CAFs and preCAFs (Fig. 2b). When comparing gene scores for each stromal cell type with all other stromal cells, preCAFs had significantly higher RUNX1 gene scores (log2 fold-change (log2FC) > 1 and false discovery rate (FDR) < 0.01), and no other cell types met this significance threshold. When identifying accessibility closest to RUNX1, we found five significant marker peaks for preCAFs and four for CAFs (Fig. 2b).

Polyps are enriched for stem-like epithelial cells

We examined the epithelial cells that initially clustered by unaffected, polyp or CRC disease state (Fig. 1b,c and Extended Data Fig. 6e). To analyze these data, we first constructed RNA-seq and ATAC-seq references composed of normal epithelial colon cells collected from patients without FAP (Fig. 3a). We annotated cell types in this normal tissue using gene expression and gene activity scores of known marker genes (Extended Data Fig. 6a,b). A stem cell population with high expression and accessibility of LGR5, SMOC2, RGMB, PTPRO, EPHB2 and LRIG1 was evident (Extended Data Fig. 6b), as were goblet cells (MUC2 high) and BEST4+ enterocytes (BEST4 high). Following manual annotation, the snRNA-seq and scATAC-seq datasets were aligned with CCA25,35, and the scATAC cells were labeled based on the nearest snRNA-seq cells, which agreed with the manual annotations for 65% of cells, with mislabeled cells typically being labeled as the nearest cell type in the differentiation trajectory (Extended Data Fig. 6c,d).

Fig. 3: Stem-like features observed in epithelial cells.
figure 3

a, UMAP projection of snRNA-seq (left) and scATAC-seq (right) epithelial cells isolated from normal colon with cells colored by cell type. Colors for the cell types are defined in c. b, Projection of epithelial snRNA-seq (top) and scATAC-seq (bottom) cells from unaffected (left), polyp (center) and CRC (right) samples into the manifold of normal colon epithelial cells. Projected cells are colored by nearest normal cells in the projection and normal epithelial cells are colored gray. c, Fraction of each epithelial cell type isolated from normal (green), unaffected (blue), polyp (purple) and CRC (red) samples. Cell types are defined based on the identity of the nearest cell types when projecting epithelial cells into normal colon subspace. d, Boxplots depicting the fraction of cells within the epithelial compartment that are stem-like cells, enterocyte progenitors or enterocytes, divided by disease state. Abundances of each cell type in unaffected, polyp and CRC tissues are compared with their abundances in normal tissues with two-sided Wilcoxon testing and Bonferroni correction for multiple comparisons, and the resulting adjusted P values are listed in the plots. The boxplots are constructed with data from 8 normal samples, 18 unaffected samples, 48 polyp samples and 6 CRC samples. Boxplots represent the median, 25th percentile and 75th percentile of the data; whiskers represent the highest and lowest values within 1.5 times the interquartile range of the boxplot; and all points are plotted. e, Distribution of snRNA-seq and scATAC-seq stem scores in all epithelial cells in each sample. The rows represent individual samples and the columns represent 50 bins of stem scores from low to high for RNA (left) and ATAC (right). The heatmap is colored by the percentage of epithelial cells in each sample that are in a given bin of stem scores. A, adenocarcinoma; Ent., enterocyte; N, normal; P, polyp; TA, transit amplifying; U, unaffected FAP.

We then projected the remaining cells into this normal subspace25, and found that epithelial cells from polyps and CRCs tend to project closer to stem cells and other immature cells along the normal differentiation trajectory, whereas cells from unaffected tissues projected relatively evenly throughout the epithelial compartment (Fig. 3b). We classified all epithelial cells based on the nearest normal cells in the projection and found that cells originating from polyps and CRC samples are enriched for stem-like epithelial cells and depleted for mature enterocytes, suggesting that epithelial cells increasingly demonstrate a stem-like phenotype during the transformation from normal to polyp (Fig. 3b–d and Extended Data Fig. 4a,b). We speculate that the populations of stem-like cells in the polyps and CRCs likely represent the ‘cancer’ stem cells in these tissues. Expression of previously described intestinal stem cell and colon cancer stem cell marker genes in these stem-like populations is discussed in detail in a Supplementary Note and Extended Data Fig. 7a.

To quantify the degree of stemness in individual cells within samples, we assigned scores quantifying stemness for each snRNA-seq and scATAC-seq cell and ordered samples by the distribution of stem scores within each sample (Methods and Fig. 3e). As expected, unaffected samples have generally lower stem scores. A number of polyps clustered near the unaffected tissues, suggesting that they are relatively benign. However, cells from most polyps and CRCs typically had higher stem scores, with some demonstrating a larger spread of stemness and others with much tighter distributions of stem scores, indicating that some polyps may be more heterogeneous. Similar results were observed when ordering samples based on the nearest normal cell type in the projection into the normal colon subspace (Methods and Extended Data Fig. 7h).

Stem-like cells form a potential malignancy continuum

We next compared the gene expression and chromatin accessibility of polyp and CRC stem-like cells with normal stem cells to identify the aberrant gene expression and regulatory programs in precancerous and cancerous lesions. After computing differential peaks between stem-like cells from each sample and cells from the nearest normal cell type, we computed the principal components of the log2FC for these peaks, then ordered samples by their position along a spline fit in this space (Fig. 4a), where position in ordering can be interpreted as position in a continuum from normal tissue to cancer. We generated a similar RNA trajectory using differential genes rather than differential peaks (Methods). The ordering of samples along the continua defined from the snRNA-seq and scATAC-seq datasets exhibited strong agreement (Extended Data Fig. 6j). This analysis suggests that differences in gene expression and chromatin accessibility between stem cells and these stem-like polyp cells follow a stereotyped progression from early to late polyp to invasive CRC.

Fig. 4: The regulatory trajectory of malignant transformation.
figure 4

a, Malignancy continuum for snRNA-seq (left) and scATAC-seq (right). Principal components were computed on the log2FC values between stem-like cells from each sample and normal colon stem cells for the set of peaks and genes that were significantly differential (Wilcoxon FDR ≤ 0.05 and |log2FC | ≥ 1.5 for peaks; MAST test for genes) in at least two samples. A spline was fit to the first two principal components (red) and samples were ordered based on their position along the spline. b, Genomic alterations in common driver genes ordered by the malignancy continuum. c,d, Number of significantly differential genes (MAST test) (c) and peaks (Wilcoxon test) (d) for each sample relative to all unaffected samples. e,f, Heatmap of all genes (e) and peaks (f) that were significantly differentially expressed (MAST test, Padj ≤ 0.05 and |log2FC | ≥ 0.75) or accessible (Wilcoxon test, Padj ≤ 0.05 and |log2FC | ≥ 1.5) in ≥2 samples. Samples are ordered along the x axis by the malignancy continuum defined in d. Genes and peaks are k-means clustered into ten groups. g, Hypergeometric enrichment of TF motifs in k-means clusters of peaks defined in e. h, log2FC in expression of ASCL2, HNF4A and GPX2 in stem-like cells from each sample relative to stem-like cells in unaffected samples plotted against the malignancy continuum defined in d. Samples are colored based on if they are derived from polyps or CRCs.

To determine if this continuum is specific to the stem-like cells, which would be consistent with these cells being the only malignant cells in the samples, or if other epithelial cells also exhibit a continuum, which would be consistent with other cell types within the polyp being derived from cancer stem-like cells rather than normal cells, we performed the same analysis with TA2 cells (Extended Data Fig. 6f). We found that TA2 cells exhibit a similar continuum, suggesting that they continue to be derived from stem-like cells. When we perform a control analysis with plasma cells, which are not derived from cancer cells, we do not observe a similar continuum (Extended Data Fig. 6f). Comparison of the continuum with microscopic pathology and genomic alterations (Fig. 4b) is discussed in the Supplementary Information.

After computing the trajectory, we repeated the differential analysis using all unaffected samples rather than normal samples to increase the total number of patients and cells in the background group. We observe that the absolute number of significantly differential peaks and genes gradually increased along the malignancy continuum—with adenocarcinoma samples exhibiting the largest number of differential peaks and genes (Fig. 4c,d).

Gene expression changes along the malignant continuum

We examined gene expression changes along this malignancy continuum by selecting genes differentially expressed in at least two samples then clustering these genes into ten k-means clusters (Fig. 4e). These clusters correspond to groups of genes that become differentially expressed at distinct stages of malignant transformation. For example, clusters 1–4 comprise genes upregulated in stem-like cells in early-stage polyps when compared with unaffected stem cells. Members of cluster 4 include OLFM4, a marker of intestinal stem cells36, indicating that OLMF4 expression increases in stem-like cells from polyps as they approach malignancy. Cluster 4 also includes GPX2, a glutathione peroxidase known to be upregulated in CRC that functions to relieve oxidative stress by reducing hydrogen peroxide, facilitating both tumorigenesis and metastasis37 (Fig. 4h). The upregulation is not donor dependent, and we observe the same trend across all donors in our study (Extended Data Fig. 6g). We observed translation Gene Ontology terms enriched in cluster 4 and splicing and RNA-processing Gene Ontology terms enriched in cluster 2 (Extended Data Fig. 6k). Clusters of genes that gradually reduce expression along the transition from normal colon to cancer (clusters 6–9) and genes specific to malignant transformation are discussed in a Supplementary Note and Extended Data Fig. 8a.

Polyps demonstrate increased activity of TCF and LEF

To identify groups of polyps associated with invasive transformation, we clustered the 36,374 peaks significantly differential compared with the nearest unaffected cell type in at least two samples into ten k-means clusters (Fig. 4f), revealing five clusters that become more accessible and five clusters that become less accessible at different stages of the transition to cancer. To identify TFs driving chromatin accessibility changes in the transition from normal colon to CRC, we computed hypergeometric enrichment of motifs in each cluster of peaks from Fig. 4f (Fig. 4g) and ensured the stability of these results (Extended Data Fig. 7b–g).

TCF and LEF family motifs were enriched in all clusters that became more accessible across the malignancy continuum (clusters 1–5), consistent with the fact that loss of APC leads to β-catenin accumulation in the nucleus, which interacts with TCF and LEF TFs to drive WNT signaling38,39,40. This regulatory transformation is gradual across the malignant continuum—new peaks containing TCF and LEF motifs continue to open at all stages of colon cancer development, as does overall accessibility aggregated across TCF and LEF motifs, suggesting that WNT signaling gradually increases throughout this transformation, over and above what is observed in normal stem cell populations.

Cluster 3 peaks, which became more accessible in later-stage polyps and CRC, also exhibited enrichments of ASCL2 motifs (Fig. 4g). ASCL2 is a master regulator of intestinal stem cell fate, and induced deletion of ASCL2 leads to loss of LGR5+ intestinal stem cells in mice41. Consistent with a linkage between a more stem-like state in polyp epithelium and more advanced malignant continuum scores, ASCL2 expression gradually increases as polyps approach malignant transformation (Fig. 4h), again indicative of a ‘super stem’-like phenotype, wherein master regulators of stem state are even more active than they are in normal stem cells.

Motifs lost along the malignancy continuum include HOX family motifs, KLF motifs and GATA motifs (Fig. 4g), and specific KLF TFs along the malignancy continuum are discussed in detail in a Supplementary Note and Extended Data Fig. 8d,e. Clusters 4 and 5 exhibit large accessibility increases only in CRC samples, and the greatest enrichment for HNF4A motifs (Fig. 4g). This observation suggests differential usage of HNF4A in polyps, where it decreases to drive WNT signaling, versus in CRC, where it is upregulated to drive cancer-specific accessibility differences (Supplementary Note and Extended Data Fig. 8b,c).

Remodeling of cellular composition along malignant continuum

We calculated the fractional contributions of each cell type to each sample as a function of position in the malignancy continuum, and found some cell types were highly correlated with progression along the malignancy continuum. For example, the fraction of stem cells within a sample gradually increases throughout malignant transformation (Fig. 5a,i). Similarly, the number of mature enterocytes decreases as polyps transform to carcinomas (Fig. 5b,i). Milo analysis revealed that neighborhoods of stem-like cells tend to be significantly more abundant at the end of the malignancy continuum (Extended Data Fig. 4b). In the secretory compartment, which primarily consists of immature and mature goblet cells, we observe a fractional increase in immature goblet cells in many polyps. In carcinomas we see a pervasive lack of differentiation into the secretory lineage, effectively eliminating immature and mature goblet cells (Fig. 5c,d,i). This observation is consistent with previous work reporting a depletion of goblet cells in nonmucinous colon adenocarcinomas42. Previous work has also found that knockout of MUC2 leads to the formation of more adenomas and carcinomas in mice43, suggesting that the loss of immature and mature goblet cells may even contribute to tumorigenesis.

Fig. 5: Dynamics of cell-type representation in malignant transformation.
figure 5

ah, Fraction of cell type in each scATAC sample plotted against position of the sample in the malignancy continuum defined in Fig. 4d for stem-like cells (a), enterocytes (b), immature goblet cells (c), goblet cells (d), Tregs (e), exhausted T cells (f), preCAFs (g) and CAFs (h). Samples are colored based on if they are derived from unaffected tissues, polyps or CRCs. Fractions are computed by dividing the number of cells of a given cell type by the total number of cells in the compartment (epithelial versus immune versus stromal). i, Stacked boxplot representation of the fraction of epithelial cells of each cell type for each scATAC sample along the malignancy continuum.

Outside the epithelial compartment, we also observe changes in cellular composition across the transformation from unaffected to polyp to carcinoma. Within the stromal compartment, the fraction of preCAFs gradually increases, while CAFs only appear in CRCs (Fig. 5g,h). Within the immune compartment, Tregs are increased in the more malignant polyps and CRCs, while exhausted T cells only appear in CRCs (Fig. 5e,f and Extended Data Fig. 4b). Tregs are known to suppress the antitumor immune response and are typically present at high levels in the tumor microenvironment44. The gradual increase in Tregs may be a mechanism of immune evasion in precancerous polyps. We discuss possible cell–cell interactions between stromal and epithelial cells along the malignant continuum in a Supplementary Note and in Extended Data Fig. 8f,g.

Comparing CRC DNA methylation changes with continuum accessibility

Aberrant DNA methylation is a primary mechanism of tumorigenesis in CRC45,46,47, but the timing and extent to which methylation changes drive changes in chromatin accessibility before and during malignant transformation is not known. We identified differentially methylated probes between normal and CRC samples (Extended Data Fig. 9d) in The Cancer Genome Atlas (TCGA) DNA methylation data (Illumina 450K array)48. For the ~89,000 chromatin accessibility peaks from epithelial cells that overlap at least one 450K array probe, we determined how many overlapped at least one hypermethylated site, at least one hypomethylated site or no differentially methylated sites. We then divided the peaks into groups based on whether they were members of significantly upregulated or significantly downregulated clusters identified in Fig. 4h.

For peaks overlapping hypomethylated probes, approximately one-third (534) belonged to clusters that became significantly more accessible along the continuum, while <0.5% (5) became significantly less accessible (Fig. 6a). We saw similar correspondence for peaks overlapping hypermethylated probes, with approximately one-quarter (754) becoming less accessible, and <0.5% (9) becoming more accessible. Therefore, hypermethylation and hypomethylation in CRC nearly perfectly predict that accessibility at that site will either decrease or increase (respectively), or remain unchanged. In peaks not meeting the significance threshold, we still observe less aggregate accessibility within peaks overlapping hypermethylated probes and more accessibility when they overlap hypomethylated probes (Fig. 6b). However, we also observe that 79.4% (2,096) of significantly more accessible and 76.3% (2,440) of less accessible peaks overlap nondifferential probes, implying that a majority of chromatin accessibility changes are likely not driven by methylation.

Fig. 6: Integration of single-cell colon data with CRC methylation data reveals CRC DMRs with early changes in chromatin accessibility.
figure 6

a, Table relating the change in accessibility for peaks to the methylation status of Illumina 450K methylation probes they overlap. In total, ~89,000 peaks overlapped 180,000 450K probes. Peaks classified as up were members of clusters 1–5 in Fig. 4f and peaks classified as down were members of clusters 6–10 in Fig. 4f. b, Heatmaps of peaks overlapping hypomethylated (top) and hypermethylated (bottom) 450K probes in CRC. The heatmaps are split into peaks from more accessible and less accessible groups defined in Fig. 4h and peaks not included in Fig. 4h. For nondifferential (nondiff) peaks overlapping hypermethylated probes, \({{{P}}}\left( {\overline {{\mathrm{log}}_{2}{\rm{FC}}} < 0} \right) = 0.81\) and sign test P < 10−50. For nondifferential peaks overlapping hypomethylated peaks, \({{{P}}}\left( {\overline {{\mathrm{log}}_{2}{\rm{FC}}} > 0} \right) = 0.73\) and sign test P < 10−50. c, Number of significantly differential peaks overlapping hypomethylated or hypermethylated 450K probes for each sample. The total number of peaks overlapping hypermethylated and hypomethylated probes is listed in each plot. d, Accessibility tracks around ITGA4 and NR5A2, which are hypermethylated in CRC. Tracks are ordered by position of the corresponding sample in the malignancy continuum defined in Fig. 4. DMR, differentially methylated region.

We next plotted the number of differential peaks overlapping hypermethylated and hypomethylated probes across the malignancy continuum (Fig. 6c), and found that changes in chromatin accessibility that occur in regions that are ultimately differentially methylated in CRC accumulate along the transition from normal to cancer, with the greatest number observed in late-stage polyps and CRC.

Among regions that overlap hypermethylated probes in CRC that become less accessible in polyps are several previously reported cancer-specific hypermethylated loci49. For example, the promoter region and multiple distal regulatory elements near the ITGA4 gene are accessible in normal colon, unaffected FAP colon and very early-stage polyps, but become closed early in the progression to CRC and remain closed even in low-grade polyps (Fig. 6d). The gene with the most nearby differential peaks overlapping hypermethylated probes in our dataset was NR5A2. Multiple peaks near this gene become less accessible along the malignancy continuum (Fig. 6d) and expression of NR5A2 also gradually decreases along the malignancy continuum (Extended Data Fig. 6h). NR5A2 is a nuclear receptor that has been linked to a wide range of functions including inflammation and cell proliferation50. The hypermethylation, decrease in accessibility, and decrease in gene expression of NR5A2 suggests that the pro-inflammatory state that may be triggered by the loss of NR5A2 might have a role in tumorigenesis.

Hypermethylated DNA regions in CRC have also been incorporated into CRC screening tests, including hypermethylation of the promoter regions of BMP3 and NDRG4 (ref. 51). We observe multiple distal elements around BMP3 that become inaccessible in the middle of the malignancy continuum (Extended Data Fig. 9a). We observe many regions with a similar behavior: sharp increases or decreases in accessibility at a specific point along the malignancy continuum. We speculate that testing for accessibility, or methylation, at these loci may enable staging of polyps along the malignancy continuum. This approach also identifies methylation markers/loci (for example, GRASP, CIDEB) specific for malignant transformation in CRC (Extended Data Fig. 9b,c), and differential genes whose promoters overlap CRC methylation changes (Extended Data Fig. 9e).

Read more here: Source link