Overview of the droplet-based snRandom-seq method for FFPE tissues
The main workflow of snRandom-seq is shown in Fig. 1. For single nucleus isolation from FFPE tissues, areas of interest were first selected from banked FFPE tissue blocks and placed into tubes. Deparaffinization and rehydration were carried out with standard xylene and alcohol washes. Afterward, nuclei were dissociated and permeabilized. For comprehensive and high-throughput single nucleus total RNA-seq, we developed a strategy that combines a random-primer-based chemistry to capture full-length total RNAs with an easy-to-operate droplet-based platform to tag single nuclei. Bare single-strand DNAs were blocked in situ by multiple annealing and extension of blocking primers. Total RNAs were converted to cDNAs in situ by multiple annealing of random primers and oligo(dT) primers in reverse transcription. To decrease the doublet rate, we incorporated a pre-indexing strategy into the reverse transcription step, following the published scifi-RNA-seq18. The nuclei were split into different tubes for reverse transcription with pre-indexed random primers, then pooled for the subsequent reactions. Poly(dA) tails were added to the 3′ hydroxyl termini of the cDNAs in situ by terminal transferase (TdT). We also established a microfluidic platform for high-throughput single nucleus barcoding based on our previous work16,17. During the barcoding reaction in droplets, the poly(dT) primers were released from the beads by enzymatic cutting19 and, simultaneously, the cDNAs were released from the nucleus by RNA degradation. The poly(dT) primers then bound to the poly(dA) tails at the ends of the cDNAs and were extended, adding a specific barcode to the cDNAs in each droplet. After barcoding, we broke the droplets, amplified the barcoded cDNAs, and prepared the next-generation sequencing (NGS) library for paired-end sequencing.
Validation of snRandom-seq using the human-mouse mixture sample
snRandom-seq uses random primers to capture total RNAs in single nuclei (Fig. 1), which differs from the current poly(A)-based and probe-based single-cell RNA-seq methods. Therefore, we performed a standard mixed-species experiment with cultured human (293T) and mouse (3T3) cell lines to assess the fidelity of snRandom-seq. Freshly harvested 293T and 3T3 cells were lysed to release nuclei, which were mixed and fixed. The fixed nuclei were used for snRandom-seq (Fig. 1). Before microfluidic encapsulation, the nuclei were imaged to confirm single nucleus morphology and counted (Supplementary Fig. 1a). A high-throughput microfluidic platform was established for single cell/nucleus barcoding in snRandom-seq (Fig. 2a, Supplementary Fig. 1b). For barcode bead synthesis, the hydrogel bead generation device and the cell encapsulation device were designed and fabricated as previously described20 (Supplementary Fig. 2a). Hydrogel beads of 40 μm diameter were precisely produced (Supplementary Fig. 2b). Three rounds of split-and-pool-based ligation were performed on these hydrogel beads for DNA barcode synthesis (Supplementary Fig. 2c, Supplementary Table 2). The high reaction efficiency of each ligation step was reflected by the sharp peak in the electropherogram of the released barcode primers (Supplementary Fig. 2d). Nuclei, barcode beads, and the reagent mix were co-compartmentalized in water-in-oil emulsions using the microfluidic platform (Fig. 2a), and each individual nucleus was encapsulated in a droplet with a barcode bead (Fig. 2b).
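As a rough illustration of why three rounds of split-and-pool ligation yield a large barcode space, the sketch below simulates the process. The segment sequences and the number of segments per round are hypothetical placeholders (the actual oligos are listed in Supplementary Table 2 and are not reproduced here), so this is only a toy model of the combinatorics, not the bead synthesis protocol itself.

```python
import random
from math import prod


def barcode_space(segments_per_round):
    """Distinct full barcodes = product of the segment counts of each round."""
    return prod(segments_per_round)


def split_and_pool(n_beads, rounds, seed=0):
    """Simulate split-and-pool: every bead receives one random segment per round."""
    rng = random.Random(seed)
    barcodes = [""] * n_beads
    for segments in rounds:
        # "Split": each bead lands in one well and is ligated to that well's segment.
        # "Pool": all beads are mixed again before the next round.
        barcodes = [bc + rng.choice(segments) for bc in barcodes]
    return barcodes


# Hypothetical 2-nt segments, four per round, purely for illustration.
ROUNDS = [
    ["AA", "CC", "GG", "TT"],
    ["AC", "CA", "GT", "TG"],
    ["AG", "GA", "CT", "TC"],
]
beads = split_and_pool(1000, ROUNDS)
print(barcode_space([len(r) for r in ROUNDS]), len(set(beads)))
```

With an assumed 96 segments per round, `barcode_space([96, 96, 96])` would give 884,736 possible barcodes, which shows how quickly the space grows with the number of wells.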
After barcoding and amplification, the fragment size of the cDNA library of the human-mouse mixture peaked between 300 and 800 bp (Fig. 2c), a range suitable for NGS without additional fragmentation. After data processing, we identified 2250 high-quality unique nucleus barcodes from the steep slope in the barcode-gene rank plot (Fig. 2d), which indicates a clear separation of true nuclei from background noise. The nuclei capture rate was 42.2% and the percentage of reads mapped to the true nuclei was 76%. We calculated the ratio of reads mapped to the human and mouse genomes in every single nucleus and found that pre-indexed primers markedly decreased the doublet rate (from 2.9% to 0.3%) (Fig. 2e, Supplementary Fig. 1c). The doublet rate of snRandom-seq is considerably lower than those reported for other droplet-based sc/snRNA-seq methods (sNucDrop-seq: ~2.6%, VASA-drop: 3.1%). Consistently, very high species specificity of UMIs (99%) was observed (Fig. 2f), suggesting that snRandom-seq produced high-fidelity single nucleus libraries. The percentages of reads mapped to exons or introns in the identified human and mouse nuclei were calculated, and three times as many reads mapped to introns as to exons (Fig. 2g). Additionally, many long non-coding RNAs (lncRNAs) and short non-coding RNAs, including small nucleolar RNA (snoRNA), small nuclear RNA (snRNA) and microRNA (miRNA), were detected (Supplementary Fig. 1d). These results suggested that snRandom-seq captured full-length transcripts comprehensively.
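The species-mixing readout described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the 90% purity threshold, the function names, and the toy counts are all assumptions. Note also that a mixed-species rate only counts human-mouse doublets, since same-species doublets are invisible to this test.

```python
def classify_nucleus(human_umis, mouse_umis, purity=0.9):
    """Label a barcode by the share of its UMIs assigned to each species.

    The 0.9 purity cut-off is an assumed value, not taken from the paper.
    """
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"
    if human_umis / total >= purity:
        return "human"
    if mouse_umis / total >= purity:
        return "mouse"
    return "mixed"


def mixing_summary(counts, purity=0.9):
    """Return the mixed-species rate and the UMI species specificity of singlets."""
    labelled = [(classify_nucleus(h, m, purity), h, m) for h, m in counts]
    called = [row for row in labelled if row[0] != "empty"]
    singlets = [row for row in called if row[0] in ("human", "mouse")]
    n_mixed = len(called) - len(singlets)
    # On-target UMIs: human UMIs in human nuclei, mouse UMIs in mouse nuclei.
    on_target = sum(h if lab == "human" else m for lab, h, m in singlets)
    singlet_umis = sum(h + m for _, h, m in singlets)
    return {
        "mixed_rate": n_mixed / len(called) if called else 0.0,
        "specificity": on_target / singlet_umis if singlet_umis else 0.0,
    }


# Toy (human UMIs, mouse UMIs) pairs for four barcodes.
toy_counts = [(990, 10), (5, 995), (500, 500), (980, 20)]
summary = mixing_summary(toy_counts)
print(summary)
```

In this toy input, one of four barcodes is mixed, so the mixed-species rate is 25%, far above what a real experiment would show; the numbers serve only to make the arithmetic easy to follow.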
Gene and UMI count distributions showed that snRandom-seq captured a median of 4141 genes and 11,594 UMIs per 293T nucleus at an average depth of ~29k reads per nucleus (Fig. 2h), and 3427 genes and 9795 UMIs per 3T3 nucleus at ~25k reads per nucleus (Fig. 2i). These results indicated that snRandom-seq is more sensitive than two other reported droplet-based high-throughput snRNA-seq methods (DroNc-seq21: average 3295 genes and 4643 UMIs with ∼160k reads per nucleus for 5636 3T3 nuclei; sNucDrop-seq22: average 2665 genes and 5195 UMIs with ∼23k reads per nucleus for 1984 3T3 nuclei) (Supplementary Fig. 1e). Saturation analysis showed that the number of genes detected by snRandom-seq had not yet reached saturation at 60k uniquely aligned reads per 3T3 or 293T nucleus (Fig. 2j). We also compared our snRNA-seq data with the widely used high-throughput 10X Chromium Single Cell 3′ Solution V323 and the recently reported high-throughput VASA-drop10 for scRNA-seq. At a low sequencing depth (<10k), the sensitivity of snRandom-seq in 3T3 and 293T nuclei was comparable to that of 10X Chromium Single Cell 3′ Solution V3 in 3T3 and 293T cells, and to that of VASA-drop in 293T cells (Fig. 2j). Unlike the poly(A)-based 10X Chromium Single Cell 3′ Solution V3, which has an obvious 3′-end bias, both snRandom-seq and VASA-drop displayed no pronounced 3′- or 5′-end bias across the gene body (Fig. 2k). snRandom-seq did show a slight bias toward the 3′-end, as expected from the additional oligo(dT) primer in reverse transcription (Fig. 2k).
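A saturation analysis of this kind can be approximated by downsampling the reads of a nucleus and counting the genes still detected at each depth. The sketch below is one simple way to do this under stated assumptions: each read is represented only by the gene it maps to, the function name is hypothetical, and a single shuffle with nested prefixes is used so that the curve cannot decrease. It is not the procedure used to produce Fig. 2j.

```python
import random


def saturation_curve(read_genes, depths, seed=0):
    """Genes detected at increasing read depths for one nucleus.

    read_genes: one gene name per uniquely aligned read.
    depths: read depths to evaluate (capped at the number of reads available).
    """
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)  # one shuffle, then nested prefixes of the same order
    curve = []
    for depth in depths:
        d = min(depth, len(shuffled))
        curve.append((d, len(set(shuffled[:d]))))
    return curve


# Toy nucleus: 100 reads over four genes with skewed abundance.
toy_reads = ["GeneA"] * 50 + ["GeneB"] * 30 + ["GeneC"] * 15 + ["GeneD"] * 5
curve = saturation_curve(toy_reads, depths=[10, 50, 100])
for depth, n_genes in curve:
    print(depth, n_genes)
```

A curve that is still rising at the deepest point, as reported for snRandom-seq at 60k reads, means additional sequencing would keep uncovering new genes.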
Performance of snRandom-seq in the FFPE tissues
For FFPE tissues, digestion with Proteinase K yielded cleaner single nuclei than digestion with collagenase (Supplementary Fig. 3a). With an optimized procedure (Fig. 1), single intact nuclei were efficiently isolated from multiple FFPE mouse tissues and a 2-year-old archived clinical FFPE sample of human liver cancer (Fig. 3a, Supplementary Fig. 4a), and the nuclei morphology and size distribution were comparable between FFPE and fresh samples (Supplementary Fig. 4b).
In our pilot FFPE snRNA sequencing experiment, few uniquely aligned reads mapped to exons, and many reads mapped to intergenic regions because of genomic DNA contamination (Fig. 3b). Considering that the DNA double helix in FFPE tissues is liable to be disrupted by chemical modification, a single-strand DNA blocking step was added to the initial procedure of snRandom-seq (Fig. 1, box). The bare single-strand DNAs in the isolated FFPE nuclei were blocked in situ by multiple annealing and extension of blocking primers on the single-strand genomic DNA. After DNA blocking, the percentage of reads mapped to intergenic regions was dramatically reduced (Fig. 3b). The mapping region distribution was comparable among the DNA-blocked FFPE sample, the fresh sample, and snFFPE-Seq (10X Chromium Single Cell 3′ Solution V3), further supporting the high quality of the snRandom-seq data (Fig. 3b). By integrating the above procedures, high-quality cDNA libraries were generated by snRandom-seq from multiple FFPE tissues (Fig. 3c, Supplementary Fig. 4c, d). The fragment sizes of the cDNA libraries from FFPE and fresh samples both peaked between 300 and 800 bp (Fig. 3c, Supplementary Fig. 4e).
To determine whether snRandom-seq can recover comparable information from FFPE tissues and fresh samples, we collected both FFPE and fresh samples from the same mouse tissues and compared their RNA profiles using snRandom-seq (Fig. 3d). The RNA quality of the FFPE and fresh samples was first evaluated by the RNA fragment distribution and DV200. As expected, the RNA quality of the FFPE sample was poorer than that of the fresh sample (Supplementary Fig. 5a), suggesting that the RNA in the FFPE sample was degraded. The merged genome browser tracks of the snRandom-seq results showed that the regions covered by reads were similar in FFPE and fresh samples (Supplementary Fig. 6a–g). Consistently, the total RNA profiles of FFPE and fresh samples obtained by snRandom-seq were well correlated (Pearson R: ~0.9, p < 2.2e-16; Fig. 3e, Supplementary Fig. 7a, b). Meanwhile, to assess the repeatability of our method, the same FFPE sample was sequenced independently with snRandom-seq (Fig. 3d), and a high correlation (Pearson R ~ 0.92, p < 2.2e-16) of gene expression profiles between the two batches was also seen (Fig. 3f). These results showed that snRandom-seq performed well in both fresh and FFPE samples.
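One common way to compare two samples at the profile level is to sum counts over all nuclei into a pseudo-bulk profile, normalize, and compute a Pearson correlation. The sketch below assumes counts-per-million scaling with a log1p transform; the text does not state which normalization was used for Fig. 3e, f, so these choices, the function names, and the toy counts are illustrative assumptions. The p-value is not computed here.

```python
import math


def pseudobulk(nuclei):
    """Sum per-gene counts over a list of {gene: count} dicts (one per nucleus)."""
    totals = {}
    for counts in nuclei:
        for gene, n in counts.items():
            totals[gene] = totals.get(gene, 0) + n
    return totals


def log_cpm(profile, genes):
    """log(1 + counts per million) for the requested genes, in order."""
    depth = sum(profile.values())
    return [math.log1p(1e6 * profile.get(g, 0) / depth) for g in genes]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def profile_correlation(nuclei_a, nuclei_b):
    """Pearson R between log-CPM pseudo-bulk profiles over the union of genes."""
    a, b = pseudobulk(nuclei_a), pseudobulk(nuclei_b)
    genes = sorted(set(a) | set(b))
    return pearson(log_cpm(a, genes), log_cpm(b, genes))


# Toy data: two FFPE nuclei and one fresh nucleus.
ffpe = [{"Alb": 50, "Apoa1": 20}, {"Alb": 40, "Actb": 5}]
fresh = [{"Alb": 100, "Apoa1": 35, "Actb": 12}]
r = profile_correlation(ffpe, fresh)
print(round(r, 3))
```

Using the union of genes keeps genes detected in only one sample in the comparison as zeros, which penalizes dropout instead of hiding it.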
We next compared our FFPE results with other reported FFPE snRNA-seq results. After data processing, thousands of true nuclei in these FFPE tissues were successfully identified from the snRandom-seq data (Fig. 3h, Supplementary Fig. 8a). snRandom-seq identified a broad spectrum of RNA biotypes in the FFPE sample (Fig. 3g), including about eight times as many lncRNAs as snFFPE-Seq; snoRNAs, scaRNAs, and miRNAs were detected only by snRandom-seq (Supplementary Fig. 8b). The median numbers of genes detected per nucleus in the unsaturated snRandom-seq datasets were all over 3000, considerably higher than those of two other reported high-throughput snRNA-seq methods for FFPE samples (snFFPE-Seq using 10X Chromium Single Cell 3′ Solution V3: 276 genes/nucleus; snPATHO-Seq: 1850 genes/nucleus) (Fig. 3h); the same held for the median UMI counts (Supplementary Fig. 8c). Our data had still not reached saturation even at ~300k mapped reads per nucleus, where ~10,000 genes were detected (Fig. 3i).
We further compared the RNA coverage of snRandom-seq with that of the other two FFPE snRNA-seq methods. In the plot of average read distribution along the gene body, snFFPE-Seq, which uses oligo(dT) primers, showed a distinct 3′-end bias, and 10X Chromium Fixed RNA Profiling, which uses the same probe-based technology as snPATHO-Seq, showed a mild 5′-end bias (Fig. 3j). In contrast, a homogeneous distribution across the gene body was observed in the snRandom-seq data for the FFPE tissue (Fig. 3j), suggesting that random primers bound evenly along transcripts and that the extra oligo(dT) primers in snRandom-seq were ineffective in the FFPE sample. For RNA coverage at the level of single nuclei, snRandom-seq showed much higher coverage than snFFPE-Seq or 10X Chromium Fixed RNA Profiling (Fig. 3k). For RNA coverage at the level of single genes, the read distribution along three selected genes (C1S, EMG1, KLRG1) illustrated the critical difference between the probe-based technology and the random-primer-based strategy (Fig. 3l, Supplementary Fig. 8d). Reads mapped by 10X Chromium Fixed RNA Profiling were limited to the probe-target regions (<100 bp). In contrast, reads mapped by snRandom-seq were evenly distributed in both exonic and intronic regions. These results suggested that snRandom-seq for FFPE tissues can capture a significant amount of high-quality RNA and extract much more transcriptomic information than the state-of-the-art platforms.
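Gene-body coverage plots such as the one discussed above are typically built by rescaling each read's position to a 5′-to-3′ fraction of its transcript and binning. The sketch below shows that idea in a simplified form: reads are given as (offset from the 5′ end, transcript length) pairs, strand handling and intron removal are assumed to have been done upstream, and the 3′/5′ ratio is an ad hoc summary introduced here for illustration, not a metric from the paper.

```python
def gene_body_profile(read_positions, n_bins=10):
    """Fraction of reads falling in each 5'->3' bin of the normalized gene body.

    read_positions: iterable of (offset_from_5prime, transcript_length) pairs.
    """
    bins = [0] * n_bins
    for offset, length in read_positions:
        if length <= 0 or not 0 <= offset < length:
            continue  # skip reads that fall outside the annotated transcript
        bins[offset * n_bins // length] += 1
    total = sum(bins)
    return [b / total for b in bins] if total else [0.0] * n_bins


def three_prime_bias(profile):
    """Mean coverage of the last fifth of bins divided by that of the first fifth."""
    k = max(1, len(profile) // 5)
    head = sum(profile[:k]) / k
    tail = sum(profile[-k:]) / k
    return float("inf") if head == 0 else tail / head


# Toy transcripts of length 100: uniform coverage vs. extra reads near the 3' end.
uniform_reads = [(o, 100) for o in range(100)]
skewed_reads = uniform_reads + [(o, 100) for o in range(80, 100)] * 4

uniform_profile = gene_body_profile(uniform_reads)
skewed_profile = gene_body_profile(skewed_reads)
print(three_prime_bias(uniform_profile), three_prime_bias(skewed_profile))
```

A ratio near 1 corresponds to the flat profile described for random priming, while values well above 1 correspond to the 3′-end enrichment typical of oligo(dT)-primed libraries.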
snRandom-seq revealed cell heterogeneity in FFPE mouse tissues
We next compared the cell types identified in FFPE and fresh samples by snRandom-seq. Unsupervised clustering of the filtered high-quality single kidney nucleus profiles described above revealed over ten distinct clusters. All clusters could be further annotated based on classical known cell-type markers24,25 (Fig. 4a, b, Supplementary Fig. 9a). Expression of classical known cell-type marker genes22, such as Nphs1 for podocytes, Pecam1 for endothelial cells, and Pdgfrb for mesangial-like cells, was reliably mapped to the corresponding clusters (Fig. 4b). The mammalian renal tubule in the kidney contains at least 16 distinct epithelial cell types26. Here we identified most of the recommended terms for renal tubule epithelial cell types in FFPE mouse kidney samples by snRandom-seq, including proximal convoluted tubule, proximal straight tubule, distal nephron, distal convoluted tubule, loop of Henle, collecting duct principal cells, podocytes, proximal tubular cells, collecting duct intercalated cells, and collecting duct cells (Fig. 4a). Besides the known top markers of cell types, such as Slc14a2 for collecting duct cells, we also discovered several potential markers for these cell types (Fig. 4c). By merging the snRandom-seq data of the FFPE sample, the fresh sample, and the other batch of FFPE samples, we obtained a robust cell clustering by t-SNE (t-distributed stochastic neighbor embedding) (Supplementary Fig. 9b). Most cell types were identified in all three snRandom-seq datasets (Fig. 4d, Supplementary Fig. 9c). As expected, there were some differences in the proportions of cell types between the FFPE and fresh samples (such as PTC), which might be caused by sampling error and by the different nuclei extraction methods used for FFPE and fresh samples.
We further included more FFPE mouse tissues to demonstrate the biological utility of snRandom-seq data. In total, we sequenced and analyzed 19,258 single nuclei from four FFPE mouse tissues (heart, kidney, testis, and liver) using snRandom-seq and identified a total of 25 cell types, such as hepatocytes, germ cells, fibroblasts, and cardiomyocytes (Supplementary Fig. 10a, b). Immune cells were underrepresented, which is consistent with previous findings on cell type composition in single-nucleus RNA-seq libraries27.
The large proportion of intronic sequences detected in FFPE samples (Fig. 3b) suggested that snRandom-seq data would be well suited to RNA velocity analysis, which distinguishes newly transcribed (unspliced) RNAs from mature (spliced) RNAs28. We therefore applied snRandom-seq to FFPE mouse testis, where spermatogenesis is an excellent model for studying cell dynamics. Consistent with other studies on fresh testis by scRNA-seq29,30, t-SNE arranged germ cells at transitional stages (mainly early and late spermatocytes) in continuous succession. In contrast, undifferentiated spermatogonia and mature spermatids formed separate clusters (Fig. 4e). The velocities computed from the detected nascent transcripts were visualized on the t-SNE plot, revealing distinct velocity vector directions in different cell types, especially in the cells located to the left of the early and late spermatocytes (Fig. 4e, f). Combined with cell cycle state analysis based on gene expression, the RNA velocity revealed a clear cell maturation trajectory across two subpopulations of late spermatocytes in the G2M phase with active transcription (Fig. 4g).
snRandom-seq discovered a proliferative subpopulation in the FFPE clinical human specimen
Finally, we applied snRandom-seq to an approximately two-year-old clinical FFPE specimen of the human macrotrabecular-massive (MTM) hepatocellular carcinoma (HCC) subtype (Fig. 5a). We selected a tumorous area of interest on the paraffin block according to the histopathological examination (Fig. 5b) and performed snRandom-seq. snRandom-seq identified 5914 true nuclei and detected a median of 3220 genes and a median of 8182 UMIs per nucleus in this clinical FFPE specimen (Supplementary Fig. 11a, Fig. 5b). As sequencing depth increased, snRandom-seq detected about 8000 genes at saturation (Supplementary Fig. 11b). A broad spectrum of RNA biotypes, including lncRNAs, snRNAs, miscRNAs, miRNAs, and snoRNAs, was detected in the sample (Supplementary Fig. 11c). Unsupervised clustering of the human liver single nuclei revealed several distinct clusters. The main cell types of human liver could be identified in the specimen based on known cell-type markers31, including hepatocytes (APOA1), Kupffer cells (CD163), T cells (CD3E), fibroblasts (PDGFB), and plasma cells (FCRL5) (Fig. 5d, Supplementary Fig. 11d). Notably, a subcluster of hepatocytes (hepatocyte-2) was separated from the main hepatocyte population, with high expression of the proliferative marker MKI67 and two other markers (ASPM and TOP2A) that have been reported to be related to HCC progression32,33 (Fig. 5e). Meanwhile, cell cycle analysis of these single-nucleus transcriptomes revealed that most cells in the hepatocyte-2 cluster were in the G2M phase (Fig. 5f), suggesting that the hepatocyte-2 cluster might be a group of dividing tumor cells. After further investigating cell communication among the clusters (Fig. 5g), we found that hepatocyte-1 and hepatocyte-2 displayed different outgoing and incoming signaling patterns (Fig. 5h). Hepatocyte-2 mainly received signals from plasma cells through the BMP signaling pathway (Supplementary Fig. 12a), which is reported to be correlated with tumor progression in HCC34,35.
Ligand–receptor pair analysis found that plasma cells preferentially sent signals to hepatocyte-2 via BMP6-(ACVR1 + ACVR2A), and that the communication between plasma cells and hepatocyte-2 involved specific ligand–receptor pairs, including BMP6-(BMPR1B + BMPR2), BMP6-(BMPR1B + ACVR2B), BMP6-(BMPR1B + ACVR2A), BMP6-(BMPR1A + ACVR2A), and BMP6-(ACVR1 + ACVR2A) (Fig. 5i). The gene expression data also showed that BMPR1B and ACVR2A were specifically expressed in hepatocyte-2 (Supplementary Fig. 12b). Taken together, snRandom-seq discovered a proliferative and activated subpopulation of hepatocytes in a clinical FFPE specimen, which provides a valuable clue for future study.
snRandom-seq was also performed on an FFPE specimen of human normal HCC subtype (Supplementary Fig. 13a). Based on the snRandom-seq data, sufficient gene and UMI counts were detected, and the main cell clusters of the liver were identified (Supplementary Fig. 13b, c). Previous studies have indicated that lncRNAs exhibit tissue-specific expression36,37, which is often overlooked in routine single-cell RNA-seq analysis because of their low expression. We found that hepatocyte clusters of the normal HCC subtype showed marked expression of lncRNAs, including LINC02476 and LINC01151 in hepatocyte-2; LINC00540, LINC02307, and LINC02109 in hepatocyte-3; and LINC02384 in hepatocyte-4 (Supplementary Fig. 13d). It has been reported that LINC02476 promotes the malignant phenotype of HCC by sponging miR-497 and increasing HMGA2 expression38, and that LINC00540 influences human HCC progression and metastasis via the NKD2-dependent Wnt/β-catenin pathway39. These results suggested that hepatocyte-2 (expressing LINC02476) and hepatocyte-3 (expressing LINC00540) of the normal HCC subtype might differ in pathogenesis. Taken together, snRandom-seq, with its full-length transcript coverage, shows promise for lncRNA analysis in cancer biology.
We further applied snRandom-seq to a matched pair of initial and relapsed FFPE clinical specimens from the same colorectal cancer liver metastasis (CRLM) patient. snRandom-seq detected medians of ~1000 genes and ~2000 UMIs per nucleus in both the initial and relapsed FFPE specimens (Supplementary Fig. 14a). The cells from the initial and relapsed FFPE specimens were integrated, and the major cell types (hepatocytes, cancer cells, T cells, fibroblasts, myeloid cells, endothelial cells, stellate cells, macrophages, cholangiocytes, B/plasma cells) were identified in both samples (Supplementary Fig. 14b, c). We observed that the proportion of T cells was higher in the relapsed FFPE sample (Supplementary Fig. 14d), suggesting a more active antitumor immune response in the relapsed sample. Consistently, the proportions of the dominant cancer clusters (cancer cells-1, −2, and −3) were decreased in the relapsed sample (Supplementary Fig. 14d). However, the proportion of cancer cells-4 was increased in the relapsed sample (Supplementary Fig. 14d). We further found that genes encoding a regulator of lipid composition (SCD) and lipid-binding proteins (APOA2, APOC3, and APOA1) were highly expressed in the cancer cells-4 cluster in the relapsed sample (Supplementary Fig. 14e), suggesting enhanced lipid metabolism in this cancer cell subcluster of the relapsed CRLM.