Stochastic Amplicon Ligation. DNA samples for oncology sequencing are typically extracted from FFPE tissues and can have average lengths of less than 500 nt due to accumulated chemical damage. We developed the Stochastic Amplicon Ligation (SAL) method to enzymatically concatenate many short DNA molecules, making use of the long-read capability of Nanopore Sequencing and increasing the effective throughput.
SAL is based on the Golden Gate assembly method used in synthetic biology to concatenate short oligos into synthetic genes. In SAL (Fig. 2 b), amplicons are appended with engineered adapter sequences that possess a Type IIS restriction enzyme recognition site. After Type IIS cleavage, a 4 nt sticky end is left on the 5′ ends of both strands of the amplicons; these sticky ends allow the amplicons to transiently bind to each other, after which they are enzymatically ligated to form concatemers. Multiple cycles of enzymatic restriction and ligation are performed to increase the lengths of the concatemers, and we perform a SPRI (solid phase reversible immobilization) size selection afterwards to both enrich long concatemers and remove short recognition site oligos cleaved from the amplicons. The multiple temperature cycles between 37 and 16 °C improve the mean lengths of the concatemer assemblies by keeping the concentrations of activated monomer amplicons low. Prior literature suggests that direct ligation of amplicons with 5′ sticky ends results in a much larger population of shorter concatemers.
SAL differs from traditional Golden Gate assembly in having universal sticky end sequences to allow stochastic incorporation of any amplicon with the appropriate adapters. This, in principle, allows unlimited growth of concatemers given sufficient monomer concentration and enough temperature cycles, and reduces the possibility of unintended reactions due to nonspecific binding between non-cognate sticky ends. Experimentally, capillary electrophoresis indicated that SAL concatenated a mean of roughly 12 to 15 monomers per concatemer (Fig. 2 c). In addition to improving the throughput of Nanopore Sequencing, we also found that SAL improved the quality of Nanopore Sequencing reads (Fig. 2 d). The Nanopore Sequencing results of a 340-nt amplicon had a mean Phred quality score of 9.87, corresponding to an error rate of 10.3%. The concatemer, in contrast, had a mean Phred score of 11.55, corresponding to an error rate of 7.0%. The lower quality score of shorter reads is due to the lack of sufficient current signal information for proper normalization prior to basecalling by MinKNOW (personal communication from ONT technical support).
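The correspondence between the Phred scores and error rates quoted above follows from the standard definition Q = -10 log10(p). A minimal Python sketch of that conversion is given here for reference; the function names are illustrative and not part of our analysis code.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred quality score Q into an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert an error probability p into a Phred quality score Q = -10*log10(p)."""
    return -10 * math.log10(p)


# The two mean quality scores reported in the text.
for label, q in [("340-nt amplicon", 9.87), ("SAL concatemer", 11.55)]:
    print(f"{label}: Q = {q:.2f} -> error rate = {phred_to_error_rate(q):.1%}")
```

Running the loop reproduces the 10.3% and 7.0% error rates stated for the amplicon and the concatemer, respectively.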
The improved throughput and accuracy of Nanopore Sequencing of SAL concatemers allow calling somatic single-base mutations at 5% VAF when matched normal samples are available (Fig. 2 e). We first applied Nanopore Sequencing to amplicons from a 95%:5% mixture of the NA18537 and NA18562 human cell line genomic DNA. NA18537 and NA18562 were homozygous for different alleles at the rs3789806 and rs9648696 single nucleotide polymorphism (SNP) loci, so the mixture was 5% VAF in the NA18537 SNP alleles. In our analysis, it was difficult to call somatic mutations at 5% VAF directly, since a large number of loci on the amplicon had high variant read frequencies. When we subtracted the variant read fraction (VRF) of a 100% NA18537 sample, the 5% VAF somatic mutations became more visible in the resulting difference (ΔVRF). In direct amplicon nanopore sequencing, the two 5% VAF SNP alleles were detected in the ΔVRF figure at +6.5 σ and +4.4 σ, respectively. The confidence of calling these 5% VAF variants was increased to +17.4 σ and +13.3 σ for SAL concatemers. For SAL concatemers, the long nanopore sequencing reads were bioinformatically deconcatenated using custom Python code.
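The matched-normal subtraction can be sketched as follows. This is an illustration only: it assumes that the σ values are obtained by comparing the ΔVRF at a candidate position against the mean and standard deviation of ΔVRF at the remaining positions of the amplicon, which is one plausible reading of the analysis rather than a description of the actual code. The function names and the toy VRF values are hypothetical.

```python
from statistics import mean, stdev


def delta_vrf(sample_vrf, normal_vrf):
    """Per-position difference between the sample VRF and the matched-normal VRF."""
    return [s - n for s, n in zip(sample_vrf, normal_vrf)]


def sigma_scores(delta, candidate_positions):
    """Express the delta-VRF at candidate positions in units of background sigma.

    Assumption: the background is every position that is not a candidate.
    """
    background = [d for i, d in enumerate(delta) if i not in candidate_positions]
    mu = mean(background)
    sd = stdev(background)
    return {i: (delta[i] - mu) / sd for i in candidate_positions}


# Toy data: position 3 carries a low-VAF variant; the other positions only
# show sequence-dependent error that is shared by the sample and the normal.
normal = [0.10, 0.02, 0.08, 0.01, 0.12, 0.03]
sample = [0.11, 0.02, 0.07, 0.06, 0.12, 0.04]

scores = sigma_scores(delta_vrf(sample, normal), candidate_positions={3})
print({pos: round(z, 1) for pos, z in scores.items()})
```

The point of the subtraction is that systematic, sequence-dependent errors appear in both samples and cancel, so a true variant stands out even when its raw VRF is lower than the raw VRF at error-prone positions.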
The lower throughput and quality score of shorter DNA libraries are specific to the SQK-LSK109 sample prep chemistry on R9.4.1 flow cells using MinKNOW 19.12.5 for basecalling. Sequencing throughput and quality of short libraries could be on par with long DNA libraries using ONT's latest and upcoming improvements to sequencing chemistry and basecalling algorithms.
Rolling circle amplification (RCA) is an alternative method for generating long DNA from shorter DNA molecules, and has been used in the context of Nanopore Sequencing for improving accuracy. RCA has several significant limitations compared to SAL. Most notably, RCA requires an initial circularization of DNA, which is known to have low efficiency; the resulting low conversion yield of biological DNA molecules into the sequencing library reduces clinical sensitivity. Additionally, RCA produces concatemers in which the segment sequences all reflect the sequence of the original molecule, rather than a uniform sampling of all DNA molecules at the loci of interest. Finally, RCA generates single-stranded rather than double-stranded DNA products, which should be converted into double-stranded DNA for efficient Nanopore Sequencing.
Integrating BDA allele enrichment. Frequently, matched normal FFPE tissue samples will not be available, so using SAL alone for Nanopore Sequencing detection of low VAF somatic mutations is unlikely to be impactful clinically. The OCEANS method employs blocker displacement amplification (BDA) [16, 17] to allow more robust detection of low VAF somatic mutations without requiring a matched normal sample. In brief, BDA includes a wildtype-binding blocker oligonucleotide that competes with a PCR primer in hybridizing to DNA templates of interest. The blocker binds more strongly than the primer to wildtype DNA sequences, preventing efficient PCR amplification. On DNA templates with sequence variants, the primer outcompetes the blocker and PCR proceeds as usual. Through the course of many PCR cycles (20–25 cycles), the VAF of sequence variants (including single nucleotide mutations) can be enriched by over 1000-fold.
In OCEANS, the DNA biospecimen is first mixed with multiple primers and blockers to undergo variant-selective PCR amplification (Fig. 3 a). The amplicons will over-represent sequence variants in genetic loci of interest, though some amplicons with wildtype sequences will still exist. The amplicons are subsequently appended with SAL adapters and concatenated into concatemers, and size-selected to remove short assemblies, primers, etc. The concatemers are then ligated to the standard Oxford Nanopore Sequencing adapters with attached motor proteins and loaded into the nanopore sequencing flow cell. The entire workflow takes about 10 h, including post-sequencing bioinformatics. On the same SNP alleles as in Fig. 2, the OCEANS results showed VRFs that were dramatically higher than the sample variant VAFs (Fig. 3 b), with 0.1% VAF SNP allele enriched to over 70% VRF. Thus, OCEANS allows robust variant calls of somatic mutations without the need for a matched normal DNA sample, which was not possible previously on the Nanopore Sequencing platform.
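To illustrate how a fold enrichment relates the sample VAF to the observed VRF, the sketch below uses a simple odds-based model in which the mutant-to-wildtype ratio is multiplied by a constant enrichment factor. This model is an assumption made for illustration, and the function names are hypothetical; the calibration actually used is described in the Additional file.

```python
def odds(fraction: float) -> float:
    """Mutant-to-wildtype ratio for a given variant fraction (0 < fraction < 1)."""
    return fraction / (1 - fraction)


def fold_enrichment(vaf: float, vrf: float) -> float:
    """Fold change in the mutant-to-wildtype ratio between sample and reads."""
    return odds(vrf) / odds(vaf)


def expected_vrf(vaf: float, enrichment: float) -> float:
    """VRF expected after multiplying the mutant-to-wildtype ratio by `enrichment`."""
    enriched = enrichment * odds(vaf)
    return enriched / (1 + enriched)


# The example from the text: a 0.1% VAF allele observed at about 70% VRF.
ef = fold_enrichment(vaf=0.001, vrf=0.70)
print(f"fold enrichment ~ {ef:.0f}x")

# Under this model the VRF saturates towards 100% as the sample VAF increases.
for vaf in (0.0005, 0.001, 0.01, 0.05):
    print(f"VAF {vaf:.2%} -> expected VRF {expected_vrf(vaf, ef):.1%}")
```

Under this model, 0.1% VAF enriched to 70% VRF corresponds to a roughly 2300-fold change in the mutant-to-wildtype ratio, consistent with the over 1000-fold enrichment noted for BDA.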
We next constructed two multiplexed OCEANS panels: a 7-amplicon panel covering recurrent mutations observed in acute myeloid leukemia (AML) and a 15-amplicon panel covering recurrent mutations observed in melanoma. The AML panel covers roughly 254 mutations in the COSMIC database across 7 genes (Fig. 4 a), and the melanoma panel covers roughly 370 mutations across 8 genes (Fig. 4 d). We first characterized the limit of detection for mutations covered by these OCEANS panels using synthetic spike-in reference samples, with VAFs ranging from 0.05% to 1%.
Variant calls were made using two different approaches: (1) based on the variant read frequency exceeding a threshold of 20% and (2) based on a Clair score above 180 (personal communication from ONT). We found that both approaches were imperfect. Considering VRF alone ignores the fact that Nanopore Sequencing has different error rates for certain sequences, e.g., homopolymers. Clair scores, on the other hand, are not monotonic with VAF and have been observed to be less accurate for indel calls [10, 25–27]. To ensure minimal false positives in variant calls, we require that a variant be independently called by both methods in order to be reported. On our internal reference samples (Fig. 4 a, b, d), we observed VAF limits of detection between 0.05% and 1%. We did not observe any effect on enrichment of variants when SAL was excluded from the OCEANS workflow (Additional file 1: Figure S11). However, SAL significantly increases the throughput for short amplicons by harnessing the long-read capabilities of Nanopore Sequencing. We next applied our OCEANS panels to third-party reference samples from Horizon Discovery, with mutation VAFs at 5% (for AML) and 0.5% (for melanoma). The expected mutations were all called.
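The two-criterion reporting rule can be expressed compactly. The following is a minimal sketch, not the pipeline itself: the `Candidate` record, its field names, and the example values are hypothetical, and the strict inequalities at the thresholds are an assumption about boundary handling.

```python
from dataclasses import dataclass

VRF_THRESHOLD = 0.20    # variant read frequency must exceed 20%
CLAIR_THRESHOLD = 180   # Clair score must be above 180


@dataclass
class Candidate:
    locus: str          # hypothetical label for the candidate variant
    vrf: float          # variant read frequency after OCEANS enrichment
    clair_score: float  # score assigned to the candidate by Clair


def is_reported(c: Candidate) -> bool:
    """Report a variant only if both criteria call it independently."""
    return c.vrf > VRF_THRESHOLD and c.clair_score > CLAIR_THRESHOLD


# Hypothetical candidates illustrating the three possible outcomes.
candidates = [
    Candidate("variant_A", vrf=0.72, clair_score=310),  # passes both criteria
    Candidate("variant_B", vrf=0.75, clair_score=120),  # high VRF, low score
    Candidate("variant_C", vrf=0.08, clair_score=250),  # high score, low VRF
]

for c in candidates:
    print(c.locus, "reported" if is_reported(c) else "not reported")
```

Requiring both criteria trades some sensitivity for fewer false positives: a variant with a high VRF in an error-prone context is withheld when its Clair score is low.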
Interestingly, we also made a number of unexpected variant calls in the melanoma OCEANS panel (Fig. 4 e). The variant in the MAP2K2-207 amplicon was confirmed to be a non-pathogenic SNP (rs10250). The variants in the PIK3CA-542 amplicon were found to align to the PIK3CA pseudogene (LOC100422375) and were also preferentially enriched by BDA. The "variants" associated with the PIK3CA pseudogene were bioinformatically excluded from variant calls in subsequent experiments. Finally, we made confident variant calls for PIK3CA c.3140A>G and KRAS c.38G>A. We contacted Horizon Discovery customer support regarding these putative mutations, and they confirmed that these mutations are also present at low VAFs in the HD238 sample.
Validating OCEANS on clinical tissue samples. We next applied the melanoma OCEANS panel to clinical melanoma tissue samples, including both fresh/frozen (FF) and FFPE tissue (Fig. 5 a, b). As in the calibration experiments, we called somatic mutations only when the VRF was greater than 20% and the Clair score was above 180. In total, DNA from 7 FF and 18 FFPE tissue samples was sequenced using both OCEANS and NGS. The melanoma OCEANS panel covers a total of 384 loci, corresponding to 9600 loci analyzed across the 25 samples.
Figure 5 c shows the comparison between OCEANS and NGS. All 16 somatic mutants called by NGS at above 5% VAF were also called by OCEANS, corresponding to a 100% OCEANS sensitivity relative to NGS. Of the 9584 NGS-negative loci, OCEANS called an additional 97 variants (Fig. 5 c); thus, relative to NGS, OCEANS had a 99.0% specificity.
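The sensitivity and specificity figures follow directly from the counts given in the text, as the short calculation below shows. The variable names are illustrative; NGS is treated as the reference, as in the comparison above.

```python
loci_per_sample = 384   # loci covered by the melanoma OCEANS panel
n_samples = 25          # 7 FF + 18 FFPE samples
total_loci = loci_per_sample * n_samples            # 9600

ngs_positive = 16           # mutations called by NGS at above 5% VAF
called_by_both = 16         # of those, also called by OCEANS
ngs_negative = total_loci - ngs_positive            # 9584
oceans_only = 97            # OCEANS calls at NGS-negative loci

sensitivity = called_by_both / ngs_positive
specificity = (ngs_negative - oceans_only) / ngs_negative

print(f"total loci:  {total_loci}")
print(f"sensitivity: {sensitivity:.1%}")
print(f"specificity: {specificity:.1%}")
```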
We calculated the original sample VAF from the OCEANS VRF using the fold enrichment calculated for mutations in calibration experiments (Additional file 1: Section S5). The sequencing error rates, combined with the saturation of VRFs near 100% after BDA enrichment, mean that our quantitation dynamic range is relatively small. However, estimation of sample VAF enables identification of high VAF (>5%) mutations to aid in making treatment decisions based on clinical diagnosis. OCEANS identified all mutations with NGS VRF >5% as high VAF mutations (Fig. 5 d). The OCEANS VRF and NGS VRF cannot be directly compared, since the OCEANS VRF is not the true sample VAF. Therefore, precision-recall values were calculated by comparing the OCEANS calculated VAF with the NGS VRF in Fig. 5 d. By varying the OCEANS calculated VAF cutoff threshold, we can change the number of high VAF mutation calls that are verified by NGS as >5%, generating a set of precision/recall tradeoffs for detecting mutations with NGS VRF >5%, which can be plotted as a precision-recall curve (Fig. 5 e). Importantly, we believe that many of the 97 discordant called variants that were below the 5% NGS VRF cutoff could be real mutations, based on our calibration experiments. To confirm our discordant OCEANS mutation calls, we further performed droplet digital PCR (ddPCR) on 6 FFPE samples at 4 mutation loci (BRAF p.V600, KRAS p.G13D, KRAS p.E62K, and MAP2K1 p.P124L) and on one fresh frozen sample at the BRAF p.V600 locus (Supplementary Excel table). Of these 25 sample-locus combinations, 12 were called positive by OCEANS and 13 were called negative by OCEANS. OCEANS was concordant with ddPCR for 11 positive samples and 11 negative samples (Table 1, Additional file 1: Section S6). Identification of such low VAF mutations makes OCEANS suitable for applications such as minimal residual disease (MRD) detection.
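The threshold sweep that produces a precision-recall curve can be sketched as follows. This is a minimal illustration on made-up values, assuming that a locus counts as a true positive when NGS reports a VRF above 5%; the function name and the toy data are hypothetical.

```python
def precision_recall_points(calculated_vaf, ngs_high_vaf):
    """Sweep the calculated-VAF cutoff and return (cutoff, precision, recall) tuples.

    calculated_vaf: OCEANS calculated VAF for each locus.
    ngs_high_vaf:   True where NGS reports VRF > 5% for the same locus.
    """
    total_positive = sum(ngs_high_vaf)
    if total_positive == 0:
        raise ValueError("at least one NGS-positive locus is required")

    points = []
    # Use every observed value as a cutoff, from most to least stringent.
    for cutoff in sorted(set(calculated_vaf), reverse=True):
        called = [v >= cutoff for v in calculated_vaf]
        tp = sum(c and p for c, p in zip(called, ngs_high_vaf))
        fp = sum(c and not p for c, p in zip(called, ngs_high_vaf))
        points.append((cutoff, tp / (tp + fp), tp / total_positive))
    return points


# Toy data: five loci with their calculated VAF and NGS status.
toy_vaf = [0.30, 0.12, 0.08, 0.04, 0.02]
toy_ngs = [True, True, False, True, False]

for cutoff, precision, recall in precision_recall_points(toy_vaf, toy_ngs):
    print(f"cutoff {cutoff:.2f}: precision {precision:.2f}, recall {recall:.2f}")
```

Lowering the cutoff recovers more of the NGS-confirmed mutations (higher recall) at the cost of admitting more calls that NGS does not confirm (lower precision).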
It is important to note that concordant positives between OCEANS and ddPCR indicate the existence of a DNA variant in the sample, which may not necessarily reflect a mutation in the patient. Cytosine deamination is a well-documented type of DNA damage frequently observed in DNA extracted from FFPE. We applied an FFPE damage repair kit to the FFPE DNA before performing OCEANS library preparation, but do not necessarily expect that all cytosine deaminations are repaired or excised. In particular, any repair kit based on cleaving/repairing uracils formed through the deamination of standard cytosine would not be able to rectify deamination of methylcytosines into thymines.
Because each ddPCR mutation requires a separate reaction, the ddPCR results required 4 times more input DNA than OCEANS just to cover these 4 mutations. For analysis of clinical biopsy samples, OCEANS would have significantly higher clinical sensitivity due to being able to analyze all mutations in the panel from a single sample.
Next, we wished to characterize the reproducibility and robustness of the OCEANS panel on different types of Nanopore Sequencing instruments and flow cells. The Oxford Nanopore Flongle flow cell, in particular, is relatively inexpensive at $70 and can further reduce turnaround time relative to MinION by reducing the need for sample batching before sequencing. We performed the OCEANS panel on all 25 melanoma samples on the Flongle, and observed VRFs that were quantitatively highly similar to our results on the MinION (Fig. 5 f).
NSCLC and HCC OCEANS panels. We next constructed two additional OCEANS panels: a 28-amplicon panel for non-small cell lung cancer (NSCLC) and an 11-amplicon panel for hepatocarcinoma (HCC) to show the generality of our approach. The NSCLC OCEANS panel covers roughly 1121 mutations in the COSMIC database across 13 genes (AKT1, ALK, BRAF, DDR2, EGFR, KRAS, NRAS, MAP2K1, MET, PIK3CA, PTEN, ROS1, and TP53, see Additional file 1: Section S4). DNA from 5 FF and 18 FFPE NSCLC tissue samples was sequenced using both OCEANS and NGS. Figure 6 a, c show the comparison between OCEANS and NGS. Nine out of 11 somatic mutants called by NGS at above 5% VAF were also called by OCEANS. The two mutations that were not called had Clair scores below 180; both were indel mutations, for which Clair has been observed to be less accurate [10, 25].
The HCC OCEANS panel covers roughly 680 mutations across 7 genes (CTNNB1, ARID1A, AXIN, TERT, JAK1, PTEN, and TP53, see Additional file 1: Section S4). DNA from 5 FF and 16 FFPE HCC tissue samples was sequenced using both OCEANS and NGS. Figure 6 b, d show the comparison between OCEANS and NGS. Fourteen out of 17 somatic mutants called by NGS at above 5% VAF were also called by OCEANS. The 3 mutations not called by Clair were in the TERT amplicon within a homopolymer region (Additional file 1: Section S5). Higher Nanopore Sequencing error rates in homopolymer regions could be the reason for the lower Clair scores despite the OCEANS VRF being >70% for these mutations. We observed 23 loci with OCEANS VRF greater than or equal to 20% for which the corresponding NGS VRF was 0%. We analyzed the NGS read depth for all loci with NGS VRF equal to 0% and their corresponding OCEANS VRF (Fig. 6 g). All 23 loci with OCEANS VRF greater than or equal to 20% had an NGS read depth of less than 300. Overall, both panels showed high concordance between OCEANS and NGS. The area under the precision-recall curve was 92.34% for the NSCLC panel and 90.74% for the HCC panel based on the OCEANS calculated VAF (Fig. 6 e, f).