Strain-level bacterial typing directly from patient samples using optical DNA mapping

Figure 1a provides an overview of the experimental procedure and it is summarized in the following sections.

Fig. 1: Schematic overview of high-resolution optical DNA mapping-based bacterial typing.
figure 1

a Experimental pipeline. The DNA, extracted via plug-lysis, is labelled with YOYO-1 and netropsin in a single step. The DNA is confined to nanofluidic channels and imaged using a fluorescence microscope, resulting in one experimental intensity profile per analysed DNA fragment. b Data analysis pipeline. Each experimental intensity profile is matched to the reference database of predicted intensity profiles. Next, the database matches are filtered for each experimental profile so that only high-scoring matches remain. Profiles are deemed to match discriminatively if all remaining matches are against a single strain group (SG) at the investigated resolution. Only the results of the discriminative profiles are reported back to the user. The result for the sample is shown as the number of discriminative fragments matching each detected SG and the proportions of all discriminative profiles matching each detected SG.

Bacterial samples

The bacterial samples analysed were selected by clinical relevance or from their phylogenetic relationship for method evaluation (Supplementary Data 1). Bacterial strains were grown in Luria-Bertani (LB) broth, with 1.5% agar for solid medium, and stored at −80 °C in 10% DMSO stocks. The non-cultivated urine samples were collected at Karolinska University Hospital in Stockholm and used directly for DNA extraction. For samples with mixes of bacteria, each strain was cultivated separately and mixed in equal amounts before DNA extraction. In total, 25 cultured E. coli samples (EC1–25), nine cultured K. pneumoniae samples (KP1–9), six uncultured clinical urine samples containing E. coli (U1–6), two mixes of four E. coli strains each (M1-2), and three E. coli samples with plasmid-borne blaCTX-M genes (P1-3) were selected for the study. The study did receive ethical approval (recordal number of the approval 2018/7273531/2) as per the Swedish Ethical Review Authority. The approval clarified that it was acceptable not to collect informed consent. We included only samples that were ready to be discarded and no additional sampling from patients was needed.

DNA isolation

DNA extraction was carried out by enclosing bacteria in gel plugs and isolating long DNA molecules18. In short, 250 μL of overnight culture or 1−3 mL of a non-cultivated urine sample was pelleted and enclosed in an agarose plug (100 μL). The bacteria were lysed, treated with RNase A and proteinase K and finally washed. The lysis protocol was carried out inside the agarose plug to protect the DNA from breaking. The agarose plugs (100 μL) were melted in 20 μL of 10× CutSmart Buffer (New England Biolabs) and 78 μL of MQ- water at 70 °C for 10 min, followed by incubation at 42 °C for 10 min before the addition of 2 μL of agarase (ThermoFisher Scientific, 0.5 U/L) and a second incubation at 42 °C for at least 1 h. The DNA concentration was determined using a Qubit Fluorometer 2.0 (ThermoFisher Scientific).

Sample preparation

The sequence-based intensity profiles for the ODM experiments were created by the addition of YOYO-1 (with an excitation of 491 nm and emission of 509 nm, Invitrogen) and netropsin (Sigma-Aldrich). 10 μL of an 0.5× TBE (Tris-Borate-EDTA, Novex, Invitrogen) solution was prepared with 1 μM (base pairs) extracted bacterial DNA, 1 μM (base pairs) λ-DNA (included as an internal size reference, 48,502 bp, Roche Biochem Reagents), 0.2 μM YOYO-1 (ratio of DNA base pairs/YOYO-1 is 10:1), and 60–120 μM netropsin (varying the ratio of netropsin:YOYO-1 between 600:1 and 300:1 optimized the emission intensity in each experiment but did not have any effect on the intensity profile matching discriminatively to the reference database), followed by incubation at 50 °C for 30 min. The DNA solution was diluted 10-fold with 88 μL of MQ-water and 2 μL of β-mercaptoethanol (used to prevent photodamaging, Sigma-Aldrich), obtaining a final buffer concentration of 0.05× TBE before loading 10 μL of the sample into a loading well of a nanofluidic chip.

The presence of antimicrobial resistance genes on plasmids in samples P1-P3 was detected using a CRISPR-Cas9 approach before DNA labelling13. TracrRNA and two different crRNA with sequences targeting the two main groups of the β-lactamase gene blaCTX-M, groups 1 and 9 (CCGCCGCGATGTGCTGGCTTCAG and AGAGAGCCGCCGCGATGTGCTGG, respectively) were purchased from GE Healthcare and re-suspended in 10 mM RNase free Tris- HCl buffer (Sigma-Aldrich). Guide RNA (gRNA) was created by incubating 0.33 nmol tracrRNA with 0.16 nmol of each crRNA in 1× NEBuffer 3 (New England Biolabs) and 1× (0.1 μg/μL) BSA (New England Biolabs) for 30 min at 4 °C. Next, 10 μM (0.15 nmol) of gRNA was incubated with 300 ng of Cas9 (Sigma-Aldrich), 1× NEBuffer 3 and 1× (0.1 μg/μL) BSA, at 37 °C for 15 min. Finally, 60 ng of extracted DNA from the bacterial samples was added to the mixture, followed by incubation at 37 °C for 1 h. Following the Cas9 restriction, the DNA was labelled for ODM experiments according to the slightly modified protocol 4 μM DNA (base pairs, directly from the Cas9 mixture, without purification), 1.2 μM (base pairs) λ-DNA, 2.6 μM YOYO-1 (ratio of DNA base pairs:YOYO-1 2:1), and 260 μM netropsin (netropsin:YOYO-1 ratio 100:1) followed by incubation at 50 °C for 30 min. The DNA solution was diluted 10-fold before loading 10 uL of the sample into a loading well of a nanofluidic chip.

Nanofluidic experiments

The nanofluidic experiments were done using a nanofluidic chip with 200 parallel nanochannels spanning 500 µm in length, each with a dimension of 100 × 150 nm2 (height x width). The nanochannels are connected to a microchannel on each side, connecting two loading wells each. The chips were fabricated in silicon dioxide using standard micro-nanofabrication methods19. When recording the intensity profile, each single DNA fragment is introduced via one of the microchannels to a nanochannel using pressure-driven N2 gas flow, forcing the molecule to stretch out. When the DNA fragment is stretched out in the nanochannel, the DNA was imaged using a Zeiss AxioObserver.Z1 fluorescence microscope with a 1.6x optovar, 63x oil immersion objective (NA = 1.46 Zeiss) and an Andor iXon EMCCD camera. For each DNA fragment at least 20 timeframes with an exposure time of 200 ms were recorded. If the stretch factor, calculated from the reference molecule λ-DNA, was less than 0.2 nm/bp the data was discarded.

Data analysis

An overview of the principle of the data analysis can be seen in Fig. 1b. For each recorded DNA fragment, an experimental intensity profile was generated from at least 20 timeframes via time-averaged kymographs. The generation of time-averaged kymographs is explained in detail in the Supporting Data of a previous study20. Next, all experimental intensity profiles were compared to a reference database of in silico predicted profiles that was constructed in the following way. To obtain high-quality genome assemblies, all complete bacterial genomes were downloaded from NCBI’s RefSeq database (2019-07-17). Since our reference database was developed for species and strain identification, we aimed to include only complete chromosomal sequences and, therefore, removed sequences shorter than 500 kb or with the words “plasmid” in the FASTA header. Similarly, sequences with dubious taxonomic classification – i.e. with “Candidatus” or “bacterium” in the FASTA header – were also removed. This left 13,024, most likely complete and chromosomal, sequences with a FASTA header containing a full bacterial species name (2697 different species). For a full list of included accession numbers, see Supplementary Data 2. All sequences were converted into predicted intensity profiles by calculating the binding probabilities of YOYO-1 along the DNA molecule in the presence of Netropsin, and adding camera effects to mimic the effects of imaging in a nanochannel using a fluorescence microscope20,21,22.

Of the reference genome sequences, 867 belonged to the Escherichia coli/Shigella group and 348 to Klebsiella pneumoniae. Due to their genetic similarity on the chromosomal level, E. coli and Shigella spp. were treated as a single species in this study23. These references were assigned to strain groups (SGs), i.e. taxonomic groups at a higher resolution than species, in the following way. Firstly, the sequences were annotated using Prokka (version 1.12, default parameters) and the resulting gff-files were used as input to Roary (version 3.13.0, parameters –e –p 20) to identify the core genomes for each of the species24,25. Since the E. coli/Shigella group is a diverse group of bacteria, we used additional parameters (–g 100000 –cd 95) to avoid a very small core genome. In E. coli/Shigella, 1697 genes were classified as core genes (present in 95% of the analysed genomes) and in K. pneumoniae, there were 2 804 core genes (present in 99% of the analysed genomes). Phylogenetic trees were constructed using the core genome alignments from Roary with FastTree (version 2.1.11, double precision) and the trees were visualized using iTol with midpoint rooting26,27. An R-script was used to read the Newick tree files (package phytools version 0.6–99) and divide the trees into SGs based on branch lengths (package dendextend version 1.13.4; parameter h = 0.03/0.015/0.008/0.003 for E. coli SGLow/SGMedium/SGHigh/ SGUltra-High and h = 0.008/0.007/0.003 for K. pneumoniae)28,29. See Supplementary Figure 1 and Supplementary Fig. 2 for illustrations of the trees and SG schemes. For a given threshold branch length (parameter h), each reference genome of E. coli/Shigella and K. pneumoniae was assigned to a single SG. A short branch length created many SGs and, consequently, an SG scheme of a high taxonomic resolution and vice versa. For the selected resolutions, the 867 E. coli reference genomes in the database were divided into 88 (SGHigh), 34 (SGMedium), and 10 (SGLow) SGs, respectively. At SGHigh, all reference genomes of the globally disseminated E. coli sequence type (ST) 131 formed their own SG. Analogously for K. pneumoniae, the 348 reference genomes were divided into 62 (SGHigh), 22 (SGMedium), and 11 (SGLow) groups, respectively. Here, the clinically important clonal complexes CC14, CC147, CC258, and CC307 all formed distinct SGs at SGHigh, except for the SG containing CC258, which also contained a single additional genome (Supplementary Data 3 and Supplementary Data 4). In addition, all E. coli/Shigella and K. pneumoniae sequences were typed using MLST30,31,32 and the E. coli phylogroups were determined using EZClermont33. For a full list of all E. coli/Shigella and K. pneumoniae reference genomes, their SG assignment for the three investigated SG resolutions, their sequence type (ST), and, for E. coli, their phylogroup, see Supplementary Data 3.

The comparison between the experimental intensity profiles and the reference database of predicted profiles20 results in a similarity score – based on the Pearson correlation coefficient – for the optimal alignment of each experimental profile against each reference. The reference is stretched to the experimentally measured nanometer per base pair stretch factor determined using the reference molecule λ-DNA, the alignment allows for fluctuations of the DNA molecule (with re-scaling factors 90–110% in steps of 2.5%), and the experimental profile is slid along the reference (in both directions) to identify the optimal alignment of the profile pair. The scores are used to identify experimental profiles that are informative on the taxonomic level of interest, i.e. they match discriminatively to a single taxonomic group. We have previously shown that this method is effective in typing bacterial samples to the species level18, and we here apply the same principle to varying SG resolutions. Firstly, all intensity profiles that are too short are removed (rescaled length <Lmin). For each remaining experimental profile, the best matches – defined as the highest-scoring match and any match within Cdiff of the highest score – are kept. If the highest score for the profile is above a threshold, Cthresh, and if all the best matches are to a single SG, then the profile is deemed to match discriminatively to that SG. Only the results of the discriminative profiles are reported back to the user. There are thus four parameters that affect the results: the minimum allowed profile length (Lmin, rescaled length), the width of the score range of the best matches for a profile (Cdiff), the minimum allowed maximum score of a profile (Cthresh), and the taxonomic resolution of interest. The parameter Lmin was set to 250 pixels and Cthresh was set to 0.5 throughout this study based on the results of our previous work18. The effect of the remaining two parameters was tested in terms of the proportion of all profiles that matched any SG discriminatively and the sensitivity measured as the true positive rate (TPR) – i.e., the proportion of the discriminative profiles that matched the correct SG. The correct SG was determined as follows. All samples were included in a phylogenetic tree together with all reference genomes of the same species using the same methods and parameters as described above. For each sample, the SG assignment of the closest reference genome in the tree was identified as the correct one.

Analysis of potential plasmid fragments was performed using an automated, custom-written Matlab-based program (see Data Availability)13,18. In short, the extensions of the fragments were extracted from the time-averaged kymographs and the fragments were first grouped based on their length. For a given size group, the associated intensity profiles were aligned and clustered based on similarity using a correlation coefficient for their alignment to one another (exemplified in Supplementary Fig. 3). The presence of the target gene was verified by analysing whether most double-strand breaks occurred at the same position in a similarity cluster (typical correlation coefficient threshold of 0.7). This indicates that the cuts are caused by Cas9 cuts at the target gene location, as opposed to being caused by mechanical forces or photonicking (exemplified in Supplementary Fig. 3).

Whole-genome sequencing of selected samples

Whole-genome sequences of the samples were used to assign each sample to its correct SG for each of the tested SG resolutions to confirm the ODM data. The E. coli samples EC3–4, EC7-11, and EC18 had been previously sequenced by Fröding et al.34 (assembly accession numbers GCA_013171465.1, GCA_013172225.1, GCA_013172765.1, GCA_013174205.1, GCA_013176495.1, GCA_013176595.1, GCA_013176845.1, and GCA_013176885.1) and the K. pneumoniae samples KP2–3, KP7, and KP9 by Koskinen et al.35 (GCA_016056235.1, GCA_015831445.1, GCA_016056215.1, and GCA_015831545.1). These assemblies were all downloaded from GenBank. The samples KP1, KP4-6, and KP8 were sequenced and assembled using the same method as the other K. pneumoniae samples35. The remaining samples were sequenced for this study in the following way. For plasmid sample P1, the DNA extraction was done using the Qiagen EZ1 DNA Tissue Kit and EZ1 Advanced XL instrument followed by paired-end sequencing with a read length of 300 bp on an Illumina NovaSeq using the Nextera library prep kit. For samples Mix2.1, Mix2.2, EC20, EC22, and EC24 the DNA was extracted using the robot PSS magLEAD 12gC with the Magtration reagent MagDEA Dx SV reagent kit. Sequencing was performed on an Ion Torrent PGM System for samples Mix2.1, Mix2.2, and EC22 (400 bp, no size selection) and an Ion Torrent Ion S5 XL system for samples EC20 and EC24 (300 bp, autosize selection) using the Ion Xpress Plus Fragment Library Kit for AB Library Builder System. The rest of the samples – E. coli samples EC1–2, EC5–6, EC12–17, EC19, EC21, EC23, EC25, Mix1.1, all urine samples (U1–6), and plasmid samples P2–3 – were extracted for total DNA using the Illumina Epicenter MasterPureTM DNA purification kit according to the manufacturer’s instructions and sequencing was performed on an Illumina MiSeq using the Nextera XT library prep kit and sequenced with a paired-end read length of 300 bp. The E. coli samples EC1–2, EC5–6, EC12, EC17, EC19–25, and all urine samples (U1–U6) were analyzed with CLC Genomics Workbench (Qiagen, v21) by de novo assembly. Multilocus sequence typing was performed from de novo contigs using the CLC Microbial Genomics module. The sequencing data for E. coli samples EC13–EC16 and all plasmid samples (P1–3) were first trimmed for adaptors and low-quality ends using Trim Galore! (www.bioinformatics.babraham.ac.uk/projects/trim_galore/; v. 0.4.3, parameters –stringency 3 –paired –retain_unpaired) and then assembled using SPAdes (v. 3.14.1, parameters –isolate –cov-cutoff 5).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Read more here: Source link