Yersinia pestis genomes reveal plague in Britain 4000 years ago

All radiocarbon dates were calibrated in OxCal 4.4 using the IntCal20 calibration curve18,19. There is no stable carbon and nitrogen isotopic evidence for any detectable input of marine or freshwater foods that would require a correction for reservoir effects.

Charterhouse Warren: Archaeological context

Charterhouse Warren is a natural shaft in the limestone of the Mendip Hills, Somerset. Initial excavations in the 1970s were instigated in order to determine whether the shaft led to a cave system, as no archaeological remains from the site were known at that time. Excavators noted the presence of human and animal bones in a disarticulated state, some of which showed cutmarks and rodent gnawing43. The assemblage of human remains sampled for aDNA was encountered in a distinct layer at ca. 15 m depth, together with faunal remains and a small number of artefacts, including sherds of a Beaker vessel. At least 40 men, women and children are represented by fragmentary and disarticulated remains. Culturally and chronologically, the assemblage dates to the late Beaker period/Early Bronze Age, ca. 4150–3950 cal BP. This is a non-normative burial context for this period, which is dominated by single articulated burials associated with funerary monuments and cemeteries. Modifications found on a significant proportion of bones suggest that many, if not all, of these individuals had been subject to fatal perimortem trauma and subsequent processing before disarticulated bones were deposited together in the shaft in what is likely to have been a single event, though analysis of the assemblage and modelling of the radiocarbon dates is ongoing. While disarticulated remains are sometimes encountered in Bronze Age Britain, the scale of deposition at Charterhouse Warren is unique in a British context, suggesting that a very unusual event is represented. The evidence for blunt force perimortem trauma to a number of crania together with evidence for dismemberment suggests that this may relate to an episode of extreme violence.

Levens Park: Archaeological context

The Levens Park ring cairn, consisting of a low kerbed mound below which were further rings of boulders or wall-like structures, in Levens, Cumbria, UK, was excavated between 1968 and 1971 by David Sturdy21. Examination of existing archival material by members of Levens Local History Group has produced a recent reassessment of the excavation, resulting in a more detailed understanding of the monument and its purpose22. We provide a general narrative of the site here but see Clare et al.22 for details on associated uncertainties. Four skeletons (Burials 1, 2, 3 and 4) were recovered from the monument and represent two separate phases of burial activity. Burial 4 comprised an unaccompanied crouched burial of a 25–45-year-old male covered with a large boulder but no associated artefacts. The skeleton has been radiocarbon dated to 4229-3976 cal BP (95% confidence, 3731 ± 34 BP, GU-51283), significantly earlier than dates from the other three skeletons (R_Combine χ2 Test in OxCal 4.4: df = 3, T = 11.2 (5%, 11.8))18,19. Burials 1, 2 and 3 were recovered from a larger adjacent structure within the monument. Their radiocarbon dates were statistically indistinguishable, suggesting they could have died around the same time or within a few decades of one another (R_Combine χ2 Test in OxCal 4.4: df = 2, T = 0.7 (5%, 6.0));18,19 Burial 2, which is thought to be the primary interment in this phase, owing to its position in the centre of the new structure, was a 35–45-year-old female recovered in a crouched position from a plank-lined grave (possibly a wooden coffin) and accompanied by sherds of Beaker pottery. The skeleton has been directly dated to 4065–3780 cal BP (95% confidence, 3602 ± 33 BP, GU-51281). Burial 1, a 17–35-year-old female dating to 4080–3840 cal BP, 95% confidence, 3626 ± 33 BP, GU-51280), and Burial 3, a probable adult female dating to 3982–3879 cal BP (95% confidence, 3587 ± 33 BP, GU-51282), were recovered from the cairn matrix and their original character and associations are unclear. Strontium and oxygen stable isotope analyses of Burials 1, 3 and 4 are consistent with them having grown up locally, although the results would also be consistent with an origin in other parts of Britain, likewise Burial 2.

Sampling, DNA extraction and library preparation

Samples were processed at The Francis Crick Institute in a dedicated cleanroom facility. We used an EV410-230 EMAX Evolution Dentistry drill to clean the surface of the teeth and sampled both the cementum and multiple fractions of the dentine, resulting in 7–25 mg of powder from the dentine. Dentine powders were then lysed with 300 µl (<10 mg of powder), 600 µl (10–25 mg) or 1000 µl (>25 mg of powder) of lysis buffer (0.5 EDTA pH 8.0, 0.05% Tween-20, 0.25 mg/ml Proteinase K44) and incubated overnight at 37 °C. Lysates were centrifuged for 2 min at maximum speed of 16,400 × g (13,200 rpm) in a table centrifuge and 140 µl of the lysate was transferred into FluidX tubes for automated extraction on an Agilent Bravo Workstation45. Extracts were turned into single-stranded DNA libraries24, then double-indexed46 and underwent paired-end sequencing on an Illumina HiSeq4000 or NextSeq500 platform resulting in 1.8 to 7.3 million read-pairs per sample for initial screening. All samples were processed alongside negative lysate and extraction controls as well as positive and negative library controls.

Targeted enrichment

Following initial pathogen detection, libraries were taken forward for target enrichment using Yersinia pestis baits predesigned by myBaits Arbor Biosciences, following myBaits custom RNA seq v5.1 (March 2021) High Sensitivity protocol47. We used 7 μl of the initial library for two rounds of hybridisation at 55 °C with 23-h overnight incubation and 20 PCR cycles, followed by a heteroduplex removal using a one-cycle PCR and MinElute Purification (Qiagen) cleanup. We sequenced each library with a 2 × 100 paired-end read configuration on the Illumina MiSeq and NovaSeq platforms (Table 1).

Size selection for shotgun sequencing

Before larger-scale direct shotgun sequencing, fragments shorter than 35 bp and longer than 150 bp were removed from the libraries, as in Gansauge et al.24. Specifically, 100 ng of the initial library was biotinylated and streptavidin beads were used to isolate the non-biotinylated strand and obtain a single-stranded library. These samples (pooled with 3 others) were then loaded on a denaturing polyacrylamide gel along with 35 bp and 150 bp insert markers, and fragments within the desired sequence length were physically excised and eluted from the gel, after overnight incubation. The resulting size-selected libraries were further amplified and sequenced on the Illumina NovaSeq (Table 1).

Bioinformatic processing

Samples were processed via the nf-core/eager v2 pipeline48. First, adapters were removed, paired-end reads were merged and bases with a quality below 20 were trimmed using AdapterRemoval v249 with –trimns –trimqualities –collapse –minadapteroverlap 1 and –preserve5p. Merged reads with a minimum length of 35 bp were mapped to the hs37d5 human reference genome with Burrows-Wheeler Aligner (BWA-0.7.17 aln)50 using the following parameters “-l 16500 -n 0.01”51,52.

Metagenomic screening and authentication

We analysed sequences that did not align successfully to the human genome using Kraken 217 and identified individuals as putatively positive for Yersinia pestis by assessing the number of observed k-mers (sequence matches) to Yersinia pestis compared to Yersinia pseudotuberculosis. These libraries were subsequently aligned to the CO92 Yersinia pestis reference genome (NC_003143.1) and the CO92 plasmids, pMT1 (NC_003134.1), pCD1 (NC_003131.1) and pPCP1 (NC_003132.1) using the BWA aln50 parameters “-l 16500 -n 0.01 -o 2”. Duplicates were removed by keeping only the first sequence out of any set of sequences with the same start position and length (github.com/pontussk/samremovedup). We assessed the authenticity of the final set of sequences using the following criteria:25 the observation of postmortem damage, with the number of sequences being negatively correlated with edit distance from the reference genome, and a unimodal fragment length distribution via DamageProfiler53. Additionally, we used SAMTools v1.3.1 depth54 to confirm an even breadth of coverage across the reference genome. Screening libraries that passed these authentication criteria, were taken forward for further shotgun sequencing and target enrichment. For the final BAM files, we merged shotgun and target enriched BAM files using samtools merge resulting in a final coverage of 8.4x, 6.1x and 3.3x coverage for C10091, C10098 and C10928, respectively when filtered on a minimum mapping quality of q1 (MQ1) (Table 1, Fig. 2A, Supplementary Fig. 1).

Analysis of functional elements and virulence factors

We used previously identified virulence genes11 on the Yersinia pestis chromosome, pCD1, pMT1 and pPCP1 plasmids, and analysed the coverage of these genes with BEDTools v2.29.255 (Fig. 2A). The ggplot256 package in R was then used to make a coloured heatmap of the percentage of positions in each virulence gene covered by at least one sequence. Additionally, we used BEDTools v2.29.255 genomecov to assess the missingness across the genome and identify regions that were larger than 1 kb in the two Charterhouse individuals. Some differences in coverage across the Yersinia pestis chromosome between the pre-designed Arbor RNA capture baits compared to shotgun data can be observed in direct comparisons, most likely due to preferential capture of some parts of the chromosome over others with the bait set (Supplementary Fig. 3). However, we see no evidence that this could lead to conclusions about major deletions or loss of function. We manually inspected the ureD, pde-2 and flhD genes to identify the presence of indels and SNPs specific to these genes, to assess whether they were functional or not in the Charterhouse Warren genomes9. However, the coverage for C10928 was too low to assess these indels.

Mismatches between the two Charterhouse genomes

We compared the two Charterhouse Warren genomes to identify transversion SNPs at differing base-fold coverages (Supplementary Fig. 2, Supplementary Table 2) after filtering for a mapping and base quality of 30 using samtools calmd54 and removing heterozygous sites using an in-house script (github.com/pontussk/mpileup_mismatch_pathogen.py).

Phylogenetic reconstruction

Previously published data from Bronze age and Neolithic plague genomes and the IP32881 strain of Yersinia pseudotuberculosis were downloaded from the European Nucleotide Archive (Supplementary Table 1) and processed using the same parameters as data generated in this study, described above (Bioinformatic processing) with the exception of duplicate removal, which removed duplicates with the same start position regardless of length to adjust for single-end sequencing in published data.

After duplicate read removal, we used an in-house script (github.com/pontussk/mpileup2consensus.py) to keep only homozygous sites with a base quality score of Q30, thus excluding any heterozygous sites and small (shorter than a sequence read length) insertions or deletions and a maximum edit distance of 90% using samtools calmd54. We then filtered all sites to retain only sites with a minimum base coverage of 3-fold to obtain a consensus sequence for all published genomes and a 5-fold coverage for the Charterhouse Warren genomes.

Consensus sequences were then filtered to keep only sites that were polymorphic within the set of sequences used for input in the phylogeny, removing all transitions to remove effects of cytosine deamination in the ancient DNA sequences. For the main phylogeny (Fig. 1A) we excluded individuals with more than 70% missingness across the genome and removed sites that were missing in more than 20% of the remaining individuals, resulting in 2306 transversion SNPs that were taken forward to IQ-TREE v.1.6.1229. We implemented model testing using the ModelFinder in IQ-TREE57, which suggested a TVMe+ASC as the best-fit model according to the Akaike information criterion for both phylogenies in Fig. 1A and Fig. 1B. We implemented 1000 bootstrap replicates for the phylogeny in Fig. 1A and rooted the maximum likelihood phylogeny in FigTree58, with the Yersinia pseudotuberculosis outgroup (IP32881) (the outgroup is not shown in the figure to increase the visibility of the LBNA clades in Fig. 1A).

Due to its low coverage, a separate phylogenetic inference was used to position the Levens Park genome, C10928. Here, we took the filtered consensus sequences of the LNBA strains, which were topologically close to the monophyletic Charterhouse Warren (C10091 and C10098) group, and filtered them to a minimum base coverage of 3, keeping transversions only and sites present in at least 70% of the genomes (regardless of missingness across the genomes). This resulted in 104 SNPs taken forward to IQ-TREE, using a TVMe+ASC model with 100 bootstrap replicates. The maximum likelihood phylogeny was rooted in FigTree58 based on Fig. 1A and transformed into the cladogram in Fig. 1B.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Read more here: Source link