Emergence and spread of SARS-CoV-2 lineage B.1.620 with variant of concern-like mutations and deletions

B.1.620 carries numerous VOC mutations and deletions

Lineage B.1.620 attracted our attention because B.1.620 genomes from Lithuania carried large numbers of unique mutations in Nextclade analyses (they are 18 mutations away from their nearest relatives and 26 from the reference strain Wuhan-Hu-1), yet were initially assigned to clade 20A, which corresponds to B.1 in Pangolin nomenclature4,5. Meanwhile, Pangolin (using the 2021-04-01 version of pangoLEARN) variously misclassified B.1.620 genomes as B.1.177 or B.1.177.57, and occasionally as B.1 (correct but unhelpful), prior to the official designation of B.1.620 by the Pango SARS-CoV-2 lineage nomenclature team. Even after official designation, Pangolin still often struggles with B.1.620 sequences, classifying them as various VOCs (often as B.1.1.7) when not run in the new UShER mode, and, vice versa, sometimes classifies non-B.1.620 genomes as B.1.620. Closer inspection of B.1.620 genomes revealed that this lineage carries a number of mutations and deletions that had previously been observed individually in VOCs and VOIs (Fig. 1 and Supplementary Fig. S3) but had not been seen in combination. Despite sharing multiple mutations and deletions with known VOCs (most prominently HV69/70Δ, LLA241/243Δ, S477N, E484K and P681H), lineage B.1.620 does not appear to be of recombinant origin (Supplementary Fig. S4).

Fig. 1: Lineage-defining SNPs of lineage B.1.620.

Only SNPs that differentiate B.1.620 (genomes outlined with a dashed line) from the reference (GenBank accession NC_045512) and that are shared by at least two B.1.620 genomes are shown in the condensed SNP alignment. Sites identical to the reference are shown in grey; changes from the reference are indicated and coloured by nucleotide (green for thymine, red for adenine, blue for cytosine, yellow for guanine, dark grey for ambiguities, black for gaps). The first 100 and the last 50 nucleotides are not included in the figure but were used to infer the phylogeny. If a mutation results in an amino acid change, the column label indicates the gene, reference amino acid, amino acid site, and amino acid change in brackets. The phylogeny on the right (branch lengths in number of mutations) shows the relationships between the depicted genomes and was rooted on the reference sequence; coloured circles at the tips indicate the country from which each genome came. Posterior probabilities of nodes leading up to lineage B.1.620 are shown near each node, and the long branch leading to lineage B.1.620 is labelled ‘B.1.620’.
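The site-selection rule in the Fig. 1 caption (keep changes from the reference shared by at least two lineage genomes, hide the trimmed genome ends) can be illustrated with a minimal Python sketch. It assumes that all sequences are already aligned to the reference and have equal length; the function names `condensed_snp_sites` and `mutate` and the toy sequences are hypothetical and do not describe the authors' actual pipeline.

```python
# Hypothetical sketch of the site filter described in the Fig. 1 caption.
# Assumes all sequences are pre-aligned to the reference (equal length);
# an illustration only, not the authors' actual pipeline.

def condensed_snp_sites(reference, lineage_seqs, trim_start=100, trim_end=50, min_shared=2):
    """Return 0-based alignment columns to display in a condensed SNP alignment.

    A column is kept if at least `min_shared` lineage genomes carry the same
    non-reference nucleotide (or gap) there. Ambiguity codes such as N are
    ignored. The first `trim_start` and last `trim_end` columns are skipped
    for display only.
    """
    length = len(reference)
    if any(len(seq) != length for seq in lineage_seqs):
        raise ValueError("all sequences must be aligned to the reference")
    informative = {"A", "C", "G", "T", "-"}
    sites = []
    for i in range(trim_start, length - trim_end):
        counts = {}
        for seq in lineage_seqs:
            base = seq[i]
            if base != reference[i] and base in informative:
                counts[base] = counts.get(base, 0) + 1
        if counts and max(counts.values()) >= min_shared:
            sites.append(i)
    return sites


def mutate(seq, pos, base):
    """Return a copy of `seq` with 0-based position `pos` replaced by `base`."""
    return seq[:pos] + base + seq[pos + 1:]


# Toy example: a 200-nt reference of all As (purely illustrative data).
ref = "A" * 200
g1 = mutate(mutate(ref, 120, "T"), 10, "G")   # change at 10 lies in the trimmed 5' end
g2 = mutate(mutate(ref, 120, "T"), 10, "G")
g3 = mutate(mutate(ref, 130, "G"), 140, "N")  # a singleton change and an ambiguity
print(condensed_snp_sites(ref, [g1, g2, g3]))  # [120]
```

Counting per nucleotide, rather than counting any genome that differs, follows the caption's requirement that a change be *shared*; trimming affects only which columns are displayed, matching the note that the ends were still used for phylogenetic inference.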

Travel-related cases of B.1.620, discussed later, led us to suspect Cameroon as the immediate source of this lineage, so we sought to identify close relatives of B.1.620 there. While genomic surveillance in Cameroon has been limited, the genomes that have been shared on GISAID are quite diverse and informative. A handful appear to share several mutations with lineage B.1.620 and could be its distant relatives (Fig. 1). A synonymous mutation at site 15324 and S:T1027I appear to be among the earliest mutations in the evolution of lineage B.1.620; both are found in at least one other lineage associated with Cameroon (B.1.619). These were followed by S:E484K, which also appears in the genomes closest to lineage B.1.620. Even though the closest genomes to B.1.620 were sequenced from samples collected in January and February, lineage B.1.620 still has 23 changes (mutations and deletions) leading up to it relative to the reference. During this study, SARS-CoV-2 genomes collected in January–March 2021 from the Central African Republic were deposited on GISAID, but none of them resembles the forebears of, or sibling lineages to, B.1.620.

B.1.620 is likely to escape antibody-mediated immunity

Like most currently circulating variants, B.1.620 carries the D614G mutation, which enhances the infectivity of SARS-CoV-2, likely by promoting the up-conformation of the receptor-binding domain (RBD) and thereby enhancing interactions with the ACE2 receptor15. Furthermore, B.1.620 carries P26S, HV69/70Δ, V126A, Y144Δ, LLA241/243Δ and H245Y in the N-terminal domain (NTD) of the spike protein. To the best of our knowledge, the individual V126A and H245Y substitutions are still largely uncharacterised, but they might be counterparts to the R246I substitution in B.1.351, which may interfere with a putative glycan-binding pocket in the NTD16. All other NTD mutations of B.1.620 result in partial loss of neutralisation by convalescent serum and NTD-directed monoclonal antibodies17. This indicates that these mutations may have arisen as an escape from antibody-mediated immunity18. The spike protein of B.1.620 also carries both the S477N and E484K mutations in the RBD but, in contrast to other VOCs, lacks N501Y and mutations at K417. Like the NTD mutations, S477N and E484K individually enable broad escape from antibody-mediated immunity18. Moreover, deep mutational scanning experiments have shown that these substitutions also increase the affinity of the RBD for the ACE2 receptor19. Both S477N and E484K occur on the same flexible loop at the periphery of the RBD–ACE2 interface20.
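The paragraph above sorts the spike changes of B.1.620 by region (NTD, RBD, the rest of S1, and S2). That bookkeeping can be sketched in Python as follows, assuming approximate region boundaries in Wuhan-Hu-1 numbering (NTD 14–305, RBD 319–541, S2 from residue 686, just after the S1/S2 cleavage site). These boundaries are an illustrative assumption, not values taken from this study, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Approximate spike regions in Wuhan-Hu-1 residue numbering. These boundaries
# are an illustrative assumption; published domain definitions differ by a few
# residues. S1/S2 cleavage occurs after R685, so S2 is taken to start at 686.
SPIKE_REGIONS = [
    ("signal peptide", 1, 13),
    ("NTD", 14, 305),
    ("RBD", 319, 541),
    ("S2", 686, 1273),
]


def first_residue(label):
    """Extract the first residue number from labels such as 'E484K',
    'Y144Δ', 'HV69/70Δ' or 'LLA241/243Δ'."""
    match = re.match(r"[A-Z]+(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label}")
    return int(match.group(1))


def spike_region(label):
    """Assign a mutation label to a spike region by its first residue."""
    pos = first_residue(label)
    for name, start, end in SPIKE_REGIONS:
        if start <= pos <= end:
            return name
    if 1 <= pos <= 685:
        return "S1 (other)"  # e.g. residues between the RBD and the S1/S2 site
    raise ValueError(f"residue {pos} is outside the spike protein")


def group_by_region(labels):
    """Group mutation labels by region, preserving input order."""
    groups = defaultdict(list)
    for label in labels:
        groups[spike_region(label)].append(label)
    return dict(groups)


b1620_spike = ["P26S", "HV69/70Δ", "V126A", "Y144Δ", "LLA241/243Δ", "H245Y",
               "S477N", "E484K", "D614G", "P681H", "T1027I", "D1118H"]
for region, labels in group_by_region(b1620_spike).items():
    print(region, labels)
```

With these boundaries the grouping reproduces the text: six NTD changes, S477N and E484K in the RBD, D614G and P681H elsewhere in S1, and T1027I and D1118H in S2.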

We modelled the RBD–ACE2 interface with the S477N and E484K substitutions using HADDOCK 2.4 refinement21. These models show that both the individual substitutions and their combination produce a favourable interaction, with scores and individual energy terms comparable to those of the ancestral RBD (Supplementary Fig. S5). Whereas S477N may modulate the loop conformation22, E484K may introduce new salt bridges with E35/E75 of ACE2. These results indicate that B.1.620 may escape antibody-mediated immunity while maintaining a favourable interaction with ACE2. The remaining mutations in the spike protein (P681H, T1027I and D1118H) are, to the best of our knowledge, uncharacterised. Of these, P681H is also located on the outer surface of the spike protein, directly preceding the multibasic S1/S2 furin cleavage site23. In contrast, T1027I and D1118H are both buried in the trimerisation interface of the S2 subunit24.

While only limited empirical data are available, they appear consistent with the expectation that B.1.620 is likely to be antigenically drifted relative to primary genotypes. A report25 presented to the Lithuanian government on May 22, 2021 indicated that, of the 101 sequenced B.1.620 cases at the time, 13 were infections in fully vaccinated individuals, five of whom were younger than 57 years. Although sequencing indications were not systematically recorded, they were available for a substantial number of SARS-CoV-2 genomes from Lithuania: 213 genomes were sequenced because of a ‘positive PCR at least 2 weeks after the second dose of vaccine’, and of these 195 were B.1.1.7 and 12 were B.1.620. Since the first B.1.620 case was detected in Lithuania on March 15, 2021, ~10,000 SARS-CoV-2 genomes have been sequenced there, of which 9251 were B.1.1.7 and 248 were B.1.620. Thus B.1.620 is found 2.4 times more often in vaccine breakthrough cases than its population prevalence would predict, whereas for B.1.1.7 this enrichment is only 1.05-fold. Similarly, the frequency of B.1.620 across the five most affected European countries (Lithuania, Germany, Switzerland, France and Belgium) appears relatively stable, though at a low level, unlike that of B.1.1.7, which has been in noticeable decline since April–May (Supplementary Fig. S6), presumably on account of increasing vaccination rates and improving weather in Europe.
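The enrichment figures above compare a lineage's share of vaccine-breakthrough genomes with its share of all sequenced genomes. A minimal Python sketch of that ratio, using the counts quoted in the text, is shown below; the function name is hypothetical, and because the text gives the total number of sequenced genomes only as "~10,000", that denominator is an approximation.

```python
def breakthrough_enrichment(lineage_in_breakthroughs, total_breakthroughs,
                            lineage_sequenced, total_sequenced):
    """Share of a lineage among vaccine-breakthrough genomes divided by its share
    among all sequenced genomes. A value above 1 means the lineage is
    over-represented in breakthroughs. This is a crude frequency ratio, not a
    vaccine-effectiveness estimate, and it ignores sampling differences."""
    if min(total_breakthroughs, lineage_sequenced, total_sequenced) <= 0:
        raise ValueError("denominators must be positive")
    share_breakthrough = lineage_in_breakthroughs / total_breakthroughs
    share_sequenced = lineage_sequenced / total_sequenced
    return share_breakthrough / share_sequenced


# Counts from the text; the total number of sequenced genomes is given only as
# "~10,000", so this denominator is an approximation.
TOTAL_SEQUENCED = 10_000
b1620 = breakthrough_enrichment(12, 213, 248, TOTAL_SEQUENCED)
b117 = breakthrough_enrichment(195, 213, 9251, TOTAL_SEQUENCED)
print(f"B.1.620: {b1620:.2f}-fold, B.1.1.7: {b117:.2f}-fold")
```

With the rounded denominator of exactly 10,000 the ratios come out at about 2.27 and 0.99. Both values are sensitive to the exact total, which the text does not state, so the sketch reproduces the qualitative contrast between the two lineages rather than the precise 2.4-fold and 1.05-fold figures.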

Local transmission of B.1.620 in Europe

Local transmission of B.1.620 in Lithuania was established by monitoring the outbreak in Anykščiai municipality (Utena county, Lithuania) through sequencing and repeat PCR testing of SARS-CoV-2-positive samples for the presence of the E484K and N501Y mutations, as well as for S gene target failure (SGTF) caused by the HV69/70Δ deletion. Genotypes identical to those found initially in Vilnius and Utena counties were later identified by sequencing in Panevėžys and Šiauliai counties, indicating continued in-country transmission of lineage B.1.620. Interestingly, a single case in Tauragė county, Lithuania, identified by sequencing, was a traveller returning from France who was infected with a different genotype from the main outbreak lineage in Lithuania; neither local contact tracing nor genomic surveillance found evidence of onward transmission from this case.
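The PCR-based monitoring described above relies on the marker profile of B.1.620: E484K present, N501Y absent, and SGTF due to HV69/70Δ. A hypothetical Python sketch of such a presumptive screen follows. The lookup table is an illustrative assumption, not the Lithuanian testing protocol, and the profiles are not unique. B.1.525, for example, also combines E484K and HV69/70Δ without N501Y, so any presumptive call still needs sequencing confirmation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerResult:
    """Result of mutation-specific PCR on one SARS-CoV-2-positive sample."""
    e484k: bool  # E484K detected
    n501y: bool  # N501Y detected
    sgtf: bool   # S gene target failure, used as a proxy for HV69/70Δ


# Hypothetical screening table (illustrative only). The E484K+/N501Y-/SGTF+
# profile fits B.1.620 but is shared by B.1.525, so it is not diagnostic.
PRESUMPTIVE_CALLS = {
    MarkerResult(e484k=True, n501y=False, sgtf=True): "B.1.620/B.1.525-like",
    MarkerResult(e484k=False, n501y=True, sgtf=True): "B.1.1.7-like",
    MarkerResult(e484k=True, n501y=True, sgtf=False): "B.1.351- or P.1-like",
}


def presumptive_lineage(result):
    """Map a marker profile to a presumptive call; anything else needs sequencing."""
    return PRESUMPTIVE_CALLS.get(result, "unresolved: confirm by sequencing")


samples = {
    "sample-1": MarkerResult(e484k=True, n501y=False, sgtf=True),
    "sample-2": MarkerResult(e484k=False, n501y=True, sgtf=True),
    "sample-3": MarkerResult(e484k=False, n501y=False, sgtf=False),
}
for name, result in samples.items():
    print(name, presumptive_lineage(result))
```

A frozen dataclass is hashable, so marker profiles can serve directly as dictionary keys. Unlisted profiles fall through to a sequencing request instead of being forced into a lineage.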

In addition to the ongoing disseminated outbreak of B.1.620 in Lithuania, genomes of this lineage have been found elsewhere in Europe. Although they derive from introductions separate from the one that sparked the outbreaks in Lithuania, other B.1.620 genomes from Europe also indicate ongoing transmission in Europe, with the clearest evidence in Germany and France, where emerging clades comprise identical or nearly identical genotypes (Fig. 2). Further evidence of local transmission in Europe comes from B.1.620 genomes from countries such as Spain and Belgium (also see next section), which were picked up by baseline surveillance and are thus likely to represent local circulation, though presumably at much lower levels at the time of writing. Figure 2 shows the aforementioned local transmission clusters in Lithuania, Spain (Vilassar De Mar, province of Barcelona), France (see below) and Germany (state of Bavaria), amongst numerous others.

Fig. 2: Maximum-likelihood tree of lineage B.1.620 in Europe.

Relationships between B.1.620 genomes, coloured by country of origin (same as Fig. 1) with a thicker coloured outline indicating the country of origin for travel cases. At least ten genomes shown (samples collected in Belgium, Switzerland, France and Equatorial Guinea) are from individuals who returned from Cameroon, one is from a traveller returning from Mali and one Lithuanian case returned from France. Genomes from the Central African Republic (CAR) and Czechia (returning traveller from Mali) are descended from the original B.1.620 genotype, while the genome from Equatorial Guinea is already closely related to genomes found in Europe and happens to be a travel case from Cameroon. Each genome is connected to the available geographic location in Europe with the smallest circles indicating municipality-level precision, intermediate size corresponding to county-level information (centred on county capital) and largest circle sizes indicating country-level information (centred on country capital). Countries are assigned the same colours as in Figs. 1 and 3.

In France, nine B.1.620 genomes (EPI_ISL_1789089–EPI_ISL_1789097) were recently obtained from a large contact tracing investigation of a single transmission chain. These infections, in the municipality of Pontoise (Val d’Oise department, to the northwest of Paris), occurred in adults (aged 24–38) who were all asymptomatic at the time of sampling. Four additional infections in Pontoise outside this cluster, in adults aged 29–57, form a monophyletic cluster with the other nine infected individuals (Supplementary Fig. S4). At the time of writing, contact tracing has yet to identify the putative index case for these infections, but these cases clearly point to circulation of lineage B.1.620 in the Val d’Oise department. These infections seem to stem from ongoing local transmission in the Île-de-France region, clustering with two patients aged 1 (sample from a children’s hospital in Paris, Hôpital Necker-Enfants malades) and 69. These infections in Île-de-France in turn cluster with two infections from Le Havre (Normandy region, 180 km from Pontoise), pointing either to a travel event from Normandy to Île-de-France or to possible local transmission in the north of France (Supplementary Fig. S4).

B.1.620 likely circulates at high frequency in central Africa

In the absence of routine surveillance at a location, sequencing infected travellers originating from there is the next most efficient way to monitor distinct viral populations. This strategy has been used successfully to uncover cryptic outbreaks of Zika virus in Cuba26 and of SARS-CoV-2 in Iran at the beginning of the pandemic13. The latter study describes a novel approach that accommodates differences between the sampling location and the location of infection; it is specifically designed to exploit the recorded travel histories of infected individuals in Bayesian phylogeographic inference, rather than arbitrarily assigning the origin of the sample to either location. When we first compiled our B.1.620 dataset, seven of its genomes came from travellers and six were sampled in the Central African Republic (CAR) near the border with Cameroon, indicating that central Africa is the most plausible region where B.1.620 is circulating widely (Supplementary Fig. S7). Neighbours of countries reporting local B.1.620 circulation (Cameroon, CAR, DRC, Gabon, Equatorial Guinea and, later, the Republic of Congo) have either not submitted genomes to GISAID during the study period (Chad, Sudan, South Sudan, Burundi) or have epidemics dominated by SARS-CoV-2 lineages other than B.1.620 (Supplementary Fig. S8).

The collected individual travel histories themselves point to several independent introductions of B.1.620 into Europe, with documented cases of infected travellers returning from Cameroon to Belgium, France and Switzerland, and from Mali to Czechia (Fig. 3). We note that the metadata for a traveller who returned from Cameroon to Belgium (EPI_ISL_1498300) provide evidence of ongoing local transmission of B.1.620 within Belgium. This patient spent time in Cameroon from the 16th of January until the 7th of February, but a positive sample was only collected on the 15th of March, 2021. Even assuming a lengthy infectious period of up to twenty days27, this patient’s infection cannot stem from his prior travel to Cameroon, which indicates infection with B.1.620 within Belgium, through contact within the patient’s community. Additionally, two Belgian patients (EPI_ISL_1688635 and EPI_ISL_1688660) were likely infected by the former’s niece, who had travelled with her family to Cameroon and tested positive upon their return to Belgium. These findings are reinforced by more recent samples from Belgium for which no travel history could be recorded and whose patients declared that they had not left the country.
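The argument for local infection in Belgium rests on simple date arithmetic: the gap between leaving Cameroon and sample collection exceeds the assumed twenty-day window. A minimal Python sketch with the standard `datetime` module is shown below. The function names are hypothetical, and the 20-day default mirrors the assumption stated in the text.

```python
from datetime import date


def days_after(last_possible_exposure, sample_date):
    """Whole days between the end of possible exposure and sample collection."""
    return (sample_date - last_possible_exposure).days


def travel_excluded(last_possible_exposure, sample_date, max_window_days=20):
    """True if the sample was collected more than `max_window_days` after the
    last possible exposure, making infection during travel implausible under
    that assumed window."""
    return days_after(last_possible_exposure, sample_date) > max_window_days


left_cameroon = date(2021, 2, 7)
sampled = date(2021, 3, 15)
print(days_after(left_cameroon, sampled))       # 36
print(travel_excluded(left_cameroon, sampled))  # True
```

The 36-day gap is well beyond the twenty-day window, which is why travel-acquired infection is ruled out for this patient.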

Fig. 3: Known locations and travel history of B.1.620 cases.

Collection dates of B.1.620 genomes are shown for each country (rows). Genomes from travellers are outlined in a colour indicating the country of travel origin (e.g. dark red for Cameroon) and are connected to a smaller dot indicating which country’s diversity is being sampled at the travel destination. Bars at the top indicate the number of B.1.620 genomes available for a given date across all countries. Countries are assigned the same colours as in Fig. 1.

Using a Bayesian phylogeographic inference methodology that accommodates individual travel histories, we reconstructed location-annotated phylogenies at both the continent and country levels. Figure 4A shows the MCC tree of the continent-level phylogeographic analysis, which yields 99.5% posterior support for an African origin of lineage B.1.620. From this inferred African origin, the variant then spread to different European countries via multiple introductions, which is corroborated by our collection of travel history records for individuals returning to these countries. The subsequent country-level phylogeographic analysis (Fig. 4B) points to central Africa as the likely origin of this lineage, with the Central African Republic receiving 80.5% posterior support and Cameroon 16.8%, together accounting for 97.3% of the probability mass. Assuming a Central African Republic origin, the variant is estimated to have spread to Europe via a series of introductions, again consistent with our recorded travel history records. Interestingly, a single Lithuanian case, a returning traveller from France, does not cluster with the remaining sequences from Lithuania, indicating at least two independent introductions of lineage B.1.620 into Lithuania. Figure 4B also shows multiple separate introductions of B.1.620 from central Africa into the United Kingdom and the United States.
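In discrete-trait Bayesian phylogeography, the posterior support for a location at a node is typically summarised as the proportion of sampled posterior trees in which that node is assigned that location; the combined support quoted above is then a sum over locations (80.5% + 16.8% = 97.3%). The Python sketch below shows only this summary step. It assumes that the root locations have already been extracted from a posterior tree sample (parsing of BEAST output is not shown), and the root assignments are a toy list, not study data.

```python
from collections import Counter


def root_location_posterior(root_locations):
    """Posterior probability of each root location, estimated as the fraction of
    posterior tree samples whose root is assigned that location."""
    if not root_locations:
        raise ValueError("need at least one posterior tree sample")
    counts = Counter(root_locations)
    total = len(root_locations)
    return {location: n / total for location, n in counts.most_common()}


def combined_support(posterior, locations):
    """Total posterior mass assigned to a set of candidate locations."""
    return sum(posterior.get(location, 0.0) for location in locations)


# Toy root assignments from eight hypothetical posterior trees (not study data).
toy_roots = ["CAR", "CAR", "Cameroon", "CAR", "CAR", "Gabon", "CAR", "Cameroon"]
posterior = root_location_posterior(toy_roots)
print(posterior)  # {'CAR': 0.625, 'Cameroon': 0.25, 'Gabon': 0.125}
print(combined_support(posterior, ["CAR", "Cameroon"]))  # 0.875
```

In practice, thousands of posterior trees are summarised this way; with only eight toy samples the estimates are coarse, but the computation is the same.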

Fig. 4: Maximum clade credibility trees of lineage B.1.620 coloured by reconstructed location using the latest available data as of June 2021.

A Global phylogeny of SARS-CoV-2 genomes with branches coloured by inferred continent from a Bayesian phylogeographic analysis that makes use of individual travel histories. Lineage B.1.620 is outlined and a horizontal bar shows the posterior probability of its common ancestor existing in a given continent. Africa is reconstructed as the most likely location (posterior probability 0.995) where B.1.620 originated. The 95% and 50% highest posterior density (HPD) intervals for the most recent common ancestor date of lineage B.1.620 are indicated with violin plots centred on the common ancestor. B Phylogeny of lineage B.1.620 with branches coloured by inferred country from a Bayesian phylogeographic analysis that makes use of travel histories. A vertical bar shows posterior probabilities of where the common ancestor of B.1.620 existed. In this analysis, Central African Republic (CAR) and Cameroon are reconstructed as the most likely locations (with posterior probabilities of 0.805 and 0.168, respectively) of the common ancestor of lineage B.1.620. Larger white dots at nodes indicate nodes with a posterior probability of at least 95%, while smaller grey circles indicate nodes with a posterior probability of at least 50%. The 95% and 50% highest posterior density (HPD) intervals for the most recent common ancestor date of lineage B.1.620 are indicated with violin plots centred on the common ancestor.

Air passenger flux out of Cameroon and the Central African Republic (Fig. 5) shows that many travellers had African countries as their destination, including many that have not reported any B.1.620 genomes to date. This suggests that B.1.620 could be circulating more widely in Africa, and that its detection in Europe has mostly occurred in countries with recently active genomic surveillance programmes. Detections of B.1.620 in African states neighbouring Cameroon and the Central African Republic (Equatorial Guinea, Gabon, DRC and, lately, the Republic of Congo), even at low sequencing levels, suggest that B.1.620 may be prevalent in central Africa. We find this apparent rise to high frequency and rapid spread across large areas of Africa noteworthy in light of the other findings reported here, namely that the currently available B.1.620 genomes appeared suddenly in February 2021 (Fig. 3), are genetically homogeneous (Fig. 2) and to date have no clear close relatives (Fig. 1).

Fig. 5: Total air passenger flows out of Cameroon (top) and Central African Republic (bottom) between December 2020 and April 2021.

Destination countries are sorted by total passenger volume arriving from Cameroon and the Central African Republic (CAR) combined, coloured by continent (Europe in light blue, North America in dark blue, Africa in green, and Asia in red), and limited to countries where at least 100 passengers arrived from either Cameroon or CAR between December 2020 and April 2021. Note that the passenger flux out of Cameroon is nearly an order of magnitude greater than that out of CAR. Numbers above each country’s bar indicate the total number of genomes on GISAID from that country since January 1st 2021, according to GISAID’s 2021-07-02 metadata release. Bars outlined in black represent countries that had submitted at least one B.1.620 genome as of June 2021.
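The filtering and ordering rule in the Fig. 5 caption can be expressed as a short Python sketch. It reads "at least 100 passengers from either Cameroon or CAR" as at least 100 from at least one of the two origins; that reading, the function name and the toy volumes are assumptions for illustration.

```python
def destinations_to_plot(flows, min_passengers=100):
    """Select and order destinations as described in the Fig. 5 caption.

    `flows` maps destination -> (passengers from Cameroon, passengers from CAR).
    A destination is kept if at least `min_passengers` arrived from either
    origin; kept destinations are sorted by combined volume, largest first.
    """
    kept = {
        destination: (from_cameroon, from_car)
        for destination, (from_cameroon, from_car) in flows.items()
        if from_cameroon >= min_passengers or from_car >= min_passengers
    }
    return sorted(kept, key=lambda d: sum(kept[d]), reverse=True)


# Made-up volumes for illustration only.
toy_flows = {
    "Country A": (500, 20),
    "Country B": (90, 95),   # 185 combined, but under 100 from each origin
    "Country C": (50, 300),
    "Country D": (10, 10),
}
print(destinations_to_plot(toy_flows))  # ['Country A', 'Country C']
```

Under a combined-volume reading of the threshold, "Country B" would also be kept. The two interpretations differ only for destinations that receive modest numbers of passengers from both origins.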
