### Correlation between RBD Opening and Backbone Dihedral Angle.

Multiple unbiased trajectories were propagated from different regions of the S-protein RBD opening conformational space, priorly explored by SMD and umbrella sampling (details in *SI Appendix*). Three of those trajectories were assigned as the closed, partially open, and fully open states, based on the position of free energy minima along the US RC (see *SI Appendix*, Fig. S1). The stability of these three conformations were ensured by inspection of the trajectories. The cumulative simulation data were projected onto a feature space composed of pairwise distances between residues from RBD and from other parts of the spike near the RBD (details in *SI Appendix*); these distances increase during the down-to-up transition. The projected trajectory data were subjected to principal component analysis (PCA) (42) and tICA (43⇓–45). The former method calculates the degrees of freedom with the highest variance in the system, while the latter obtains the ones with the longest timescale. These methods quantify large conformation changes in complex biomolecules. The goal of performing PCA and tICA is to find one or two coordinates which best describe the RBD opening motion. As one continuous trajectory hardly samples the transition event, multiple short trajectories spanning a large range of the configuration space were used (46).

The first principal component (PC) and the first time-lagged independent component (tIC), obtained from PCA and, respectively, from tICA analysis, could both distinguish the closed and open states, although the partially open and fully open states could not be distinguished within the first two PCs (*SI Appendix*, Fig. S2). Yet, the projections of the open- and closed-state trajectories along the first two principal components are in agreement with the results of previous long multimicrosecond spike protein simulations (35). Because of the clear distinction between the closed, the open, and the intermediate, partially open states (Fig. 2*B*), we chose the first two tICs for our subsequent analysis.

Large-scale conformational changes in proteins are, at a fundamental level, stemming from complex combinations of transitions between various states of the protein backbone torsional angles ϕ and ψ. These combinations add up to global displacements, which set the timescale for internal friction (47) and gauge the paradoxically large number of conformational states accessible to a protein as it folds (48). On one hand, concerted transitions of the backbone torsions typically lead to large-scale motion. On the other hand, exponential divergence in nonlinear dynamical systems (49) is such that only certain dihedrals are likely to predominantly effect the conformational changes. We therefore hypothesize that there exist specific residues in the spike protein for which the transition in backbone dihedral states results in the opening of the RBD. Moreover, we conjecture that, given the significant mutation rate, and because of the selection pressure on the virus, the residues with large impact in collective motions that facilitate infectivity are likely to be selected in spike mutants. To test this hypothesis, we calculated the Pearson correlation coefficients of the sines and cosines of all the ϕ and ψ backbone torsion angles of all the residues with the first two tICs. The magnitude of the correlation is found to be significantly large only for a handful of torsion angles, whereas the majority show near-zero correlation (*SI Appendix*, Fig. S3).

We defined a metric called “correlation score” (

$CS$) for each torsion angle in each residue. The value of the metric is computed as

$$\begin{array}{cc}\hfill CS\left(\theta \right)& =\left|C\left(\mathrm{cos}\left(\mathit{\theta}\right),\mathbf{I}\mathbf{C}\mathbf{1}\right)\right|+\left|C\left(\mathrm{sin}\left(\mathit{\theta}\right),\mathbf{I}\mathbf{C}\mathbf{1}\right)\right|\hfill \\ \hfill & \text{}\hspace{1em}+\left|C\left(\mathrm{cos}\left(\mathit{\theta}\right),\mathbf{I}\mathbf{C}\mathbf{2}\right)\right|+\left|C\left(\mathrm{sin}\left(\mathit{\theta}\right),\mathbf{I}\mathbf{C}\mathbf{2}\right)\right|,\hfill \end{array}$$[1]where θ,

$\mathbf{I}\mathbf{C}\mathbf{1}$, and

$\mathbf{I}\mathbf{C}\mathbf{2}$are the vectors containing the time series of the angle θ, the tIC1, and the tIC2, respectively, and

$C\left(\mathbf{x},\mathbf{y}\right)$is the Pearson correlation coefficient of datasets x and y. The

$CS$metric can take values from zero to four, and a higher value indicates that the particular torsion angle shows a highly correlated (or anticorretated) motion with the slowest conformational change which, in this case, is the RBD transition. We avoided summing over the Φ and Ψ angles of the same residue, or over residues of different protein chains, as it might average out the contributions from each angle and consequently obscure the process of specifying the role of each individual residue in the conformational transition.

We sorted the residues based on the

$CS$s of their torsion angles, and a list is provided in *SI Appendix*. The highest values of correlation scores are shown primarily by pairs of consecutive residues, with ψ of the first and the ϕ of the second residue, as depicted in Table 1. This suggests that two consecutive torsion angles in certain regions of the protein are the most highly correlated with the RBD opening motion. Most dihedrals belonged to residues in the loop structure joining the RBD with the S2 stem, as this region is a hinge for the opening of the RBD. This correlation does not exclude causation, since change in the conformational state for two subsequent torsion angles can induce crankshaft motion (50) in the backbone which, propagating along the chains, leads to a change in protein structure.

The distribution of some of the dihedral angles, with highest

$CS$s, in the closed and partially open states, are depicted in Fig. 2*D*. Similar plots for all torsion angles with

are provided in *SI Appendix*. We compared the dihedral angle distribution only beween the closed and the partially open states, as the significance of the artificially prepared fully open structure should not be overemphasized. This structure, generated from the closed state using SMD and US (*SI Appendix*), is only an approximation for a more exact RBD up structure that binds the ACE2 receptor. In literature, the structure with PBD ID 6VSB is sometimes referred to as the open conformation (35), which, in this work, we refer to as the “partially open” state. So, in the rest of the paper, when we attempt to compare the behavior of a chosen set of residues between closed and open states at an atomistic detail level, we only include the closed and partially open configurations. The fact that the highly correlated residues follow a distinctly different distribution in the backbone torsion angle space (Fig. 2*D*) indicates that a handful of non-RBD residues can play a pertinent role in the conformational change of the spike and, consequently, in the viral infection. Interestingly, the correlated torsion angles span over all three chains of the spike trimer, namely A (the one undergoing the RBD transition), B, and C (Fig. 2*A* and Table 1), hinting at the potential role of interresidue couplings ranging over long distances in presenting the RBD to the ACE2 receptor. The residues exhibiting highest correlation scores (Table 1), particularly Gln613, Asp614, Pro600, Gly601, Ile569, and Ala570, are present in the linker region joining the RDB with the S2 domain, which, as mentioned above, is the hinge for the opening motion that presents the RBD to the receptor (51). The Phe833 and Ile834 residues, although technically part of S2 domain, can significantly impact the dynamics of the hinge or linker due to their proximity in 3D structure. Similar arguments are applicable for the NTD domain residues such as Cys136 and Asn137 from chain B and Ser112 from chain C, which are able to impact the RBD due to their structural proximity. Interestingly, residues near the stem region, including Cys1082, His1083, and Asp1084, appear in our list as strongly correlated and can potentially be used as a target for broad-spectrum antibody or vaccine design targeting the stem region (52, 53).

When the virus mutates these particular residues in a way that increases its virulence, this increase stems from the propensity of the RBD to “flip” open and thereby increase ACE2 binding. As evidence, we highlight the example of the D614G mutation, which is already observed in numerous strains of the SARS-CoV-2 all over the world (10, 40). Cryo-EM studies have indicated that the D614G mutation is, by itself, capable of altering the conformational dynamics of spike protein by stabilizing an RBD up state over the down conformation (54, 55). D614 is one of the top-ranked residues predicted from our model for the potential to play a crucial role in RBD opening. A glycine residue has the least backbone torsion barrier for conformational transition in ϕ−ψ space, due to the absence of a side chain. Replacing an Asp residue, which has higher barriers to such transitions, with a glycine can increase the flexibility of the backbone, significantly impacting the probability of observing an RBD up conformation. To understand whether this can be the reason why this particular mutation was selected to become so widespread, a comparison of the geometric and “chemical” effects of Gly should be assessed. To this end, we performed additional simulations of the open and closed states of the D614G mutant. Indeed, our simulations of the D614G mutant spike indicate that, unlike the wild-type (WT) system for which a significantly different dihedral angle distribution exists, there is no difference between the closed and the partially open configurations in terms of the torsion angle space explored by residue G614 (*SI Appendix*, Fig. S14). The glycine residue at 614 position also experienced different degrees of hydrogen bonding and electrostatic interactions (see below).

A wide range of spike protein mutant sequences have been characterized, each with varying degrees of abundance. A relatively rare mutation, A570V, resulted in a decrease of the overall stability of the spike protein in all three states, based on the FoldX empirical force field (10, 56). Free energy values (10) were obtained from only structural data, and no dynamical information was considered in that study. Yet it is worth noting that the change in total and solvation free energies, due to this mutation, were substantially different for the closed and open states, resulting in a change in

$\mathrm{\Delta}G$for RBD opening. But, as the side chains of Ala and Val are similar in terms of steric bulk, this mutation is unlikely to significantly impact RBD dynamics. As it likely did not increase the evolutionary advantage of the virus by increasing infectivity, this mutation, only occurring in one strain (10) so far, did not become as prevalent as D614G.

On the contrary, an A570D mutation is observed in the same residue in the newly emerged and highly infectious B.1.1.7 strain in the United Kingdom (41, 57, 58). This mutation is likely to play a pertinent role in infection, as it replaces a hydrophobic amino acid with a charged one. This leads to a significant difference in the conformational dynamics of the 570 residue and consequently impacts the large-scale RBD opening motion. Structural biology experiments have established that a mutation in the A570 residue alters the propensity of RBD opening by modulating the hydrophobic interaction of the hinge region with the S2 core (59). Coarse-grained modeling studies explained this observation by noting that A570 is part of a regulatory switch that triggers the conformation change necessary for receptor binding (60).

The B.1.1.7 also shows a P681H mutation close to the highly correlated N679 residue predicted from our model (*SI Appendix*, Fig. S5). As this mutation replaces a structurally rigid proline residue, it can possibly impact the conformational space accessible to nearby residues, including N679.

Giving pause for thought, these results indicate that mutations in the highest correlated residues (Table 1) can, in fact, have significant physiological impact in changing the course of the pandemic. Therefore, we provide a list of residues (Table 1 and *SI Appendix*, Figs. S3–S10), future mutations of which could impact RBD dynamics and consequently change the transmissibility or virulence of SARS-CoV-2. (See *Data Availability* for access to the raw correlation coefficient data for all residues.) Yet, care should be taken with assigning the predominant role in infection to a single, non-RBD domain residue in the UK variant; several other mutations are present that could modulate the binding affinity to the ACE2 receptor (particularly N501Y in the RBD−ACE2 binding interface). However, mutations outside the RBD can indeed play a key role in infection by disproportionately favoring an RBD “up” structure (52, 53, 59).

For a more detailed understanding, we compared the average number of hydrogen bonds per residue group from Table 1 for the closed and the partially open trajectories (Fig. 2*C*). We observed significant changes in the number of hydrogen bonds in residue groups: Q613/D614, I569/A570, C1082/H1083/D1084, and C136/N137 (Fig. 3). In the closed state, the carboxylate side chain of D614 residue forms hydrogen bonds with K854 and T859 which are lost in the RBD up configuration. These hydrogen bonds will be absent in D614G mutant and likely reduce the energy cost of the conformational transition. Particularly, the loss of hydrogen bond with T859 has been attributed to the higher stability of the RBD up structure in D614G mutant, by Mansbach et al. (61). Our simulations also indicate that there is a loss of one hydrogen bond in the Q613/D614 residues going from RBD down to RBD up conformation in the WT spike. But such loss of hydrogen bonding is not observed in the case of the D614G mutant (*SI Appendix*). On the contrary, formation of new hydrogen bonds is observed in the other three residue groups (Fig. 3) which can be enhanced or reduced by mutating the residues involved.

Additionally, the nonbonded interaction energies (electrostatic and van der Waals [vdW]) of the residues, from Table 1, differ significantly in the two conformations. Unsurprisingly, the D614 is energetically stabilized in the closed conformation in comparison to the open state, due to additional hydrogen bonds. In the D614G mutant, from our analysis, this stabilization is significantly lower in comparison to the WT (*SI Appendix*, Fig. S15 and *Additional results*). However, residues P600/G601 are more stabilized in the open state in comparison to the closed state via favorable Coulomb and vdW interactions. A similar effect is observed in C136 for electrostatic energy but is somewhat compensated for by the opposite trend in vdW energy. A570 and N137 have lower electrostatic energy in the closed state despite having fewer hydrogen bonds. In short, non-RDB residues that experience a different amount of nonbonded interaction with the rest of the protein or show different hydrogen bonding patterns in the “RBD down” and “RBD up” state can impact the relative stability of those two conformations when mutated into residues with different properties.

### Mutual Information and Network Model.

A different approach to characterize the coupling between distant regions in a protein is to calculate the cross-correlations between the positions of different residues in 3D space. This method is often used to study allosteric effects upon ligand binding (62⇓⇓⇓⇓–67). Conventional implementations compute the dynamic cross-correlation map (DCCM) of the position vectors of the

${C}_{\alpha}$atoms (68). However, DCCM ignores correlated motions in orthogonal directions (67). This problem can be avoided by using a linear mutual information (LMI)-based cross-correlation metric, which we use in the current study (62, 69). The cross-correlation matrix elements,

${C}_{ij}$, are given by

$${C}_{ij}={\left(1-{e}^{-\left(2/d\right){I}_{ij}}\right)}^{-1/2},$$[2]where

${I}_{ij}$is the LMI computed as

$${I}_{ij}={H}_{i}+{H}_{j}-{H}_{ij},$$[3]with H as the Shannon entropy function,

$$\begin{array}{cc}\hfill & {H}_{i}=-\int p\left({\mathbf{x}}_{i}\right)\mathrm{ln}p\left({\mathbf{x}}_{i}\right)d{\mathbf{x}}_{i}\hfill \\ \hfill & {H}_{ij}=-\iint p\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)\mathrm{ln}p\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)d{\mathbf{x}}_{i}d{\mathbf{x}}_{j},\hfill \end{array}$$[4]where, for two residues i and j,

${\mathbf{x}}_{i},{\mathbf{x}}_{j}$are the 3D Cartesian vectors of atomic coordinates of the corresponding

${C}_{\alpha}$atom, whereas

$p\left({\mathbf{x}}_{i}\right)$and

$p\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)$indicate, respectively, the marginal probability density for

${\mathbf{x}}_{i}$and the joint probability density of

${\mathbf{x}}_{i}$and

${\mathbf{x}}_{j}$(62, 69). The change in cross-correlation between the apo and holo states of a protein is a gauge that traces allosteric communication in the protein by monitoring the changes in the local correlations between protein residues (62). The spike protein conformational change is not an allosteric process by strict definition, as it does not involve the binding of an effector. However, comparing the LMI cross-correlations between the RBD down and up states can help identify residues which behave differently in different protein conformations. More importantly, the difference of correlation between the closed state and a small perturbed structure toward the conformational opening can hint at the residues which gain or lose contact in the beginning stage of the opening transition and consequently initiate the large-scale motion. So we assigned one of the unbiased trajectories as “slightly open,” indicative of an early stage structure in the pathway of opening. We included the unbiased trajectory, corresponding to this structure, in our subsequent analysis, in which we compared the correlation heat map of all residues in the closed form with the three open states, namely the partially open, fully open, and slightly open state, in order to understand the coupling between RBD opening and protein residue fluctuations.

Overall change in LMI correlation is clearly larger for the fully open state in comparison to the partially open state, as evident from the higher appearance of reddish color (Fig. 4 *A*–*D*). Unsurprisingly, the change is largest for the RBD and proximal residues encompassing the NTD region (residues 100 to 300) in all chains (Fig. 4 *E*−*G*), as they lose direct contact during the opening motion. Residues in the RBD−S2 linker region in chains A and C (residues 524 to 700) show a large gain in correlation in the initial stage of RBD opening (“slightly open”) despite being not directly in contact with the RBD in the closed form. On the pathway from close to open, relevant correlation changes are already found in the slightly open state when comparing to the closed state (Fig. 4*E*), with significant changes for the important RBD−S2 linker region. However, more details appear (in other distant regions) as the transition approaches the partially open and fully open states; compare Fig. 4 *F* and *G*.

Overall, these results are consistent with the dihedral angle correlations, described in the previous section: The residues in the loop region next to the RBD exhibit a change in the values of the backbone dihedral angles upon the down-to-up transition. The change in the correlation coefficient (ΔCorrelation) is also large for the RBDs and NTDs of chains B and C, which are in close proximity to chain A of the RBD in the closed state. Additionally, linker residues of chain B show significant gain in correlation upon transitioning to the fully open state. Some residues that gain or lose correlation (blue or red coloration in Fig. 4 *E*−*G*) are situated at the opposite end (S2 region) of the spike, indicating the presence of long-range correlated motion. This long-distance correlation can indeed be a cumulative effect of many small local fluctuations on the way toward the RBD, along structural patches connecting these sites, allowing “distant” residues to shed their impact on the structural transition in RBD. These pathways can be revealed, for example, by the method of Ota and Agard (70), who used energy flow or vibrational energy relaxation to trace them.

For a more profound insight, we built a protein connectivity graph network model. In it, the

${C}_{\alpha}$ atoms of each amino acid are the nodes, and the correlations between them are the edges connecting the nodes. The number of nodes in our system is *n* = 3,438, which makes it one of the largest systems studied previously with this method (62⇓⇓⇓⇓–67) [comparable to the work by Saltalamacchia et al. (71) on a splicosome complex involving 4,804

atoms and 270 phosphorus atoms in the network]. We then calculated the betweenness centrality (BC), a graph theoretical measure that provides a way to quantify the amount of information that flows via the nodes and edges of a network. If a node i is working as a bridge between two other nodes along the shortest path joining them, then the BC of node i is given by

$$BC\left(i\right)=\sum _{st}\frac{{n}_{st}^{i}}{{g}_{st}},$$[5]where

${g}_{st}$is the total number of geodesics (shortest paths) joining nodes s and t, out of which

${n}_{st}^{i}$paths pass through node i (63). The change of BC in the dynamics of the spike protein has been recently observed using coarse-grained simulation methods (19). Despite the coarseness of the model, a handful of residues participating in the information-propagating pathway could be identified directly from the BC values. In the current work, we used the difference in BC as a metric to identify key residues which gain or lose relative importance along the allosteric information pathway. The difference in the normalized BC is measured by comparing the number for the partially open and fully open states with the closed conformation (i.e.,

${BC}^{\text{slightly open}}-{BC}^{\text{closed}}$,

${BC}^{\text{partially open}}-{BC}^{\text{closed}}$, and

${BC}^{\text{fully open}}-{BC}^{\text{closed}}$for every residue in the spike protein) from our all atom trajectories with explicit solvation. Importantly, our model also includes the highly relevant glycan shield, which was shown to modulate the conformational dynamics of the RBD by favoring a down conformation and functioning as a gate for the conformational opening, beyond its general role in shielding (35, 38). However, glycans were not included in the network analysis. While, in principle, it is valid to consider their role, their motion occurs on timescales that are much faster than those of the protein backbone and would be averaged out of any correlation calculation.

For slightly open, partially open, and fully open states, the residues with significant (e.g., >0.1) change in BC are mostly from the NTD region or RBD region of the B and C chains (Figs. 5 and 6). This suggests that the allosteric information flows through the nearby NTDs and RBDs, and mutations in this region can break the allosteric network (63) and affect the functionality of the spike protein. SARS-CoV-2 neutralizing antibodies were indeed observed to bind these regions of the spike (22). The identified residues are in close proximity to the RBD of chain A, the one undergoing the down-to-up transition.

So the connectivity captured in the BC data is primarily due to short-range coupled fluctuations. Such couplings are broken when the RBD and NTD move apart, leading to the change in BC. For the same reason, the BC of the RBD of chain A increases in the fully open state as its internal vibrations become more independent of the rest of the protein.

In a culmination of the above, the most interesting aspect is the strikingly large change in BC of the residues which are distant from the RBD in 3D structure. Significant gain or loss of BC is observed in residues 607, 624, 713, 757, 896, and 1,097. The first three residues are present in the linker region joining the RBD with the S2 domain, while the other three are in the S2 itself. The linker region has a strong impact on the dynamics of the RBD, as we already established from the dihedral angle analysis. The allosteric network analysis reinforces this conclusion. Moreover, the large change of BC in the S2 domain indicates a complex long-range information flow connecting the RBD with the core residues of the protein. Electrostatic and vdW energy analysis, similar to that mentioned in the previous section, has been performed on the residues with changes in BC greater than 0.2. The interaction energy of the S2 domain residues such as I896 and G757 is significantly different for the closed and open states along some of the NTD residues like N188, V6, and S98. This has substantial implications for pharmaceutical design, as mutations within the NTD and the S2 domain can impact the receptor-binding propensity of the viral spike. These results also suggest that therapeutics targeted toward the S2 and toward the RBD−S2 linker can be effective in preventing COVID-19 infection, without complications stemming from the high rate of mutations in the RBD.

Read more here: Source link