Shotgun scanning glycomutagenesis: A simple and efficient strategy for constructing and characterizing neoglycoproteins

Glycosylation of asparagine residues is one of the most abundant and structurally complex protein posttranslational modifications (1, 2) and occurs in all domains of life (3). Owing to their relatively large size and hydrophilicity or simply their presence at defined locations, asparagine-linked (N-linked) glycans can significantly alter protein properties including biological activity, chemical solubility, folding and stability, immunogenicity, and serum half-life (4, 5). Hence, glycosylation effectively increases the diversity of the proteome by enriching the repertoire of protein characteristics beyond that dictated by the 20 canonical amino acids. For example, accumulating evidence indicates that the immune system diversifies the repertoire of antigen specificities by exclusively targeting the antigen-binding sites of immunoglobulins (IgGs) with posttranslational modifications, in particular N-linked glycosylation (6). Moreover, the profound effect of glycans on proteins has prompted widespread glycoengineering efforts to rationally manipulate key glycosylation parameters (e.g., glycan size and structural composition and glycosite location and occupancy) as a means to optimize protein traits for a range of different industrial and therapeutic applications (7–10).

Despite some notable successes, the routine use of glycosylation as a strategy for engineering proteins with advantageous properties is currently limited by our inability to predict which sites within a protein are glycosylatable and how glycosylation at permissive sites will affect protein structure and function. Indeed, a deeper understanding of the design rules (i.e., how glycans influence the biological and biophysical properties of a protein) represents a grand challenge for the glycoprotein engineering field. To this end, computational approaches have enabled in silico exploration of glycosylation-induced effects on protein folding and stability (11, 12); however, these involve a trade-off between molecular detail and glycoprotein size, with full-atomistic molecular dynamics simulations typically limited to only short glycopeptides or protein domains (11). Experimentally probing the consequences of glycosylation ideally requires access to large collections of chemically defined glycoproteins in sufficient quantities for characterization (13). Mammalian cells represent an obvious choice to source proteins with both natural and naïve glycosites. However, because of the time-consuming, low-throughput nature of gene transfection and culturing of mammalian cells, studies using mammalian cell-based expression systems have typically investigated only a small number of designs (∼15 or fewer) (14–17), with rare exceptions such as the tour de force study by Elliott et al. (18). In addition, the intrinsic variability with respect to the glycan structure at a given site (microheterogeneity) can be unpredictable and difficult to control in mammalian expression systems. Another option is chemical synthesis, which can furnish structurally uniform glycopeptides for investigating the local effects of N-linked glycans on peptide conformation (19).
While total chemical synthesis remains challenging for full-length proteins, advances in expressed protein ligation (EPL) have opened the door to convergent assembly of chemically synthesized glycopeptides with recombinantly expressed protein domains to form larger glycoproteins bearing complex N-glycans installed at discrete sites (20, 21). Using this technology, Imperiali and coworkers created a panel of seven site-specifically glycosylated variants of the bacterial immunity protein Im7 modified with the disaccharide N,N′-diacetylchitobiose (GlcNAc2) and assessed the kinetic and thermodynamic consequences of glycan installation at defined locations (22). Unfortunately, EPL is a technically demanding procedure, requiring manual construction of each individual glycoprotein, which effectively limits the number of testable glycosite designs to just a small handful.

To move beyond these “one-glycosite-at-a-time” methods for supplying glycoproteins, herein we describe a scalable technique called shotgun scanning glycomutagenesis (SSGM) that involves design and construction of combinatorial acceptor protein libraries in which 1) each member of the library carries a single N-glycosite “mutation” introduced at a defined position along the protein backbone and 2) the complete ensemble of glycan acceptor sites (sequons) in the library effectively covers every possible position in the target protein (Fig. 1). The resulting SSGM libraries are expressed using N-glycosylation-competent bacteria in the context of glycoSNAP (glycosylation of secreted N-linked acceptor proteins), a versatile high-throughput screen based on extracellular secretion of glycosylated proteins (23). Using this glycoprotein engineering tool, we constructed and screened SSGM libraries corresponding to three model proteins: bacterial immunity protein Im7, bovine pancreatic ribonuclease A (RNase A), and a human single-chain variable fragment antibody specific for HER2 (scFv-HER2). Our results revealed that installation of N-glycans was tolerated at a large number of positions and in all types of secondary structure, with relatively high N-glycosylation efficiency in the majority of cases. For many of these glycoproteins, the presence of N-glycans at naïve sites had a measurable effect on protein stability and/or activity in a manner that depended on the precise location of the modification. Taken together, these findings demonstrate the ability of the SSGM method to yield large collections of discretely modified neoglycoproteins that collectively reveal glycosylatable sites and provide insight into the influence that site-specific N-glycan installation has on structural and/or functional properties.


Reliable Detection of Acceptor Protein Glycosylation by glycoSNAP Screening.

To enable screening of SSGM libraries, we first sought to adapt glycoSNAP screening for proteins of interest (POIs). In the original glycoSNAP assay, we genetically modified Escherichia coli YebF, a small (10 kDa in its mature form) extracellularly secreted protein (24), with an artificial glycosite (e.g., N-X-S/T or D/E-X1-N-X2-S/T, where X, X1, X2 ≠ P) at its C terminus. The modified YebF protein was expressed in E. coli cells carrying the Campylobacter jejuni N-glycosylation machinery (25) that were bound to a nitrocellulose filter membrane. Following secretion out of filter-bound colonies, putatively glycosylated YebF was captured on a second nitrocellulose membrane, which was probed with antibodies or lectins to detect N-linked glycans. In this way, glycoSNAP creates a convenient genotype–glycophenotype linkage for facile scoring (glycosylated versus aglycosylated) of YebF proteins secreted from individual bacterial colonies (Fig. 2A). Here, we hypothesized that genetic fusion of glycosite-modified POIs to YebF would result in extracellular secretion of the fusion protein such that glycans installed on the POI could be detected by the nitrocellulose membrane-based screening strategy. To test this hypothesis, we initially focused on E. coli Im7 as the POI for several reasons: 1) it is a small, globular 87-residue protein that lacks disulfide bonds and is well-expressed in the periplasm where bacterial N-glycosylation occurs (25); 2) although not a native glycoprotein, Im7 modified at its C terminus with a DQNAT glycosylation tag can be glycosylated by the C. jejuni N-glycosylation machinery in E. coli (25); 3) crystal structures are available for wild-type (wt) Im7 (26) and for Im7 in complex with its cognate toxin colicin E7 (ColE7) (27); and 4) a limited set of seven Im7 variants was previously generated to determine the effects of GlcNAc2 attachment on folding and stability (22), providing some useful reference points for comparison.
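The two sequon formats named above can be written as regular expressions. The following is a minimal sketch, assuming a protein sequence supplied as a one-letter-code string; `find_sequons` is a hypothetical helper for illustration, not part of the glycoSNAP workflow. A zero-width lookahead is used so that overlapping sequons are all reported.

```python
import re

# Bacterial (C. jejuni PglB-type) extended sequon: D/E-X1-N-X2-S/T, X1, X2 != P.
# The capture group sits inside a lookahead so overlapping matches are found.
BACTERIAL_SEQUON = re.compile(r"(?=([DE][^P]N[^P][ST]))")
# Minimal sequon: N-X-S/T, X != P.
MINIMAL_SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq: str, bacterial: bool = True) -> list[int]:
    """Return 1-based positions of the acceptor asparagine in each sequon."""
    seq = seq.upper()
    if bacterial:
        # Match starts at D/E; the acceptor N is two residues later (0-based +2).
        return [m.start() + 3 for m in BACTERIAL_SEQUON.finditer(seq)]
    # Match starts at the acceptor N itself.
    return [m.start() + 1 for m in MINIMAL_SEQUON.finditer(seq)]


if __name__ == "__main__":
    print(find_sequons("MSDQNATGK"))                  # bacterial DQNAT motif
    print(find_sequons("GGNLTGG", bacterial=False))   # minimal N-L-T motif
```

Because proline is excluded at both X positions, a motif such as DPNAT is correctly rejected.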

Fig. 2.

Construction and interrogation of SSGM libraries. (A) Schematic of SSGM library construction using multiplex inverse PCR. The resulting DNA plasmid library, encoding neoglycoprotein variants with glycosite substitutions at every possible position, was used to cotransform E. coli strain CLM24 along with two additional plasmids encoding the requisite N-glycosylation machinery from C. jejuni. The resulting bacterial library was plated on solid agar, after which colonies and their secreted glycoproteins were replica plated on nitrocellulose membranes as described in the text. (B) Immunoblot analysis of acceptor proteins in colony secretions (Left) and extracellular supernatant fractions (Right) derived from E. coli CLM24 carrying a plasmid encoding either YebF-Im7 or YebF-Im7DQNAT along with plasmids encoding N-glycosylation machinery with either wt CjPglB or an inactive mutant (mut). (C) Same as in B but with YebF-RNase A and YebF-RNase AN34 in colony secretions (Left) and periplasmic fractions (Right). Blots were probed with anti-polyhistidine antibody (α-His) to detect acceptor proteins and SBA or hR6 serum to detect glycans. Bottom color panels in B and C depict overlay of α-His and SBA blots (merge). Arrows denote aglycosylated (g0) and singly glycosylated (g1) forms of YebF-Im7DQNAT or YebF-RNase AN34. Molecular weight (MW) markers are indicated at the left. Results are representative of at least three biological replicates.

To determine whether Im7 was compatible with the glycoSNAP procedure, E. coli strain CLM24 was transformed with a plasmid encoding YebF-Im7 that was modified with a DQNAT glycosylation tag (28) at the C terminus of Im7 along with two additional plasmids, one encoding glycosyltransferase (GT) enzymes for the biosynthesis of the N-glycan and the other encoding the oligosaccharyltransferase (OST) for transfer of the resulting N-glycan to acceptor proteins. To minimize microheterogeneity so that modified acceptor proteins all carried identical glycans, we created a system for producing homogeneous N-glycans with the structure GalNAc5(Glc)GlcNAc-Asn (Fig. 1), which is one of several structurally related glycan donors that can be efficiently transferred to target proteins in E. coli by the C. jejuni OST PglB (CjPglB) (29, 30). While the biotechnological value of this glycan is questionable, it served as an excellent model for our proof-of-concept SSGM studies for several reasons. First, it involves formation of the key GlcNAc-Asn linkage, which is the same as found in prototypic eukaryotic N-glycans. Second, it has the potential to be remodeled as a complex-type eukaryotic glycan via a two-step enzymatic trimming/transglycosylation process (31). Third, its structural uniformity and relative abundance when produced heterologously in E. coli cells, as well as its compatibility with PglB, all help to ensure that differences in glycosylation efficiency are minimally affected by substrate-related factors and are instead attributable to accessibility of a given acceptor site.

When plated on solid agar and subjected to the colony-blotting method, cells expressing YebF-Im7DQNAT, or a control YebF-Im7 construct that lacked the glycosylation tag, were able to secrete the fusion into the extracellular medium as evidenced by cross-reaction of an anti-His antibody with the membranes (Fig. 2B). However, only the strain expressing YebF-Im7DQNAT in the presence of wt CjPglB, but not a CjPglB variant rendered inactive by two active-site mutations (D54N and E316Q) (23), gave rise to colonies that reacted with soybean agglutinin (SBA) (Fig. 2B), a lectin that binds terminal GalNAc residues in the C. jejuni N-glycan (29). The colony blotting results were corroborated by immunoblot analysis of culture supernatants, which revealed that YebF-Im7 and YebF-Im7DQNAT were both secreted into the extracellular medium but only the latter was glycosylated, as evidenced by the appearance of a higher-molecular-weight band in the blot probed with glycan-specific antiserum (Fig. 2B). As expected, no glycan-specific signal was detected in colony blots or immunoblots corresponding to cells carrying the mutant CjPglB enzyme (Fig. 2B). Importantly, the predominant glycan attached to YebF-Im7DQNAT corresponded to GalNAc5(Glc)GlcNAc, which represented >98% of all detected glycoforms as confirmed by mass spectrometry (SI Appendix, Fig. 1). Collectively, these results confirmed the compatibility of bacterial Im7 with our glycosylation workflow, yielding homogeneously modified acceptor proteins that were readily detected by glycoSNAP screening.

Rapid Identification of Acceptor Site Permissiveness Using SSGM.

Next, the plasmid encoding YebF-Im7 was mutagenized to create a library of Im7 gene sequences, each carrying an individual sequon substitution and cumulatively covering all positions in the Im7 protein. Mutagenesis was performed using multiplex inverse PCR (32) with a set of divergent abutting primers that were designed to amplify the entire plasmid and introduce an acceptor asparagine residue at every position in the Im7 gene (with the two upstream and two downstream residues being changed to DQ and AT, respectively), thereby yielding a highly focused plasmid library enriched with in-frame clones each bearing a single DQNAT acceptor motif at a defined position (Fig. 2A). Indeed, next-generation sequencing of the preselected plasmid library confirmed complete sequence coverage for all glycosite positions in Im7, with >10³ reads detected for all but one position (SI Appendix, Fig. 2). With all glycosite variants present and accounted for, the resulting plasmid library was introduced into strain CLM24 carrying the requisite N-glycosylation machinery and the library-transformed cells were plated on solid agar and subjected to glycoSNAP screening. From one membrane, we detected a total of ∼200 glycosylation-positive colonies, of which 20 were randomly chosen for further analysis. Sequencing confirmed that a single in-frame DQNAT motif was present in each isolated hit, with the Im7N37 and Im7N58 variants (where the superscript denotes the location of the asparagine residue) occurring three and two times, respectively (Fig. 3A). The hits were fairly evenly distributed throughout the entire Im7 sequence and situated in every type of secondary structure including bends, turns, and α-helices, consistent with X-ray crystallographic data showing that occupied glycosylation sites can occur on all secondary structural elements (33). Immunoblot analysis confirmed that each of the selected clones was efficiently glycosylated (Fig. 3B).
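The in-frame product expected from each primer pair can be predicted in silico. A minimal sketch, assuming each DQNAT motif replaces the two residues upstream and the two residues downstream of the acceptor position as described above; `dqnat_scan` is a hypothetical name, and skipping positions that lack two flanking residues within the protein is an assumption of this sketch rather than the authors' stated design rule.

```python
SEQUON = "DQNAT"


def dqnat_scan(seq: str) -> dict[int, str]:
    """Map each acceptor position (1-based) to the variant carrying DQNAT there.

    Residues i-2..i+2 around the acceptor index i are replaced by D, Q, N, A, T,
    so every variant keeps the original length. Positions without two flanking
    residues inside the sequence are skipped (an assumption of this sketch).
    """
    variants = {}
    for i in range(2, len(seq) - 2):  # i = 0-based index of the acceptor N
        variants[i + 1] = seq[: i - 2] + SEQUON + seq[i + 3 :]
    return variants


if __name__ == "__main__":
    for pos, variant in dqnat_scan("ABCDEFG").items():
        print(pos, variant)
```

Comparing this designed set against the sequenced library is what establishes the per-position coverage reported above.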

Fig. 3.

Construction and characterization of bacterial Im7 neoglycoprotein library. (A) Primary sequence and predicted secondary structure for E. coli Im7 immunity protein. Asterisks denote location and frequency of glycosite hits isolated using SSGM. Predicted structures adapted from Protein Data Bank (PDB) ID code 1AYI. (B) Immunoblot analysis of supernatant fractions from CLM24 cells carrying plasmids encoding YebF-Im7 fusions with sequon mutations at indicated position and requisite N-glycosylation machinery. Blots were probed with anti-polyhistidine antibody (α-His) to detect acceptor protein (Top) and hR6 serum against the glycan (Bottom). Markers for aglycosylated (g0) and singly glycosylated (g1) forms of acceptor proteins are indicated at the right. Molecular weight (MW) markers are indicated at the left. Asterisk indicates construct with mutation that introduced stop codon just before 6xHis tag, preventing α-His detection. Results are representative of at least three biological replicates. (C) Mapping of cell-based glycosylation efficiency onto three-dimensional structure of Im7 in complex with ColE7 (Left). Heat-map analysis of the glycosylation efficiency was determined based on densitometric quantification of the percent glycosylated (defined as g1/[g0+g1] ratio) for each acceptor protein in the anti-His immunoblot. Detailed interactions between ColE7 and Im7, highlighting side chains of Im7 in the regions of α1-loop12-α2 (residues 19 to 39; Middle) and loop23-α3-loop34 (residues 46 to 63; Right). Heat-map analysis of change in binding activity was determined by normalizing activity measured for glycosylated sequon variant by aglycosylated counterpart. (D) Binding activity of glycosylated (gray bars) and aglycosylated (white bars) YebF-Im7 variants recovered from supernatants was measured by ELISA with ColE7 as immobilized antigen. 
All data were normalized to binding activity measured for aglycosylated YebF-Im7 lacking a sequon (wt), such that values greater than 1 (denoted by dashed red line) indicate enhanced binding activity relative to wt Im7. Dashed boxes correspond to two regions (Region 1: residues 23 to 33; Region 2: residues 58 to 69) that have many variants with increased activity. Data are average of three biological replicates and error bars represent SD of the mean. (E) DSF analysis of the 15 most active YebF-Im7 variants with and without glycosylation. Tm calculated as midpoint of thermal transition between native and unfolded states. Dashed line indicates Tm for wt YebF-Im7 (38.6 ± 1.0 °C). Black bars are average of three independent replicates with error bars reported as SEM. Dashed red lines in D and E indicate the activity and Tm for wt YebF-Im7. Statistical analysis was performed on all data in D and E using two-way ANOVA with significance indicated as follows: *P < 0.1; **P < 0.01; ***P < 0.001; ****P < 0.0001; unmarked, not significant.

To exhaustively explore glycosylation sequence space, we constructed all possible individual Im7 sequon variants (80 in total) using the multiplex PCR primer pairs to introduce DQNAT sequons at every position of the protein. A strikingly large number (78 out of 80) of these variants were found to be glycosylated, many with an efficiency that was at or near 100% as estimated from densitometry of the anti-His blot (Fig. 3C and SI Appendix, Fig. 3). Because glycosylation by CjPglB can occur both before and after protein folding is completed (SI Appendix, Fig. 4) (34, 35), the secondary and tertiary structure around a glycosylation site is likely to have a direct effect on the extent to which a given site is occupied. Indeed, it has been observed that sequons located in structurally defined regions of folded acceptor proteins are poorly glycosylated and that partial unfolding is required to increase glycosylation efficiency at these sites (35, 36). To determine if the structural context for any of the Im7 sequon variants was a determinant for the timing and efficiency of glycosylation, we performed in vitro, cell-free glycosylation reactions in which already folded but not yet glycosylated YebF-Im7 proteins derived from culture supernatants were incubated with purified CjPglB and glycan donor. Remarkably, there was near-perfect agreement between the cell-free and cell-based glycosylation results, with nearly all of the purified Im7 variants undergoing highly efficient glycosylation at or near 100% (SI Appendix, Fig. 3). The observation that so many Im7 variants were efficiently glycosylated in vitro by the CjPglB enzyme (i.e., after folding had been completed) indicates that each sequon was located either in a structurally compliant position (e.g., a flexible, surface-exposed loop) within the folded protein or in a region of the protein that became partially unfolded during the cell-free glycosylation reaction.
While broad accessibility is certainly plausible given the small size and simple topology of Im7, we cannot rule out the contribution of conformationally destabilizing effects caused by substitution of five-residue stretches of native amino acids in the protein. Regardless of the exact reason, these results indicate that Im7 was extremely tolerant to both cell-based and cell-free installation of N-glycans over its entire structure.
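The glycosylation efficiency used throughout (percent glycosylated, defined in the figure legends as g1/[g0+g1]) reduces to a short calculation on densitometry values. A minimal sketch, assuming background-subtracted band intensities for the aglycosylated (g0) and singly glycosylated (g1) species from the anti-His blot; the function names and the 90% cutoff standing in for "at or near 100%" are hypothetical choices for illustration.

```python
def percent_glycosylated(g0: float, g1: float) -> float:
    """Percent glycosylated = 100 * g1 / (g0 + g1) from band intensities."""
    total = g0 + g1
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * g1 / total


def summarize(bands: dict[int, tuple[float, float]], near_full: float = 90.0):
    """Return per-position efficiencies plus counts of glycosylated and near-complete sites.

    bands maps acceptor position -> (g0, g1) intensities. near_full is a
    hypothetical cutoff, not a value taken from the study.
    """
    pct = {pos: percent_glycosylated(g0, g1) for pos, (g0, g1) in bands.items()}
    n_glycosylated = sum(1 for p in pct.values() if p > 0)
    n_near_complete = sum(1 for p in pct.values() if p >= near_full)
    return pct, n_glycosylated, n_near_complete
```

Running the same function on cell-based and cell-free blots gives directly comparable numbers for each sequon variant.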

Structural and Functional Consequences of Im7 Glycosylation.

To exhaustively determine the effect of glycan attachment on neoglycoprotein properties, we first quantified binding activity of all 80 Im7 sequon variants with and without glycosylation by subjecting each to multiwell enzyme-linked immunosorbent assay (ELISA) using purified ColE7 as immobilized antigen. Native Im7 interacts with ColE7, a 60-kDa bacterial toxin that is cytotoxic in the absence of the cognate Im7 inhibitor (37). With an eye toward higher throughput, we chose to assay YebF-Im7 fusions directly because 1) it obviated the need for molecular reformatting of the expression constructs, 2) the fusions could be isolated as relatively pure species from cell-free supernatants, bypassing the need for extensive purification, and 3) the introduction of the small YebF domain had no measurable effect on ColE7-binding activity (SI Appendix, Fig. 5A). Whereas nearly two-thirds of the YebF-Im7 fusions were either unaffected by glycosylation or rendered inactive by introduction of the DQNAT motif alone, particularly in a contiguous stretch between residues 50 and 57 of Im7, the remaining one-third exhibited significantly altered binding activity that was attributable to the presence of the N-glycan (Fig. 3D). These glycosylation-induced effects were clearly dependent on the precise location of the modification. Indeed, some of the most striking increases in binding activity for glycosylated variants over their aglycosylated counterparts were observed to occur at the transition between different types of secondary structure (e.g., Im7N33, Im7N58, and Im7N65). These results were particularly noteworthy in light of the elevated probability of finding naturally occurring sequons in locations where secondary structure changes (33).
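Two ratios are in play here: each variant's signal relative to aglycosylated wt YebF-Im7 (values above 1 indicate enhanced binding), and the glycosylated-to-aglycosylated ratio that isolates the effect of the glycan from that of the DQNAT substitution. A minimal sketch, assuming background-subtracted ELISA signals; `normalize_binding` is a hypothetical helper and the input layout is an assumption.

```python
import math


def normalize_binding(variants: dict[int, tuple[float, float]], wt_aglyco: float):
    """Normalize ELISA signals to aglycosylated wt and compute the glycan effect.

    variants maps acceptor position -> (glycosylated signal, aglycosylated signal).
    Returns position -> (glyco / wt, aglyco / wt, glyco / aglyco).
    """
    if wt_aglyco <= 0:
        raise ValueError("wt reference signal must be positive")
    out = {}
    for pos, (glyco, aglyco) in variants.items():
        if aglyco > 0:
            fold = glyco / aglyco
        elif glyco > 0:
            fold = math.inf   # activity appears only after glycosylation
        else:
            fold = math.nan   # inactive in both forms; ratio undefined
        out[pos] = (glyco / wt_aglyco, aglyco / wt_aglyco, fold)
    return out
```

A variant that is inactive with and without the glycan returns NaN for the fold change, which keeps it separate from true glycan-dependent effects.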

Among the Im7 neoglycoproteins whose activity was most significantly affected, both positively and negatively, by N-glycosylation, the majority were located in two distinct regions covering residues 23 to 33 and 58 to 69 (Fig. 3D). These regions occurred within the two arms of Im7 (one located in α1–loop12–α2 from residues 19 to 39 and the other in loop23–α3–loop34 from residues 46 to 63) that interact extensively with a continuous region in ColE7 in the crystal structure (Fig. 3C) (27). The two interfaces are charge-complementary, and charge interactions are largely responsible for the tight and specific binding between the two proteins; hence, it was not surprising that binding activity was sensitive to N-glycan attachment in the vicinity of these interfaces. It should be pointed out that the presence of an N-glycan in some of these positions was uniquely modulatory, as substitution of DQNAT alone in these same locations generally had little effect on activity, as evidenced by the comparable ColE7 binding measured for aglycosylated Im7 variants versus wt Im7 (SI Appendix, Fig. 5B).

To determine whether any of the glycosylation-induced increases in binding activity were related to stabilization of the native fold, the most active Im7 neoglycoproteins were subjected to differential scanning fluorimetry (DSF) with SYPRO Orange dye in a real-time PCR instrument. Previous studies showed that melting temperature (Tm) values obtained by DSF correlated well with those determined by circular dichroism (CD) thermal denaturation (38). Here, too, we observed excellent agreement between these two methods, which both yielded Tm values for wt Im7 (∼39 °C; SI Appendix, Fig. 5 C and D) that agreed with a previously reported value (37). Importantly, the presence of the small YebF domain did not significantly alter the Tm value for Im7 (SI Appendix, Fig. 5D), consistent with its lack of effect on ColE7-binding activity. We also confirmed that DSF results obtained using YebF-Im7 derived directly from cell-free supernatants were indistinguishable from those obtained with more extensively purified YebF-Im7 (SI Appendix, Fig. 5D). Using DSF, the average Tm values for glycosylated and aglycosylated versions of each Im7 variant were measured, and the change in unfolding temperature, ΔTm, was calculated such that a positive ΔTm signified an increase in structural order and a reduced conformational flexibility due to appending a glycan. Several of the variants exhibited positive ΔTm values, with the largest increases corresponding to glycan installation at N33, N59, N60, N65, and N80 (Fig. 3E). Conversely, glycans at N10, N58, and N64 caused the largest decreases in Tm, indicative of glycan-induced protein structural changes that destabilized the protein.
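The text defines Tm as the midpoint of the thermal transition and ΔTm as the glycosylated minus the aglycosylated value. A minimal sketch of both, assuming a DSF melt curve given as paired temperature and SYPRO Orange fluorescence readings; locating the steepest rise (the first-derivative peak) is a common approximation of the transition midpoint, and whether the authors used it or a sigmoidal (Boltzmann) fit is not stated here. Function names are hypothetical.

```python
import math


def tm_from_melt_curve(temps: list[float], fluorescence: list[float]) -> float:
    """Estimate Tm as the temperature of the steepest fluorescence increase.

    Computes the finite-difference slope between consecutive readings and
    returns the midpoint temperature of the steepest interval.
    """
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need matched temperature/fluorescence series with >= 3 points")
    best_slope, tm = -math.inf, math.nan
    for i in range(len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope, tm = slope, (temps[i] + temps[i + 1]) / 2
    return tm


def delta_tm(tm_glyco: float, tm_aglyco: float) -> float:
    """Positive values mean the glycan raised Tm (stabilization)."""
    return tm_glyco - tm_aglyco
```

On real DSF traces the signal often falls again after the peak as unfolded protein aggregates, so in practice one would smooth the data and restrict the search to the rising phase before taking the derivative maximum.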

SSGM of an Acceptor Protein with More Complex Topology.

We next turned our attention to bovine RNase A. Like Im7, RNase A has been intensely studied from a structure–function standpoint and has been pivotal to understanding many aspects of enzymology, biological chemistry, and protein folding and stability. We chose RNase A because 1) it is a relatively small protein containing 124 residues but with a more complex topology than Im7, with all major types of secondary structure, namely α-helices, β-sheets, and turns, represented; 2) the natively glycosylated form of RNase A, namely RNase B, contains a single N-linked oligosaccharide at N34 and a crystal structure is available (39); 3) glycosylation at N34 has no apparent effect on the secondary or tertiary structure (39) but does appear to alter the thermal stability (40), although this is controversial (41); and 4) RNase A modified with an optimal bacterial sequon at the native N34 glycosylation site (RNase AN34) can be glycosylated by CjPglB in both cell-based and cell-free reactions (34, 35). For these reasons, RNase A represented an ideal target for SSGM.

Extracellular secretion of glycosylated YebF-RNase AN34 was observed in colony blots and immunoblots (Fig. 2C), confirming the compatibility of RNase A with glycoSNAP screening. An SSGM library was created by subjecting YebF-RNase A plasmid DNA to the multiplex inverse PCR method, resulting in sequence coverage of 93% in the preselected library as determined by next-generation sequencing (SI Appendix, Fig. 2). CLM24 cells carrying plasmids encoding the requisite C. jejuni glycosylation machinery were transformed with the SSGM library and subjected to glycoSNAP screening. A total of ∼100 glycosylation-positive colonies were randomly selected from two membranes and subjected to sequencing analysis. Of these, only 50 were nonredundant, as many of the sequences were isolated multiple times (e.g., seven times each for RNase AN41 and RNase AN122; Fig. 4A). The sequons of these positive hits were distributed throughout the primary sequence and found in every type of secondary structural element, akin to the results with Im7. Immunoblot analysis confirmed that all selected clones were glycosylated, and the efficiency for most was at or near 100% as estimated by densitometry analysis of the anti-His blots (Fig. 4B and SI Appendix, Fig. 6 A and B). We also performed theoretical analysis of each of these RNase A glycosite variants in terms of glycosylation probability using NetNGlyc 1.0, a web-based tool that predicts N-glycosylation sites in human proteins using artificial neural networks that examine the sequence context of N-X-S/T sequons (40). Interestingly, a total of 18 glycosites, which were predominantly clustered in the C-terminal half of the protein, had a glycosylation probability score below 50% (SI Appendix, Fig. 6C) and thus would be predicted to be inefficiently glycosylated, if at all. RNase AN111 and RNase AN122, in particular, both scored below 30% and yet were both very efficiently glycosylated in cells (and in vitro, as discussed below).
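Library coverage figures such as the 93% reported here, or the >10³-read threshold met by all but one Im7 position, amount to counting designed positions that clear a read-count cutoff. A minimal sketch, assuming read counts have already been assigned to acceptor positions by an upstream alignment step that is not shown; `library_coverage` and its `min_reads` parameter are hypothetical.

```python
def library_coverage(reads: dict[int, int], positions: list[int], min_reads: int = 1):
    """Fraction of designed glycosite positions with at least min_reads NGS reads.

    reads maps acceptor position -> read count for its in-frame DQNAT variant.
    Returns (coverage fraction, sorted list of uncovered positions).
    """
    designed = set(positions)
    if not designed:
        raise ValueError("no designed positions supplied")
    missing = sorted(p for p in designed if reads.get(p, 0) < min_reads)
    return 1 - len(missing) / len(designed), missing
```

Raising `min_reads` turns the same function into a depth check rather than a presence check.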

Fig. 4.
Fig. 4.

Construction and characterization of RNase A neoglycoprotein libraries. (A) Primary sequence and predicted secondary structure for bovine pancreatic RNase A. Asterisks denote location and frequency of glycosite hits isolated using SSGM. Predicted structures adapted from PDB ID code 1RBX. (B) Mapping of cell-based (Left) and cell-free (Right) glycosylation efficiency onto three-dimensional structure of RNase A. Heat-map analysis of glycosylation efficiency was determined based on densitometric quantification of percent glycosylated (defined as g1/[g0+g1] ratio) for each neoglycoprotein in anti-His immunoblot. (C) Enzymatic activity of glycosylated (gray bars) and aglycosylated (white bars) RNase A variants recovered from culture supernatants. All data were normalized to enzymatic activity measured for aglycosylated YebF-RNase A lacking a sequon (wt). Data are average of three biological replicates and error bars represent SD of the mean. (D) DSF analysis of YebF-RNase A variants with and without glycosylation. Tm was calculated as midpoint of thermal transition between native and unfolded states. Dashed line indicates Tm for wt YebF-RNase A (59.0 ± 0.1 °C). Black bars are average of three independent replicates with error bars reported as SEM. Dashed red lines in C and D indicate the activity and Tm for wt YebF-RNase A. Statistical analysis was performed on all data in C and D using two-way ANOVA with significance indicated as follows: *P < 0.1; **P < 0.01; ***P < 0.001; ****P < 0.0001; unmarked, not significant.

To investigate whether the structural context of the sequon impacted the possible timing of PglB-mediated glycan installation, we performed cell-free glycosylation of folded RNase A variants. While some variants were glycosylated equally well in cell-based and cell-free reactions (e.g., RNase AN46 and RNase AN64), an unexpectedly large number showed significantly lower levels of glycosylation under cell-free conditions (Fig. 4B and SI Appendix, Fig. 6 A and B). Most notable among these were variants N34, N35, N36, N43, N51, N61, N69, N72, N80, N89, and N104, which were all efficiently glycosylated in cells but underwent little or no detectable glycosylation in vitro. These sequons are likely located where they are accessible to the OST during translation/translocation, while the protein is still unfolded, but become inaccessible once folding is complete. Indeed, the native N-glycosylation site at N34 is located in a structured domain, suggesting that the poor cell-free glycosylation at this specific location (and perhaps also at the nearby N36 and N43 sites) was due to sequon inaccessibility in the folded state. Such folding-dependent recognition of this site has been observed previously (34, 35) and, together with the results presented here, supports a model whereby cell-based glycosylation of these particular sequons involves glycan installation prior to folding, either co- or posttranslocationally (SI Appendix, Fig. 4).
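The comparison above, sites efficiently modified in cells but barely modified once folded, can be expressed as a simple filter over the two sets of efficiency measurements. A minimal sketch, assuming percent-glycosylated values computed as g1/(g0+g1) for each condition; the 80% and 20% cutoffs and the function name are hypothetical, and the values in the usage line are illustrative rather than measurements from this study.

```python
def folding_dependent_sites(
    cell_based: dict[int, float],
    cell_free: dict[int, float],
    high: float = 80.0,
    low: float = 20.0,
) -> list[int]:
    """Positions efficiently glycosylated in cells but poorly glycosylated in vitro.

    Inputs map acceptor position -> percent glycosylated. Positions lacking a
    cell-free measurement are skipped rather than assumed unmodified.
    """
    return sorted(
        pos
        for pos, eff in cell_based.items()
        if eff >= high and pos in cell_free and cell_free[pos] <= low
    )


if __name__ == "__main__":
    # Illustrative numbers only.
    print(folding_dependent_sites({34: 100.0, 46: 98.0}, {34: 5.0, 46: 97.0}))
```

Sites returned by such a filter are candidates for glycan installation before folding, the model proposed for N34 and its neighbors.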

To determine the consequences of glycosylation at the 50 unique sites, the ability of glycosylated and aglycosylated versions of each sequon variant to catalyze the hydrolysis of the phosphodiester bonds in RNA was evaluated. While the addition of YebF had little to no effect on RNase A activity (SI Appendix, Fig. 7A), more than half of the RNase A variants were inactivated by substitution of the DQNAT sequon (Fig. 4C). To determine if this might be due to the substitution of five residues in the target protein, a requirement for optimal recognition by CjPglB (42), we mutated RNase A more conservatively at a select number of sites. Specifically, we generated minimal sequons (D-X-N-X-T/S or X-X-N-X-T/S, where X represents the native amino acid), which in most cases required only one or two amino acid changes. Each of these mutants was completely inactive except for RNase AN55 with a DVNAT sequon, which retained some activity but was still significantly less active than the wt enzyme (SI Appendix, Fig. 7B). Hence, even relatively minor sequence perturbations at these positions, in addition to the less subtle substitution with DQNAT, were all capable of inactivating RNase A. More careful inspection revealed that the majority of variants with little to no activity corresponded to the substitution of sequons in locations that would be predicted to disrupt catalytically important residues or disulfide bonds (Fig. 4C and SI Appendix, Supplementary Results).
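The conservative design described above keeps native residues wherever they already satisfy the sequon and changes only what is needed. A minimal sketch, assuming a 0-based acceptor index and defaulting to threonine when the +2 residue must be changed; accepting a native glutamate at -2 follows the D/E-X1-N-X2-S/T bacterial sequon given earlier, and `minimal_sequon` is a hypothetical helper.

```python
def minimal_sequon(seq: str, i: int, require_acidic: bool = True):
    """Install a minimal sequon with the acceptor N at 0-based index i.

    require_acidic=True builds D-X-N-X-S/T (a native D or E at -2 is kept);
    require_acidic=False builds X-X-N-X-S/T. Native residues are kept wherever
    they already fit. Returns (mutant, number_of_changes), or None if the
    window runs off the sequence or the +1 residue is proline.
    """
    start = i - 2 if require_acidic else i
    if start < 0 or i + 2 >= len(seq) or seq[i + 1] == "P":
        return None
    s = list(seq)
    if require_acidic and s[i - 2] not in "DE":
        s[i - 2] = "D"
    s[i] = "N"
    if s[i + 2] not in "ST":
        s[i + 2] = "T"  # default hydroxy amino acid at +2 (assumption)
    mutant = "".join(s)
    return mutant, sum(a != b for a, b in zip(seq, mutant))


if __name__ == "__main__":
    print(minimal_sequon("DVQAV", 2))  # toy fragment; two substitutions needed
```

The proline check mirrors the X != P rule, since no substitution elsewhere in the window can rescue a sequon with proline at +1.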

Among the RNase A neoglycoproteins that retained function, only eight (sequons at N34, N35, N36, N51, N53, N61, N89, and N104) showed activity comparable to that of wt RNase A (>50% of wt activity), and none of these was more active than its aglycosylated counterpart (Fig. 4C). In the case of RNase AN119, introduction of the DQNAT sequence completely abrogated catalytic activity, consistent with previous findings that the relative activity of an H119N mutant was reduced to less than 1% of that of wt RNase A, with catalytic efficiency reduced by 100- to 1,000-fold depending on the substrate used (43). Despite the importance of this residue for catalysis, glycosylation at this position partially restored enzymatic activity, indicating an N-glycan-dependent gain of function.

To determine whether glycosylation affected stability, we again used DSF to analyze the most active RNase A neoglycoproteins along with RNase AN93, which was randomly chosen as a representative inactive variant. The measured Tm values for wt YebF-RNase A and its unfused counterpart were both ∼59 °C (SI Appendix, Fig. 7C), in close agreement with previous findings (41). The Tm values for all the YebF-RNase A variants spanned a range from 58 to 63 °C (Fig. 4D). Most exhibited positive ΔTm values relative to their aglycosylated counterparts, including the RNase AN119 variant, suggesting that the glycan at N119, which restored activity, also stabilized the protein. In contrast, RNase AN89 and RNase AN93 exhibited large negative ΔTm values, which coincided with slightly weakened activity upon glycan attachment in the case of N89 and complete inactivation in the case of N93.

Investigation of IgG Variable Domain Glycosylation Using SSGM.

We next investigated antibody variable domain glycosylation, a phenomenon that is observed for ∼15% of serum IgGs and contributes to diversification of the B cell antibody repertoire (6). Although glycan installation within the variable domains of Fab arms has long been known, the rules governing the selection of N-glycosylation sites that emerge in Fab domains during somatic hypermutation, and the functional consequences of the attached glycans, remain poorly understood. To systematically investigate this phenomenon using SSGM, the two variable domains, VH and VL, from the human anti-HER2 monoclonal antibody were joined by a flexible linker to form scFv-HER2, which was then fused at its N terminus to YebF and at its C terminus to a DQNAT motif. Extracellular secretion of glycosylated YebF-scFv-HER2DQNAT was observed in colony blots and immunoblots (Fig. 5A), confirming the compatibility of scFv-HER2 with glycoSNAP screening. Because variable domain glycosylation is subject to selection mechanisms that depend on the nature of the antigen (6), we modified the SSGM strategy to enable dual screening of glycosylation and antigen-binding activity by labeling colonies with SBA lectin and with the extracellular domain (residues 1 to 652) of human HER2 (HER2-ED), which was avidly bound by scFv-HER2DQNAT fused to YebF (SI Appendix, Fig. 8A). In this way, two-color screening could identify colonies that were positive for both glycosylation and antigen binding, as demonstrated with the YebF-scFv-HER2DQNAT construct (Fig. 5A). Next, we constructed an SSGM library and performed two-color glycoSNAP screening with CLM24 cells carrying plasmids encoding the library and the C. jejuni glycosylation machinery. A total of ∼60 dual-positive hits were isolated from membranes, of which 21 were nonredundant (e.g., N58 in VL and N42 in VH were each isolated 12 times) (Fig. 5B); the extent of glycosylation of each hit was subsequently confirmed by immunoblot and densitometry analysis (SI Appendix, Fig. 8 B and C). The sequons of these hits were distributed throughout the primary sequence, with a large proportion clustering just after the second and third complementarity-determining regions (CDRs) of the VL domain and in the flexible linker, indicating a clear selection bias for sites that tolerated glycosylation without interfering with binding function. Interestingly, a few of the identified sequons occurred in CDR2 of the VL domain and in CDR1 and CDR2 of the VH domain, consistent with naturally occurring IgG repertoires in which N-glycosites are found preferentially in the CDRs (6).
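
Collapsing the ∼60 sequenced hits into nonredundant sites is a simple tally. The sketch below assumes each hit has already been reduced to a site label by DNA sequencing; `summarize_hits` and the labels in the usage lines are hypothetical.

```python
from collections import Counter


def summarize_hits(site_labels: list) -> tuple:
    """Hypothetical helper: return the number of nonredundant sites and the
    sites ranked by how often each was isolated."""
    counts = Counter(site_labels)
    return len(counts), counts.most_common()


# Hypothetical sequencing calls, one per dual-positive colony
hits = ["N58 VL", "N42 VH", "N58 VL", "N9 linker", "N42 VH", "N58 VL"]
n_unique, ranked = summarize_hits(hits)
print(n_unique)   # 3
print(ranked[0])  # ('N58 VL', 3)
```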

Fig. 5.

Construction and characterization of scFv-HER2 neoglycoprotein libraries. (A) Immunoblot analysis of acceptor proteins in colony secretions (Left and Middle) and periplasmic fractions (Right) derived from E. coli CLM24 carrying plasmids encoding scFv-HER2DQNAT and the requisite N-glycosylation machinery with either wt CjPglB or an inactive mutant (mut). Blots were probed with anti-polyhistidine antibody (α-His) to detect the acceptor protein, with SBA lectin or hR6 serum to detect the glycan, and with HER2-ED to detect antibody binding. Bottom color panels depict overlays of α-His and SBA blots or of SBA and HER2 blots (merge). Arrows denote aglycosylated (g0) and singly glycosylated (g1) forms of scFv-HER2DQNAT. Molecular weight (MW) markers are indicated at the left. Results are representative of at least three biological replicates. (B) Frequency and position of N-glycosylation sites in scFv-HER2DQNAT glycovariants isolated using SSGM. (C) Binding activity of glycosylated (gray bars) and aglycosylated (white bars) scFv-HER2DQNAT variants as measured by ELISA with HER2-ED as immobilized antigen. All data were normalized to the binding activity measured for aglycosylated scFv-HER2 lacking a sequon (wt), such that values greater than 1 (denoted by dashed line) indicate enhanced binding activity relative to wt scFv-HER2. Data are the average of three biological replicates, and error bars represent the SD. (D) DSF analysis of YebF-scFv-HER2 variants with and without glycosylation. Tm was calculated as the midpoint of the thermal transition between native and unfolded states. Dashed line indicates the Tm for wt YebF-scFv-HER2 (68.2 ± 0.1 °C). Black bars are the average of three independent replicates, with error bars reported as SEM. Dashed red lines in C and D indicate the activity and Tm for wt YebF-scFv-HER2. Statistical analysis was performed on all data in C and D using two-way ANOVA with significance indicated as follows: *P < 0.1; **P < 0.01; ***P < 0.001; ****P < 0.0001; unmarked, not significant.

In terms of function, all 21 scFv-HER2 hits exhibited HER2-ED binding activity above background (Fig. 5C), as expected given that the screening process was adapted to include antigen binding. Importantly, nine of these neoglycoproteins (N58, N64, and N109 in VL; N3, N4, N9, and N10 in the linker; N42 and N113 in VH) exhibited increased binding compared to their aglycosylated counterparts, and most of these were also more active than the parental scFv-HER2. For the five clones exhibiting the greatest glycosylation-dependent increase in activity, we measured Tm values and found that, in general, glycan attachment did not affect stability (Fig. 5D). The one exception was N64 VL, whose Tm increased by 2.6 °C upon addition of the N-glycan. Overall, these results agree with several previous studies showing that variable region glycans contribute to antibody binding characteristics and stability in a manner that depends on the precise location of the glycan (6, 44), and they suggest that glycosylation in this region may be a useful strategy for fine-tuning the performance of IgG antibodies and their engineered derivatives.
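
The two comparisons in this paragraph (glycosylated versus aglycosylated, and glycosylated versus parental) can be made explicit as below. This is a minimal sketch assuming binding values are already normalized to aglycosylated wt scFv-HER2 (wt = 1.0), as in Fig. 5C; it applies point comparisons only, omits the ANOVA used for significance, and the helper name, site labels, and values are hypothetical.

```python
def classify_binding(variants: dict) -> dict:
    """Hypothetical helper.

    variants maps a site label to (glycosylated, aglycosylated) binding,
    both normalized to aglycosylated wt scFv-HER2 (wt = 1.0).
    Point comparisons only; no statistical test is applied here.
    """
    improved_by_glycan = [s for s, (gly, agly) in variants.items() if gly > agly]
    above_parent = [s for s, (gly, _agly) in variants.items() if gly > 1.0]
    return {"improved_by_glycan": improved_by_glycan, "above_parent": above_parent}


# Hypothetical normalized values, for illustration only
data = {"site_a": (2.2, 1.3), "site_b": (0.8, 0.6), "site_c": (0.5, 0.9)}
result = classify_binding(data)
print(result["improved_by_glycan"])  # ['site_a', 'site_b']
print(result["above_parent"])        # ['site_a']
```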

Computational Analysis of Neoglycoproteins.

To test whether protein-structure analyses could explain the observed effects of sequon substitution and glycosylation, we modeled the sequon-substituted variants, with and without glycosylation, and calculated for each simple geometric measures (secondary structure, burial, distance to the binding site, and surface area) as well as Rosetta energy estimates (stability and interface score). For the Im7 and RNase A neoglycoproteins, however, none of these factors correlated significantly with activity or stability (SI Appendix, Figs. 9–13 and SI Appendix, Supplementary Results). These metrics may be less informative for RNase A because its activity is primarily explained by disruption of the active site and the disulfide bonds, which these metrics do not capture.
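
The burial metric used in Fig. 6C is approximated by a neighbor count of Cβ atoms within 8.5 Å. A minimal sketch, assuming one representative coordinate per residue (for example, Cα standing in for glycine, which lacks a Cβ), that the count is taken around each residue's own Cβ, and that the residue itself is excluded; `burial_counts` is a hypothetical helper, not part of Rosetta.

```python
import math


def burial_counts(cb_coords: list, cutoff: float = 8.5) -> list:
    """Hypothetical helper: for each residue, count the other residues whose
    Cβ coordinate lies within `cutoff` Å of its own Cβ coordinate.

    cb_coords: list of (x, y, z) tuples in Å, one per residue.
    """
    counts = []
    for i, ci in enumerate(cb_coords):
        n = sum(1 for j, cj in enumerate(cb_coords)
                if j != i and math.dist(ci, cj) <= cutoff)
        counts.append(n)
    return counts


# Toy coordinates: three residues packed together, one far away
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 6.0, 0.0), (20.0, 0.0, 0.0)]
print(burial_counts(coords))  # [2, 2, 2, 0]
```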

For Im7, we examined the structures of five selected glycosylated variants for which N-glycan attachment had different experimental effects and for which the changes in Rosetta total score correlated with affinity (colored points in SI Appendix, Fig. 10F). In the case where glycosylation had no effect on activity (Im7N46), the carbohydrate conformational ensemble displayed no interaction with ColE7, whereas in the case where glycosylation reduced activity (Im7N31), the ensemble was broad and disordered because the glycan clashed with ColE7 residues (high Rosetta energies) (Fig. 6A). Finally, in the three cases where glycosylation improved activity (Im7N30, Im7N49, and Im7N58), the carbohydrate conformational ensembles made favorable interactions with ColE7 (Fig. 6B).

Fig. 6.

Computational analysis of neoglycovariants. (A and B) Ensembles of Im7 glycosylated mutants 46, 31, 30, 49, and 58 (glycosylated/aglycosylated activity ratios of 1.04, 0.55, 1.36, 1.91, and 3.44, respectively) showing low-energy conformations of conjugated glycans. N-glycans are shown as lines (oxygens, red; nitrogens, blue; carbons in a different color for each model). Im7 and ColE7 side chains that interact with the N-glycan are shown as sticks. The structure of the scFv-HER2 VL (red) and VH (blue) domains in complex with HER2 protein (gray) is shown at the bottom left. Regression analyses of log activity ratio (glycosylated/wt) versus (C) burial of the sequon substitution site (approximated by the number of Cβ atoms within 8.5 Å of each residue), (D) distance of the closest HER2 residue from the sequon substitution site, and (E) total Rosetta score. In all three panels, the dark red lines are the respective regression lines. Dot colors in E show the secondary structure of the sequon substitution site: orange, green, and blue correspond to α-helix, β-strand, and loop regions, respectively. N58 VL (red circle) shows the largest increase in binding activity upon glycosylation and is discussed in the text. (F) Wt structure showing the sites used for analysis of sequon substitution (36 VL, 108 VL, and 113 VH) and glycosylation (58 VL). Side-chain colors reflect their respective secondary structures. (G) Glycan arrangement (orange sticks) from eight low-energy conformations of the glycosylated N58 VL variant of scFv-HER2, revealing a possible glycan–HER2 interaction responsible for the improved binding activity.

We also compared the experimental binding activity of scFv-HER2 with multiple geometric and Rosetta metrics. Unlike for Im7 or RNase A, scFv-HER2 activity generally correlated with many of our metrics. First, sequon burial reduced the binding activity of scFv-HER2 toward its antigen in both the glycosylated (R² = 0.43) and aglycosylated (R² = 0.21) states (Fig. 6C and SI Appendix, Fig. 9B, respectively). Similarly, the closer the sequon was to the paratope, the greater the likelihood of reduced activity for the glycosylated (R² = 0.23) and aglycosylated (R² = 0.20) variants (Fig. 6D and SI Appendix, Fig. 9C, respectively). The buried surface area also correlated with the activity of the glycosylated variants (R² = 0.19; SI Appendix, Fig. 10E). The strongest predictors, however, were the Rosetta scores. For the glycosylated state, activity correlated with both the total Rosetta score (R² = 0.49; Fig. 6E) and the interface score (R² = 0.63; SI Appendix, Fig. 10G). For the aglycosylated state, the total score of the antibody–antigen complex also correlated with experimental binding activity (R² = 0.49; SI Appendix, Fig. 9F). These Rosetta scores were driven primarily by van der Waals complementarity and to a lesser extent by electrostatics (SI Appendix, Figs. 11 and 12).
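
The R² values reported above come from simple linear regressions of a log activity ratio against each metric. A dependency-free sketch of that calculation follows; `linear_fit` is a hypothetical helper, the input values are invented, and the log base (log10 here) is an assumption that does not affect R².

```python
import math


def linear_fit(x: list, y: list) -> tuple:
    """Hypothetical helper: ordinary least-squares line y = slope*x + intercept
    and its coefficient of determination R^2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x or y has zero variance")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)  # equals r^2 for a simple linear fit
    return slope, intercept, r2


# Hypothetical burial counts and glycosylated/wt activity ratios
burial = [8.0, 12.0, 15.0, 21.0, 25.0]
ratios = [1.4, 1.2, 0.9, 0.6, 0.5]
slope, intercept, r2 = linear_fit(burial, [math.log10(r) for r in ratios])
print(f"slope={slope:.3f}  R^2={r2:.2f}")
```

With real data, the x values would be the per-variant metric (burial, distance to the paratope, or a Rosetta score) and the y values the measured log activity ratios.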

To analyze the aglycosylated variants, we selected three for deeper study: two with low binding activity and a poor Rosetta score (N36 VL and N113 VH; black circles in SI Appendix, Fig. 11A) and one with high activity and a favorable Rosetta score (N108 VL; red circle in SI Appendix, Fig. 11A). The N36 VL and N113 VH sites are both situated on β-strands in compact regions of the anti-HER2 antibody, on the side opposite the antigen-binding site (Fig. 6F, green sticks). The reduced stability arises from steric clashes caused by substituting a sequon inside (or near) a close-packed region of the protein [Rosetta steric-clash (vdW_rep) terms of 90.2 and 79.8 Rosetta energy units (REU) for N36 VL and N113 VH, respectively]. When these variants are glycosylated, the clashes worsen in the Rosetta models, corresponding to low activity (black circles in SI Appendix, Fig. 11A). In contrast, site N108 VL is located at the C-terminal end of VL (Fig. 6F, blue sticks). At this site, sequon substitution had a relatively small effect on electrostatic interactions (−6.2 REU) and a greater effect on the repulsive van der Waals term (−28.0 REU), indicating that new side chains are tolerated in less compact regions. A similar outcome was reported following substitution mutagenesis of a human monoclonal antibody (45).

To understand how N-glycosylation improved the binding activity of scFv-HER2, we selected mutant N58 VL, because the aglycosylated variant was 26% more active than wt scFv-HER2 and glycan addition improved binding a further 1.8-fold. Residue N58 VL resides in the turn between strands 1 and 2 (Fig. 6F, blue backbone). In the Rosetta-generated glycosylated structures, the low-energy states showed interfacial contacts between the glycan and surface residues of HER2 (Fig. 6G), which improved both the total Rosetta score and the interface score (red circle in Fig. 6E and SI Appendix, Fig. 10G), thereby explaining the improvement in binding activity as the result of favorable glycan–antigen contacts.
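
Assuming the sequon and glycan effects combine multiplicatively (an interpretation, not a statement from the text), the two numbers above imply roughly a 2.3-fold gain over wt scFv-HER2, of which about 72% on a logarithmic scale is attributable to the glycan. Here A denotes ELISA binding activity, and f_glycan is a label introduced only for this worked example:

```latex
\frac{A_{\mathrm{gly}}}{A_{\mathrm{wt}}}
  = \frac{A_{\mathrm{agly}}}{A_{\mathrm{wt}}} \times \frac{A_{\mathrm{gly}}}{A_{\mathrm{agly}}}
  \approx 1.26 \times 1.8 \approx 2.27,
\qquad
f_{\mathrm{glycan}}
  = \frac{\ln 1.8}{\ln 1.26 + \ln 1.8}
  \approx \frac{0.588}{0.231 + 0.588}
  \approx 0.72
```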

Discussion

In this study, we developed a protein engineering workflow called SSGM for constructing large neoglycoprotein libraries of virtually any POI and characterizing the consequences of glycan installation. The utility and flexibility of this technique were demonstrated using three structurally and functionally diverse acceptor proteins: bacterial Im7, bovine RNase A, and human scFv-HER2. Specifically, each of these proteins was subjected to a systematic “sequon walking” procedure that enabled creation of synthetic gene libraries in which N-glycosylation sites (the majority of which were naïve) were introduced at every possible position of the POI. Upon screening these libraries using glycoSNAP (23), numerous positions in each protein were found to be efficiently N-glycosylated. While extended regions and loops tended to be more receptive to glycosylation, all types of secondary structure were glycosylated, consistent with the observation that naturally occurring N-glycans are found on all forms of secondary structure (33). For RNase A in particular, a significant number of the efficiently glycosylated sites (18/50) were predicted to have very low glycosylation potential, highlighting the need for large-scale experimental studies of glycosylation, such as the one described here, to help refine predictive tools. To this end, higher-throughput techniques that leverage mass spectrometry to quantitatively resolve glycosylation efficiency (46, 47) could enable further refinement of the method in the future.

The studies performed here also provided insight into the possible timing and impact of glycosylation with respect to the folding process. For instance, Im7 tolerated a glycan at almost every position, even when the target asparagine side chain pointed inward and was considered buried (e.g., positions N7, N68, and N76). Because these buried positions physically cannot be glycosylated by PglB when the target protein is in the folded state, they must be glycosylated either cotranslationally or during a fluctuation to a partially unfolded state that provides access to the site. Then, after glycosylation, because Im7 presumably cannot fold back into the native structure, it must adopt a different conformation to accommodate the newly added glycan, which would be feasible given that Im7 is very flexible (48). In the case of RNase A, several sites were identified (e.g., N34 and N36) that could be efficiently glycosylated in cells but underwent little to no glycosylation in vitro (in the already folded state), providing clear evidence for glycan installation prior to folding in a manner that may resemble the cotranslocational process in mammalian cells (49). The overall less efficient glycosylation seen for many RNase A variants was also consistent with the protein adopting a more stable folded structure than Im7 and thus providing less access to buried sites.

In addition to uncovering glycosylatable sites, the SSGM workflow also allowed the contribution of these site-directed glycan “mutations” to the biological and biophysical properties of each POI to be probed. In this way, SSGM is conceptually analogous to combinatorial alanine-scanning mutagenesis, which allows systematic determination of the importance of individual amino acids to protein structure and function (50–52). Consistent with the known modulatory effects of N-glycans (4, 5), many of the neoglycoprotein derivatives of Im7, RNase A, and scFv-HER2 exhibited detectably altered stability and activity resulting from covalent attachment of N-glycans at precise locations in the protein backbone. For example, installing N-glycans in the center of α-helices negatively affected activity (e.g., positions 19, 42, and 72 in Im7), whereas N-glycans installed at transitions between different types of secondary structure and at turns between motifs enhanced activity and, in some cases, stability (e.g., positions 33, 49, 58, 59, 60, 61, 65, 67, 68, 69, 78, and 80 in Im7). These findings generally agreed with the folding and stability effects of attaching a GlcNAc2 disaccharide at discrete locations in Im7 (22) and also provide clues as to why natural N-glycosylation sites occur with elevated frequency in turns and bends, especially at points of change in secondary structure, and with low frequency within ordered helices (33). Despite the overall agreement with previous studies, a few notable differences emerged. For example, in our hands, Im7 glycosylated at position 27 with the GalNAc5(Glc)GlcNAc heptasaccharide was more active than, but as stable as, its aglycosylated counterpart, whereas an EPL-derived Im7 modified with chitobiose at residue 27 was significantly more stable than unmodified Im7 (note that activity data were not reported) (22).
Likewise, RNase AN34 glycosylated with GalNAc5(Glc)GlcNAc exhibited activity nearly identical to that of aglycosylated RNase AN34 (and wt RNase A), whereas the attachment of oligomannose glycans at N34 was previously observed to reduce activity by more than threefold (53). The notion that discrete glycan structures attached to the same site in a protein can have disparate effects is not unprecedented, having been documented for other glycoproteins (54, 55). Thus, in the future, it will be of interest to extend SSGM to alternative glycan structures, such as Man3GlcNAc2 or other human-like N- and O-linked glycans that have been engineered in E. coli (34, 56, 57), so that the consequences of varying glycan structures at discrete locations can be systematically investigated. Moreover, by integrating human-like glycan structures with the glycoSNAP screening tool, we anticipate that it will become possible to engineer improved versions of glycoprotein therapeutics using the SSGM workflow described here.

The fact that N-glycan attachment significantly increased the binding activity of several glycosite variants of Im7 and scFv-HER2 suggests that SSGM may become a useful tool for adding N-glycans at naïve sites in proteins to tune their biological and biophysical properties. Although glycans are known to be capable of improving binding affinity, to our knowledge, glycan-mediated contacts at the periphery of the binding interface have not previously been proposed as a mechanism. The discovery of such sites was accelerated by the ability of SSGM to furnish an unprecedentedly large number of intact neoglycoproteins (a total of 151 in this study alone), for which the effects of N-glycan installation can be readily cataloged using multiplexable assays of protein structure and activity, as shown here. While no definitive rules regarding the effects of glycosylation emerged here, we anticipate that sequon walking on a larger, even proteome-wide, scale could generate datasets that allow the effects of glycosylation to be more widely generalized and perhaps even predicted. Nonetheless, computational analysis indicated that interactions between the glycan and the bound protein can alter binding activity (positively or negatively) and that enhanced binding likely arises from low-energy glycan conformations making favorable interactions with the binding partner. For example, the Im7N58 variant, which underwent the largest increase in binding activity upon glycosylation, acquired new contacts with its binding partner, ColE7, through the glycan, which strengthened binding activity 3.5-fold. Likewise, for the scFv-HER2 mutant N58 VL, which exhibited measurably higher antigen-binding activity than parental scFv-HER2, the heptasaccharide glycan created new contacts between scFv-HER2 and HER2-ED and buried more surface area upon binding.
Thus, although part of the enhanced binding arose from the sequon substitution alone, perhaps from additional contacts made by the long Q57 side chain or from a stabilizing effect of the sequon on the CDR L2 loop (residues 51 to 57 in VL), most of the effect came from the N-glycan itself. Importantly, this observation is in line with previous findings that glycans attached near (but not within) the antigen-binding site can increase affinity (58). Taken together, our findings suggest that SSGM could be used to rapidly identify naïve sites along a protein backbone for strategic placement of N-glycans that substantially enhance the biological and/or biophysical properties of the resulting neoglycoprotein.

Materials and Methods

Strains and Culture Conditions.

E. coli strain DH5α was used for all molecular biology, including plasmid construction, site-directed mutagenesis, and SSGM library construction. E. coli strain BL21(DE3) was used to purify the ColE7 used to measure Im7 binding activity by ELISA. All glycosylation studies were performed using E. coli strain CLM24 (59), which was initially grown at 37 °C in Luria–Bertani (LB) medium containing appropriate antibiotics at the following concentrations: 20 μg/mL chloramphenicol (Cm), 100 μg/mL trimethoprim (Tmp), and 50 μg/mL spectinomycin (Spec). When cells reached midlog phase, protein expression was induced by adding 0.1 mM isopropyl-β-d-thiogalactoside (IPTG) and 0.2% (wt/vol) l-arabinose, after which cells were grown at 30 °C for 16 to 20 h. For all plasmid constructions, see SI Appendix, Supplementary Methods.

SSGM Library Construction.

SSGM libraries were constructed by multiplex inverse PCR (32) followed by T4 ligation. Each pTrc99S-YebF-POI plasmid was used as a template for PCR amplification with primer sets designed such that the DNA sequence 5′-GAT CAG AAT GCG ACC-3′ was included at the 5′ end of every forward primer, enabling substitution of the adjacent five amino acids with DQNAT. Prior to PCR, the forward primers were phosphorylated using T4 polynucleotide kinase (New England Biolabs) to enable subsequent T4 ligation. PCR reactions were performed using Phusion polymerase (New England Biolabs), and the PCR products were gel-purified to eliminate nonspecific products. The purified linear products were then circularized by self-ligation with T4 DNA ligase (New England Biolabs) to obtain the desired SSGM plasmid libraries, which were used to transform highly competent DH5α cells and then isolated using a QIAprep Spin Miniprep Kit (Qiagen) according to the manufacturer’s instructions. For next-generation sequencing, see SI Appendix, Supplementary Methods.
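
The sequon-walking design encoded by these primers can be reproduced in silico, both to confirm that the 5′ tail encodes DQNAT and to enumerate the expected library members. This is a sketch: the codon table covers only the five codons in the tail, variant names use the 1-based position of the introduced Asn (assumed to match labels such as N34), and `sequon_walk` is a hypothetical helper that does not model PCR or ligation.

```python
# Only the five codons in the primer tail; a full genetic code is omitted.
CODONS = {"GAT": "D", "CAG": "Q", "AAT": "N", "GCG": "A", "ACC": "T"}
PRIMER_TAIL = "GAT CAG AAT GCG ACC"  # 5' tail on every forward primer


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the partial CODONS table."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def sequon_walk(protein: str, motif: str = "DQNAT") -> dict:
    """Hypothetical helper: replace every window of len(motif) residues with
    the motif. Keys use the 1-based position of the motif's Asn."""
    k = len(motif)
    n_offset = motif.index("N")
    variants = {}
    for start in range(len(protein) - k + 1):
        label = f"N{start + n_offset + 1}"
        variants[label] = protein[:start] + motif + protein[start + k:]
    return variants


print(translate(PRIMER_TAIL))      # DQNAT
library = sequon_walk("MKTAYIAK")  # toy sequence, not a real POI
print(len(library))                # 4
print(library["N3"])               # DQNATIAK
```

Under this windowing, a protein of length L yields L − 4 DQNAT variants; how the termini were handled in the actual libraries is not specified in this section.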

GlycoSNAP Assay.

Screening of SSGM libraries was performed using glycoSNAP as described previously (23). Briefly, E. coli strain CLM24 carrying pMW07-pglΔB and pMAF10 was transformed with the corresponding SSGM library plasmids, and the resulting transformants were grown overnight at 37 °C on 150-mm LB-agar plates containing 20 μg/mL Cm, 100 μg/mL Tmp, and 50 μg/mL Spec. The next day, nitrocellulose transfer membranes were cut to fit 150-mm plates and prewet with sterile phosphate-buffered saline (PBS) before placement onto LB-agar plates containing 20 μg/mL Cm, 100 μg/mL Tmp, 50 μg/mL Spec, 0.1 mM IPTG, and 0.2% (wt/vol) l-arabinose. Library transformants were replicated onto 142-mm nitrocellulose membrane filters (Whatman, 0.45 µm), which were then placed colony-side-up on the transfer membranes and incubated at 30 °C for 16 h. The nitrocellulose transfer membranes were washed in Tris-buffered saline (TBS) for 10 min, blocked in 5% bovine serum albumin for 30 min, and probed for 1 h with fluorescein-labeled SBA (Vector Laboratories, FL-1011) together with either Alexa Fluor 647 (AF647)-conjugated anti-His antibody (R&D Systems, IC0501R) or HER2-ED (R&D Systems, 10126-ER) conjugated with AF647 (Thermo Fisher Scientific, A37573) according to the manufacturer’s instructions. All positive hits were restreaked onto fresh LB-agar plates containing 20 μg/mL Cm, 100 μg/mL Tmp, and 50 μg/mL Spec and grown overnight at 37 °C. Individual colonies were grown in liquid culture and subjected to DNA sequencing to confirm the location of glycosites and to protein glycosylation analysis as described below.
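
The two-color readout reduces to calling colonies positive in both channels. A minimal sketch, assuming per-colony SBA and HER2-ED intensities have already been quantified from the membrane images and that each cutoff is the background mean plus three standard deviations; the cutoff rule, `Colony`, and the example intensities are all hypothetical, and a single cutoff is reused for both channels only to keep the example short.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Colony:
    colony_id: str
    sba: float   # fluorescein-SBA intensity (glycan channel)
    her2: float  # AF647-HER2-ED intensity (antigen-binding channel)


def background_cutoff(negatives: list, k: float = 3.0) -> float:
    """Assumed rule: mean + k standard deviations of negative-control spots."""
    return statistics.mean(negatives) + k * statistics.stdev(negatives)


def dual_positive(colonies: list, sba_cut: float, her2_cut: float) -> list:
    """Return IDs of colonies above the cutoff in both channels."""
    return [c.colony_id for c in colonies if c.sba > sba_cut and c.her2 > her2_cut]


neg = [100.0, 110.0, 90.0]  # hypothetical background intensities
cut = background_cutoff(neg)
print(cut)  # 130.0
plate = [Colony("c1", 500.0, 400.0), Colony("c2", 500.0, 120.0), Colony("c3", 80.0, 400.0)]
print(dual_positive(plate, cut, cut))  # ['c1']
```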

Protein Isolation and Analysis.

For all methods related to protein isolation, immunoblot analysis, protein activity assays, MS analysis, and ELISA, see SI Appendix, Supplementary Methods.

Cell-Free Glycosylation.

Methods for purification of C. jejuni PglB and isolation of LLOs from glycoengineered E. coli were described previously (59). In vitro cell-free glycosylation was carried out in 30-μL reactions containing either 20 μL of supernatant fraction containing aglycosylated YebF-Im7 or 20 μL of periplasmic fraction containing YebF-RNase A, together with 2 μg of purified CjPglB and 5 μg of extracted LLOs in cell-free glycosylation buffer [10 mM Hepes, pH 7.5, 10 mM MnCl2, and 0.1% (wt/vol) n-dodecyl-β-d-maltoside]. Reaction mixtures were incubated at 30 °C for 16 h and stopped by adding 10 µL of 4× Laemmli sample buffer containing 5% β-mercaptoethanol, followed by boiling at 100 °C for 15 min, after which they were subjected to Western blot analysis.
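
The reaction volumes above fix the working concentrations through C1V1 = C2V2. A small worked check, assuming the stated masses are total amounts in the 30-μL reaction: 10 µL of 4× Laemmli buffer added to 30 µL gives 1× in 40 µL, and 2 μg of CjPglB in 30 μL corresponds to about 67 μg/mL. The helper names are hypothetical.

```python
def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for C2 (same units as stock)."""
    return stock * added_ul / total_ul


def mass_to_ug_per_ml(mass_ug: float, volume_ul: float) -> float:
    """Convert a total mass in a given volume to a concentration in ug/mL."""
    return mass_ug / (volume_ul / 1000.0)


REACTION_UL = 30.0
LAEMMLI_UL = 10.0
print(final_concentration(4.0, LAEMMLI_UL, REACTION_UL + LAEMMLI_UL))  # 1.0 (1x)
print(round(mass_to_ug_per_ml(2.0, REACTION_UL), 1))  # 66.7 ug/mL CjPglB
print(round(mass_to_ug_per_ml(5.0, REACTION_UL), 1))  # 166.7 ug/mL LLOs
```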

ELISA.

Binding activity of Im7 and scFv-HER2 was determined by standard ELISA. Briefly, Costar 96-well ELISA plates (Corning) were coated overnight at 4 °C with 50 μL of 5 μg/mL purified ColE7 in 0.05 M sodium carbonate buffer (pH 9.6) for Im7 variants or with 50 μL of 0.2 μg/mL HER2-ED (Sino Biological, 10004-HCCH) in PBS for scFv-HER2 variants. After blocking with 5% (wt/vol) nonfat milk in PBS for 1 h at room temperature, the plates were washed three times with PBST (PBS and 0.05% [vol/vol] Tween-20) and incubated with serially diluted aglycosylated and glycosylated YebF-Im7 and YebF-scFv-HER2 glycovariants for 1 h at room temperature. After three more washes with PBST, 50 μL of 1:2,500-diluted HRP-conjugated anti-DDDK tag antibody (Abcam, ab49763) for Im7 variants or 50 μL of 1:5,000-diluted HRP-conjugated anti-6xHis tag antibody (Abcam, ab1187) for scFv-HER2 variants, both in 1% PBST, was added to each well for 1 h. Plates were washed three times and then developed using 50 µL of 1-Step Ultra TMB-ELISA substrate solution (Thermo Fisher).

RNase A Activity Assay.

The enzymatic activity of RNase A variants was assayed using the RNaseAlert-1 Kit (Integrated DNA Technologies) according to the manufacturer’s protocol. Each supernatant sample was diluted 80-fold and normalized to the optical density at 600 nm (OD600) of the positive control strain expressing wt RNase A. Samples were then mixed with 20 pmol of RNase A substrate and 10 µL of 10× RNaseAlert Buffer and incubated in RNase-free black 96-well microplates (Fisher) at 37 °C for 30 min. Fluorescence was measured at excitation/emission wavelengths of 490/520 nm.
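
Relative activity against wt RNase A, and the >50% cutoff used in the Results to call variants comparable to wt, can be computed from the endpoint fluorescence. A minimal sketch, assuming samples are already OD600-matched as described and that a no-enzyme blank is subtracted (an assumption; blank subtraction is not mentioned here); the helper names and readings are hypothetical.

```python
def relative_activity(sample_rfu: float, wt_rfu: float, blank_rfu: float) -> float:
    """Hypothetical helper: blank-subtracted fluorescence relative to wt RNase A."""
    wt_signal = wt_rfu - blank_rfu
    if wt_signal <= 0:
        raise ValueError("wt signal does not exceed the blank")
    return (sample_rfu - blank_rfu) / wt_signal


def comparable_to_wt(rel: float, threshold: float = 0.5) -> bool:
    """Mirrors the '>50% of wt activity' criterion used in the Results."""
    return rel > threshold


rel = relative_activity(3500.0, 5000.0, 1000.0)  # invented readings
print(rel)                    # 0.625
print(comparable_to_wt(rel))  # True
```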

Thermal Stability Analysis.

Far-ultraviolet (UV) CD spectroscopy of purified Im7 (in 50 mM sodium phosphate and 400 mM sodium sulfate, pH 7.4) as a function of temperature was carried out in a 0.1-cm cuvette on a spectropolarimeter. Far-UV CD spectra were acquired between 200 and 260 nm with a step resolution of 1 nm. Melting temperatures of purified glycovariants were determined using high-throughput DSF as previously described (60). Briefly, 5 to 10 µg of protein was mixed with Protein Thermal Shift Buffer and Protein Thermal Shift Dye from the Protein Thermal Shift Dye Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. A melting curve was generated by monitoring fluorescence at excitation/emission wavelengths of 465/610 nm while increasing the temperature from 10 °C to 90 °C at a rate of 0.06 °C/s on an Applied Biosystems ViiA 7 instrument (Life Technologies). To calculate Tm values, the collected data were analyzed by nonlinear regression using the Boltzmann equation in Prism 8.4.2 (GraphPad).
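
The Boltzmann fit can be approximated without external libraries. The sketch below fits F(T) = bottom + (top − bottom)/(1 + exp((Tm − T)/slope)) by a coarse-to-fine grid search over Tm and slope, solving bottom and top exactly by least squares at each grid point. This grid search with the linear parameters profiled out is a swapped-in technique, not GraphPad's algorithm, and `fit_tm` and `delta_tm` are hypothetical helpers. In practice, DSF traces are often truncated at the fluorescence maximum before fitting, because the signal declines afterward as protein aggregates.

```python
import math


def boltzmann(t: float, bottom: float, top: float, tm: float, slope: float) -> float:
    """Boltzmann sigmoid; slope > 0 gives a rising unfolding transition."""
    z = max(-700.0, min(700.0, (tm - t) / slope))  # keep math.exp in range
    return bottom + (top - bottom) / (1.0 + math.exp(z))


def _sse_for(temps, fluor, tm, slope):
    """For fixed tm and slope the model is linear in bottom and top,
    so solve those by ordinary least squares and return the residual SSE."""
    s = [boltzmann(t, 0.0, 1.0, tm, slope) for t in temps]
    n = len(s)
    ms, my = sum(s) / n, sum(fluor) / n
    sxx = sum((x - ms) ** 2 for x in s)
    if sxx == 0.0:
        return None
    b = sum((x - ms) * (y - my) for x, y in zip(s, fluor)) / sxx
    a = my - b * ms
    return sum((y - a - b * x) ** 2 for x, y in zip(s, fluor))


def _grid_search(temps, fluor, tm_grid, slope_grid):
    best = None  # (sse, tm, slope)
    for tm in tm_grid:
        for slope in slope_grid:
            sse = _sse_for(temps, fluor, tm, slope)
            if sse is not None and (best is None or sse < best[0]):
                best = (sse, tm, slope)
    return best


def fit_tm(temps: list, fluor: list) -> float:
    """Coarse (0.5 °C) then fine (0.01 °C) search for the Boltzmann midpoint."""
    lo, hi = min(temps), max(temps)
    coarse_tm = [lo + 0.5 * i for i in range(int((hi - lo) / 0.5) + 1)]
    coarse_slope = [0.25 * i for i in range(1, 21)]
    _, tm0, s0 = _grid_search(temps, fluor, coarse_tm, coarse_slope)
    fine_tm = [tm0 - 1.0 + 0.01 * i for i in range(201)]
    fine_slope = [s0 - 0.25 + 0.05 * i for i in range(11) if s0 - 0.25 + 0.05 * i > 0]
    return _grid_search(temps, fluor, fine_tm, fine_slope)[1]


def delta_tm(tm_glycosylated: float, tm_aglycosylated: float) -> float:
    """Positive values mean the glycan stabilizes the protein."""
    return tm_glycosylated - tm_aglycosylated


# Synthetic, noise-free curve with Tm = 59.0 °C (illustration only)
temps = [40.0 + 0.5 * i for i in range(81)]
fluor = [boltzmann(t, 100.0, 1000.0, 59.0, 1.5) for t in temps]
print(round(fit_tm(temps, fluor), 2))  # ~59.0
```

The ΔTm values discussed in the Results correspond to `delta_tm` applied to the fitted midpoints of the glycosylated and aglycosylated versions of each variant.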

Computational Analyses.

For all computational analyses including protein structure preparation, geometric calculations, and Rosetta protocols, see SI Appendix, Supplementary Methods.
