High-throughput “dry and wet” experiments to explore the principles of optimal design of mRNA sequences

Today I am sharing a preprint, Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics, posted on bioRxiv by Rhiju Das, which explores general rules for achieving mRNA stability and efficient expression.

Barriers to mRNA therapeutics

With fast development cycles and broad R&D pipelines, especially in infectious diseases and oncology, mRNA therapeutics has become a therapeutic platform with great promise that may transform modern medicine. Compared with recombinant proteins, mRNA is produced by in vitro transcription, which is faster, more flexible, and cheaper. Over the past decade, significant advances in mRNA chemical modification and delivery systems have rapidly moved this technology into clinical research. On the other hand, the technical barriers facing mRNA therapeutics are also clear. For example, the inherent chemical instability of mRNA, its limited translation efficiency, and its immunogenicity can greatly weaken the protective efficacy of mRNA vaccines.

General optimization laws?

An mRNA technology platform can be divided into the mRNA backbone (the sequence itself) and the delivery system. Redesigning and optimizing the mRNA backbone to reduce losses during preparation, transport, and injection into the body can greatly improve mRNA stability and expression efficiency. The key problem, however, is that we do not understand how mRNA sequence and structure affect stability and expression efficiency. Is there a general optimization rule? It is generally believed that a more stable secondary structure improves the stability of the mRNA molecule in solution. However, because secondary structure may make it harder for the intracellular translation machinery to access the mRNA, or may slow scanning along the sequence, it could reduce translation efficiency and lower intracellular production of the target protein. This trade-off had not been confirmed by rigorous experimental data.

The researchers therefore constructed hundreds of full-length mRNA sequences carrying a variety of UTR and CDS elements, and compared their stability and expression efficiency in high-throughput assays.

PERSIST-seq technology

To find design principles for stable and efficiently expressed mRNA, the researchers constructed a series of full-length mRNA sequences containing various 5’UTR elements, CDS sequences, and 3’UTR elements, and developed a high-throughput RNA sequencing method called Pooled Evaluation of mRNA in-solution Stability, and In-cell Stability and Translation RNA-seq (PERSIST-seq). It is used to systematically evaluate the effects of UTR sequence, codon usage, and RNA structure on translation efficiency and mRNA stability (both in-solution and intracellular) in human cells.

A full-length DNA template was synthesized for in vitro transcription, and three additional feature sequences were inserted into each template: (1) the consensus T7 promoter sequence; (2) a barcode sequence in the 3’UTR; and (3) a constant sequence in the 3’UTR, used for high-throughput PCR and RT-PCR. With this design, researchers can use the flanking sequences for high-throughput amplification and identify each mRNA by its barcode. The template library is transcribed in vitro, capped and polyadenylated, and transfected into cells, and barcode sequencing is then used to quantify translation efficiency and mRNA degradation rate directly.
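The barcode readout described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the barcode sequences, the constant-region sequence, the 8-nt barcode length, and the function names are all hypothetical, chosen only to show how reads could be assigned to constructs by finding the constant region and reading the barcode next to it.

```python
from collections import Counter

# Hypothetical barcode-to-construct map; the real barcodes are not given in the text.
BARCODES = {
    "ACGTACGT": "hHBB_reference",
    "TTGCAAGC": "variant_5UTR_01",
    "GGATCCAA": "variant_CDS_07",
}
# Hypothetical constant region assumed to sit immediately after the barcode.
CONSTANT = "AGATCGGA"


def count_barcodes(reads, barcodes=BARCODES, constant=CONSTANT, bc_len=8):
    """Count reads per construct using the barcode found just upstream of the constant region."""
    counts = Counter()
    for read in reads:
        pos = read.find(constant)
        if pos < bc_len:
            continue  # constant region absent (pos == -1) or barcode truncated
        name = barcodes.get(read[pos - bc_len:pos])
        if name is not None:
            counts[name] += 1  # unknown barcodes are ignored
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of all assigned reads."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

Comparing the relative abundance of each barcode between samples (for example, before and after an incubation) is what makes a pooled measurement possible; a real analysis would also need to handle sequencing errors and normalization, which this sketch leaves out.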

Variant sequence design of UTRs

The library contains a total of 233 mRNA sequences, with 5’UTRs and 3’UTRs taken from cellular or viral genomes. The 5’UTR sequences are derived from highly expressed proteins, regulatory elements, structural proteins, previously identified sequences, and so on. There are 22 3’UTR sequences, ranging in length from 60 to 597 nt, derived from sequences previously identified as stabilizing RNA and improving translation. In addition, some UTR sequences come from viruses, such as SARS-CoV-2, Dengue virus, TMV, and TEV.

To benchmark the UTR variants, the main reference construct uses the 5’UTR and 3’UTR of human hemoglobin subunit beta (hHBB), one of the most efficiently translated mammalian mRNAs. The CDS of all reporter mRNAs is the open reading frame of NanoLuc luciferase (Nluc). Accordingly, constructs with non-hHBB UTRs are called UTR variant sequences.

Variant sequence design of CDSs

To determine the effect of CDS sequence and structure on mRNA stability and translation efficiency, the researchers maximized the differences in sequence and structure among the CDSs encoding the target proteins. All CDS variants use the same hHBB UTRs so that they can be compared directly.

The CDS variant library was designed in two ways. The first was to invite participants in the Eterna massive open laboratory to optimize the CDS sequence (with no specific optimization target set). The second was to use optimization algorithms, including established commercial algorithms that optimize the codon adaptation index (CAI), GC-rich algorithms, Baidu’s LinearDesign, which balances CAI against minimum free energy (MFE), and RiboTree, a Monte Carlo tree search that optimizes average unpaired probability (AUP). Together, these methods generated 121 CDS variant sequences.
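Two of the sequence features these algorithms target, GC content and CAI, are simple to compute, and a short Python sketch may make them concrete. CAI is taken here in its usual form, the geometric mean of per-codon relative adaptiveness weights. The weights in `TOY_WEIGHTS` are invented for illustration and are not real human codon usage values; MFE and AUP need an RNA folding package and are not computed.

```python
import math

# Invented relative-adaptiveness weights (0..1], NOT real codon usage data.
TOY_WEIGHTS = {"GCC": 1.0, "GCU": 0.66, "AAG": 1.0, "AAA": 0.75}


def gc_content(seq):
    """Fraction of G or C bases in a sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def codons(cds):
    """Split a CDS into codons, dropping any incomplete trailing codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def cai(cds, weights):
    """Geometric mean of the weights of all codons that have a weight."""
    logs = [math.log(weights[c]) for c in codons(cds.upper()) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With the toy table, `cai("GCCAAG", TOY_WEIGHTS)` uses only top-weighted codons, while `cai("GCUAAA", TOY_WEIGHTS)` encodes the same two amino acids with lower-weighted synonymous codons and scores lower.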

Figure: Construction of a 3’UTR-barcoded mRNA reporter for high-throughput assessment of mRNA performance

Effects of UTR and CDS sequence variation on mRNA translation efficiency

The 233 mRNA sequences were transfected into cells and allowed to express protein. The cells were then lysed, sucrose density gradient centrifugation was used to separate actively translated mRNAs from untranslated ones, and the mRNAs in each fraction were identified by barcode sequencing. Differences in UTR sequence caused the ribosome load per mRNA molecule to vary widely. The range of ribosome load was largest for 5’UTR variation, indicating that 5’UTRs have a greater impact on the translation efficiency of the target mRNA than 3’UTRs or CDS sequences.
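One simple way to turn per-fraction barcode counts into a single ribosome load value is a count-weighted average of the number of ribosomes assigned to each gradient fraction. The Python sketch below illustrates that idea only. The fraction names and the ribosome number assigned to each fraction are assumptions made here, and the preprint's exact definition may differ.

```python
# Assumed ribosome number per gradient fraction (illustrative values only).
RIBOSOMES_PER_FRACTION = {"free": 0, "monosome": 1, "light_polysome": 3, "heavy_polysome": 6}


def ribosome_load(fraction_counts, ribosomes_per_fraction=RIBOSOMES_PER_FRACTION):
    """Count-weighted mean number of ribosomes per mRNA across gradient fractions."""
    total = sum(fraction_counts.values())
    if total == 0:
        raise ValueError("no reads for this construct")
    weighted = sum(ribosomes_per_fraction[f] * n for f, n in fraction_counts.items())
    return weighted / total


# Hypothetical barcode counts for one construct in each fraction.
example = {"free": 10, "monosome": 20, "light_polysome": 40, "heavy_polysome": 30}
load = ribosome_load(example)
```

A construct whose reads shift toward the heavy polysome fractions gets a higher load under this definition, which matches the qualitative comparison made in the text.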

Supplement: Previous studies have shown that the effect of secondary structure on translation efficiency depends on where the structure sits relative to the start codon. If the secondary structure lies between the cap and the start AUG, it can interfere with the 43S preinitiation complex. If it lies just after the start AUG, it slows scanning by the ribosomal subunit, giving it more time to recognize the AUG and thereby enhancing translation efficiency.

With the UTR sequences held constant, the range of mRNA ribosome load caused by CDS sequence variation is much smaller than that caused by UTR sequence variation. For CDS variants, ribosome load showed no clear relationship with sequence features such as codon adaptation index (CAI), GC content, minimum free energy (MFE), signal peptide additions, or non-synonymous codon mutations.

Intracellular mRNA stability: the main parameter for predicting protein production

The total intracellular production of the target protein depends not only on translation efficiency, but also on how long the mRNA remains intact after it enters the cell. Measuring the intracellular half-life of the 233 mRNA molecules revealed an interesting result. CDS and 5’UTR variation caused the largest differences in intracellular mRNA stability. This was contrary to the researchers’ initial expectation that 3’UTR variation would cause the largest range of changes, because the 3’UTR is known to regulate cellular mRNA degradation. The researchers also noticed that mRNAs with higher ribosome loads had shorter half-lives in cells. The unstable mRNAs carrying 5’UTR variants and 5’/3’UTR variants were found more often in polysomes. By contrast, a modest ribosome load (monosome rather than heavy polysome fractions) was associated with higher mRNA stability.
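A half-life can be estimated from a time course of relative mRNA abundance by assuming first-order decay and fitting a straight line to the log-transformed values. The Python sketch below shows that calculation with ordinary least squares. The single-exponential assumption and the function name are choices made here for illustration, not details taken from the preprint.

```python
import math


def fit_half_life(times, abundances):
    """Fit ln(abundance) = ln(A0) - k * t by least squares; return (k, half_life)."""
    ys = [math.log(a) for a in abundances]  # abundances must be positive
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    k = -sxy / sxx  # decay rate constant; positive when abundance falls over time
    if k <= 0:
        raise ValueError("abundance does not decay over the time course")
    return k, math.log(2) / k


# Synthetic, noise-free time course (hours) with a known rate of 0.1 per hour.
times = [0.0, 2.0, 4.0, 8.0]
abundances = [math.exp(-0.1 * t) for t in times]
k, half_life = fit_half_life(times, abundances)
```

The same fit applies to in-cell and in-solution time courses; only the source of the abundance values changes.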

Analysis of these data revealed an unexpected rule of mRNA sequence design: higher translation efficiency negatively affects mRNA stability. In other words, for a given mRNA molecule there is a trade-off between increasing polysome load and maximizing the total production of the target protein over a given period.

To study the trade-off between ribosome load and stability, the researchers built a quantitative model to predict the impact of the two on protein production, and the results confirmed that total protein production correlates with mRNA stability. However, when protein is expressed over only a short period, or when the mRNA half-life is longer than the protein half-life, translation efficiency is the main predictor of yield. Therefore, UTR optimization should be tuned to the goal, whether that is high expression soon after transfection or sustained protein expression.
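The trade-off can be made concrete with a deliberately simple kinetic sketch in Python. It is not the authors' model: it assumes first-order mRNA decay, protein synthesis proportional to the amount of intact mRNA, and first-order protein decay. All rate constants and half-lives below are invented for illustration.

```python
import math


def protein_level(t, k_tl, k_m, k_p, m0=1.0):
    """Protein at time t for dM/dt = -k_m*M and dP/dt = k_tl*M - k_p*P, with P(0) = 0."""
    if math.isclose(k_m, k_p):
        return k_tl * m0 * t * math.exp(-k_m * t)
    return k_tl * m0 * (math.exp(-k_m * t) - math.exp(-k_p * t)) / (k_p - k_m)


def cumulative_protein(t, k_tl, k_m, m0=1.0):
    """Total protein synthesized by time t, ignoring protein decay."""
    return k_tl * m0 * (1.0 - math.exp(-k_m * t)) / k_m


def rate(half_life):
    """Convert a half-life into a first-order rate constant."""
    return math.log(2) / half_life


# Two hypothetical mRNAs: one translated faster but short-lived, one slower but stable.
fast = dict(k_tl=2.0, k_m=rate(2.0))    # mRNA half-life 2 h
stable = dict(k_tl=1.0, k_m=rate(8.0))  # mRNA half-life 8 h
k_p = rate(4.0)                         # protein half-life 4 h

for t in (1.0, 24.0):
    print(t, protein_level(t, k_p=k_p, **fast), protein_level(t, k_p=k_p, **stable))
```

With these made-up numbers the fast-translating mRNA gives more protein at 1 hour, while the stable mRNA gives more at 24 hours, which is the qualitative behavior described above: translation efficiency matters most over short windows and mRNA stability over long ones.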

Figure: Intracellular mRNA stability is a major driver of protein expression yield

Factors affecting the stability of mRNA in solution

Degradation of mRNA in solution is a major obstacle during transport. The researchers used PERSIST-seq to evaluate the stability of mRNA in aqueous solution. Because UTR sequences regulate mRNA stability mainly through the intracellular translation or degradation machinery, and these components are absent from aqueous solution, UTR sequence variation has essentially no effect on mRNA stability in solution.

CDS sequence variation is the largest source of variation in mRNA stability in solution. The shorter the CDS, the longer the mRNA half-life in aqueous solution. Likewise, the more highly structured the CDS, the longer the mRNA half-life in aqueous solution.

Effects of pseudouridine on in-solution mRNA stability

Although chemically modified uridine derivatives, such as pseudouridine and N1-methylpseudouridine, have been widely used to enhance intracellular mRNA stability, there had been no reports on how modified nucleotides affect mRNA stability in solution. The researchers used capillary electrophoresis to follow mRNA stability in aqueous solution over time, and found that replacing uridine with pseudouridine significantly improved the stability of mRNA in aqueous solution.

Figure: Pseudouridine substitution in mRNA sequences can improve mRNA stability under degrading conditions

Effects of combined mRNA optimization strategies on stability and protein yield

Highly structured 5’UTRs and 3’UTRs can drive high levels of protein synthesis, and structured CDS sequences can improve both mRNA stability in aqueous solution and cellular protein production. The researchers therefore asked whether combining these optimized elements could yield mRNA sequences that are both highly translated and stable. When the LinearDesign-1-optimized CDS was paired with the hHBB UTRs, protein yield at 24 hours increased significantly (2-fold) compared with the original reference sequence. Interestingly, after uridine was replaced with pseudouridine, the total protein yield of the original reference sequence was unaffected, whereas that of the LinearDesign-1-optimized CDS decreased significantly (2-fold), although it remained high compared with other CDS optimization strategies.

The researchers constructed mRNAs carrying the same UTR sequences but with CDSs produced by different optimization algorithms, and compared their stability in solution and their protein yield in cells. CDS sequences optimized by combining LinearDesign with DegScore-guided RiboTree design showed significantly improved mRNA stability and protein yield.

Figure: Effects of the combinatorial CDS optimization strategy on mRNA stability and expression yield

In addition, the researchers determined the effect of pseudouridine substitution on the in-solution stability and cellular protein yield of the combinatorially optimized CDS sequences (LinearDesign plus DegScore-guided RiboTree design). They first placed the original sequence and the uridine-substituted sequence in a solution in which mRNA degrades readily. At specific time points they purified and recovered all the mRNA, transfected it into cells, and measured protein expression. The combinatorially optimized mRNA with the uridine substitution still gave expression after 2 hours of storage under degrading conditions. For the unoptimized original sequence without the substitution, no protein expression could be detected in transfected cells after 2 hours of storage.

Figure: Effects of pseudouridine substitution in combinatorially optimized mRNA sequences on mRNA stability and protein expression yield
