Protein design based on AlphaFold2

Reference: Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design

Author: Wu Weikun

1. Preface

Following AlphaFold2's breakthrough success in protein structure prediction, researchers have begun to explore how to use it for high-accuracy protein sequence design. This article gives a quick walkthrough of one such method.

2. Computational method

2.1 Sequence initialization

  • The design does not start from a random sequence. Instead, an autoregressive transformer generates the initial sequences, roughly 1,000 de novo sequences (Figure A).
  • The 1,000 sequences are fed to AlphaFold, and the predicted structure of each one (the relaxed model with the highest pLDDT) is kept. TM-align is then used to compare each predicted de novo structure with the target backbone (Figure B).
  • The sequence whose structure has the highest TM-score is used as the initial parent sequence. Only the residues in the aligned structural motifs are kept; unaligned positions are replaced with alanine (Figure C).

This treatment extracts the residue fragments that are already predicted correctly, which makes the search of sequence space more efficient than starting from a random sequence.
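The parent-sequence step above can be sketched in a few lines of Python. This is only an illustration: the function names, the toy sequences, the TM-scores, and the boolean "aligned" masks are all hypothetical, and in practice the mask would have to be derived from the TM-align output.

```python
def pick_best(candidates):
    """Return the (sequence, tm_score, aligned_mask) tuple with the highest TM-score."""
    return max(candidates, key=lambda c: c[1])


def build_parent_sequence(best_seq, aligned_mask):
    """Keep residues at aligned positions; replace every other position with alanine."""
    if len(best_seq) != len(aligned_mask):
        raise ValueError("sequence and mask must have the same length")
    return "".join(res if keep else "A" for res, keep in zip(best_seq, aligned_mask))


# Hypothetical toy data: two candidate sequences with made-up scores and masks.
candidates = [
    ("MKVLEEG", 0.41, [True, False, False, True, True, False, False]),
    ("MRTLDEW", 0.75, [True, True, False, False, True, True, False]),
]

seq, tm_score, mask = pick_best(candidates)
parent = build_parent_sequence(seq, mask)
print(parent)  # MRAADEA
```

The second candidate wins on TM-score, and its three unaligned positions become alanine.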

2.2 Iterative end-to-end design

The core of the design method is to sample sequence space with an MCMC algorithm and predict the structure of each candidate with AlphaFold, until a sequence is found whose predicted backbone is as similar as possible to the target structure.

First, a distance map (distogram) loss is used to measure the difference between the designed structure and the true structure:

Here ij indexes each amino acid pair, y is the true distance distribution feature, and p is the predicted distance distribution feature.
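The formula itself is not reproduced in this text. Assuming the loss is the usual cross-entropy over distance bins, as used for distogram-style predictions, it can be written with the symbols defined above as follows. The bin index b is an assumption introduced here for illustration.

```latex
\mathcal{L}_{\mathrm{dist}} = -\sum_{i,j} \sum_{b} y_{ij}^{b} \, \log p_{ij}^{b}
```

Under this reading, the loss is small when the predicted distribution p places most of its probability in the distance bin that the true structure occupies for each residue pair.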

During inference, the pLDDT of each residue is also calculated and averaged over the 5 parameter sets, but not over the length of the sequence, so one value per residue remains.

This per-residue weight is used to set the probability of sampling each sequence position. The assumption is that amino acids in high-pLDDT regions are already stable.
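A minimal sketch of how per-residue pLDDT could be turned into position-sampling probabilities is shown below. The exact weighting formula is not given in this article, so the choice of a weight proportional to (100 - pLDDT) is an assumption, as are the function names and the toy numbers. Only two parameter sets are shown for brevity instead of five.

```python
import random


def mean_over_models(plddt_per_model):
    """Average pLDDT across parameter sets, keeping one value per residue."""
    n_models = len(plddt_per_model)
    return [sum(column) / n_models for column in zip(*plddt_per_model)]


def sampling_weights(plddt):
    """Convert per-residue pLDDT (0-100) into sampling probabilities.

    Assumption: the weight is proportional to (100 - pLDDT), so that
    low-confidence positions are picked more often.
    """
    raw = [max(100.0 - p, 0.0) for p in plddt]
    total = sum(raw)
    if total == 0:
        # Every residue is at pLDDT 100: fall back to uniform sampling.
        return [1.0 / len(plddt)] * len(plddt)
    return [w / total for w in raw]


# Hypothetical pLDDT values for a 3-residue toy example.
plddt_per_model = [
    [90.0, 60.0, 80.0],
    [92.0, 58.0, 84.0],
]
plddt = mean_over_models(plddt_per_model)
weights = sampling_weights(plddt)
position = random.choices(range(len(plddt)), weights=weights, k=1)[0]
print(plddt, weights, position)
```

With these numbers the middle residue, which has the lowest mean pLDDT, receives the largest sampling probability.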

Once the position to sample has been chosen, that site is mutated to one of the other amino acid types, picked uniformly at random (Cys excluded). If the mutation lowers the distance map loss (that is, it improves the agreement between the predicted structure and the target), the mutation is kept. After about 20,000 rounds of such mutation, the distogram score converges.
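The mutate-and-accept loop described above can be sketched as follows. This is not the authors' code: `toy_loss` is a hypothetical stand-in for the AlphaFold distogram loss, the target string is made up, and the acceptance rule is the simple "keep only if the loss drops" rule stated in this article (a full Metropolis-style scheme would also accept some worse moves with a certain probability). Position weights are fixed here, whereas the method described above recomputes them from pLDDT.

```python
import random

# The 19 amino acid types allowed as mutations (cysteine excluded).
AMINO_ACIDS = "ADEFGHIKLMNPQRSTVWY"


def design(parent, loss_fn, n_rounds, rng, position_weights=None):
    """Greedy mutation loop: keep a point mutation only if it lowers the loss."""
    seq = list(parent)
    best = loss_fn("".join(seq))
    for _ in range(n_rounds):
        # Pick a position (uniformly when position_weights is None).
        pos = rng.choices(range(len(seq)), weights=position_weights, k=1)[0]
        old = seq[pos]
        # Mutate to a different, non-cysteine residue chosen uniformly.
        seq[pos] = rng.choice([a for a in AMINO_ACIDS if a != old])
        new = loss_fn("".join(seq))
        if new < best:
            best = new      # improvement: keep the mutation
        else:
            seq[pos] = old  # no improvement: revert
    return "".join(seq), best


def toy_loss(seq, target="MKVLEEG"):
    """Hypothetical stand-in for the distogram loss: mismatches to a toy target."""
    return sum(a != b for a, b in zip(seq, target))


rng = random.Random(0)
designed, final_loss = design("MAAAAAG", toy_loss, n_rounds=2000, rng=rng)
print(designed, final_loss)
```

Because a mutation is only kept when the loss strictly decreases, the loss is non-increasing over the run. In the real setting each call to the loss function is a full AlphaFold forward pass, which is why inference speed matters.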

2.3 Fast AlphaFold inference

To make the iterative search fast, the authors modified the standard AlphaFold prediction pipeline:

  • Only a single sequence is used for prediction
  • Template search is disabled
  • Recycling is not used
  • The maximum number of MSA sequences is set to 1
  • Attention heads that are not relevant to the design are disabled
  • The structure module is skipped; the distance distribution is computed directly from the pair-wise representation

The final result: on a consumer-grade RTX 30-series GPU, one iteration takes about 5 seconds (for a sequence of 100 amino acids).
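As a rough back-of-the-envelope check, the per-iteration time can be combined with the 20,000 mutation rounds mentioned in section 2.2. This assumes every round costs about 5 seconds at length 100 and ignores any other overhead; the variable names are illustrative.

```python
seconds_per_iteration = 5   # approximate figure quoted above
n_iterations = 20_000       # mutation rounds from section 2.2

total_seconds = seconds_per_iteration * n_iterations
total_hours = total_seconds / 3600
print(f"{total_seconds} s = {total_hours:.1f} h")  # 100000 s = 27.8 h
```

Under these assumptions, a single design run takes a little over a day on one GPU.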

2.4 Evaluation of design effect

Three structure prediction methods are used for evaluation:

  • The standard AlphaFold pipeline
  • trRosetta
  • Rosetta's fragment-based ab initio method

3. Design results

The authors use Top7, a manually designed protein, as the test case.

In the initial sequence design stage, the TM-score of the AF2-predicted structure was only 0.746. After iterative design with the method above, the newly designed sequence shares only 27% similarity with Top7. When this sequence is validated with AF2, the overall RMSD is only 0.736 Å and the pLDDT score is 91. When trRosetta is used for prediction, the Cα-RMSD is 2.637 Å and the TM-score is 0.679. The third check is fragment-based ab initio prediction: after 15,000 samples, the best structure has a Cα-RMSD of 1.279 Å. All three results indicate that the designed sequence likely adopts the same fold as the target structure.

After the successful Top7 design, the authors went on to design proteins that are not in the training set: Peak6 (PDB ID 6MRS), Foldit (PDB ID 6MRR), and Ferredog-Diesel (PDB ID 6NUK). The initial sequences had TM-scores between 0.596 and 0.7. After design, the Cα-RMSD of the AF2-predicted structures dropped to within 1 Å, with pLDDT scores > 85. With fragment-based ab initio prediction, the Cα-RMSDs were all below 3 Å. The similarity between the designed sequences and the target template sequences was below 30%. Among the structure prediction tools, trRosetta gave the highest Cα-RMSD, which may be related to the poor quality of the input MSA.
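The sequence similarity figures quoted above can be illustrated with a simple percent-identity calculation. This sketch assumes that "similarity" means position-by-position identity between two sequences of equal length, which is natural for fixed-backbone design but is not stated explicitly here. The two sequences are hypothetical toy strings, not the real Top7 or designed sequences.

```python
def sequence_identity(seq_a, seq_b):
    """Percent of positions with identical residues, for equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-residue toy sequences.
target = "MKVLEEGKTY"
designed = "MRVAEDGSTW"
print(sequence_identity(target, designed))  # 50.0
```

A low identity combined with a low Cα-RMSD is what makes the result interesting: the designed sequence differs from the native one yet is predicted to fold into the same backbone.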

4. Discussion

Fixed-backbone design with a reduced version of AlphaFold2 is essentially MCMC sequence sampling weighted by the pLDDT score, with the reliability of the designed sequence finally verified by structure prediction. The method does not use the concept of an energy function, which suggests that AlphaFold has learned some energy-related structural information.

5. Final note

No code is available.

This article is from the WeChat official account DrugAI.

Original publication time: 2021-08-28
