Reference: Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design
Author: Wu Weikun
1. Preface
AlphaFold2 achieved a breakthrough in protein structure prediction, and researchers have since begun exploring how to use it for high-accuracy protein sequence design. This article gives a brief overview of one such approach.
2. Methods
2.1 Sequence initialization
- Instead of starting from a random sequence, the method uses an autoregressive transformer to generate about 1,000 de novo sequences (Figure A).
- All 1,000 sequences are passed to AlphaFold. For each one, the relaxed model with the highest pLDDT is kept. TM-align is then used to align the target backbone with each predicted de novo structure (Figure B).
- The sequence whose predicted structure has the highest TM-score becomes the initial parent sequence. Only the residues in the aligned structural motif are kept, and unaligned positions are replaced with alanine (Figure C).
This procedure extracts sequence fragments that are already predicted to adopt the correct local structure. That gives a better starting point for searching sequence space than a randomly generated sequence.
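The parent-sequence step can be sketched as follows. This is a minimal illustration, not the authors' code. The `Candidate` class and the `target_to_seq` alignment format are hypothetical: the sketch assumes that, for every target position, TM-align gives either the index of the aligned residue in the de novo sequence or no partner.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    sequence: str   # de novo sequence produced by the generator
    tm_score: float # TM-score of its predicted structure against the target
    # Hypothetical alignment format: for each target position, the index of the
    # aligned residue in `sequence`, or None where TM-align found no partner.
    target_to_seq: list[Optional[int]]


def build_parent_sequence(candidates: list[Candidate]) -> str:
    """Keep aligned residues of the best-scoring candidate; use alanine elsewhere."""
    best = max(candidates, key=lambda c: c.tm_score)
    return "".join(
        best.sequence[idx] if idx is not None else "A"
        for idx in best.target_to_seq
    )
```

Because the output is indexed by target position, the parent sequence always has the same length as the target backbone, which fixed-backbone design requires.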
2.2 Iterative end-to-end design
The core of the method has three parts. An MCMC algorithm samples sequence space, and AlphaFold predicts the structure of each candidate. The loop continues until the predicted backbone is as similar as possible to the target backbone.
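The designed and target structures are compared with a distogram (distance-map) loss. This loss is the cross-entropy between the target and predicted inter-residue distance distributions:
$$\mathcal{L}_{\text{dist}} = -\frac{1}{L^{2}}\sum_{i,j}\sum_{b} y_{ij}^{b}\,\log p_{ij}^{b}$$
Here i, j range over all residue pairs, L is the sequence length and b indexes the distance bins. y_ij is the true (target) distance distribution and p_ij is the predicted distance distribution.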
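The loss above can be computed roughly as follows. This is a sketch under stated assumptions, not the authors' implementation. The binning (64 bins with 63 evenly spaced boundaries from 2.3125 to 21.6875 Å) is assumed to mirror AlphaFold2's distogram head. The names `one_hot_distogram` and `distogram_loss` are hypothetical.

```python
import numpy as np

# Assumed binning, modelled on AlphaFold2's distogram head:
# 63 evenly spaced boundaries give 64 distance bins.
BREAKS = np.linspace(2.3125, 21.6875, 63)


def one_hot_distogram(dist: np.ndarray) -> np.ndarray:
    """Turn an (L, L) target distance matrix into (L, L, 64) one-hot bins y_ij."""
    bin_idx = np.sum(dist[..., None] > BREAKS, axis=-1)  # bin index 0..63 per pair
    return np.eye(len(BREAKS) + 1)[bin_idx]


def distogram_loss(y: np.ndarray, p: np.ndarray, eps: float = 1e-8) -> float:
    """Cross-entropy summed over bins, averaged over all L*L residue pairs."""
    per_pair = -np.sum(y * np.log(p + eps), axis=-1)  # shape (L, L)
    return float(per_pair.mean())  # mean over pairs == (1 / L^2) * sum over i, j
```

A perfect prediction (p equal to the one-hot target) gives a loss near 0. A uniform prediction over 64 bins gives log 64, about 4.16.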
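During inference, the pLDDT of each residue is also computed and averaged across the 5 AlphaFold parameter sets (models). It is not averaged along the sequence, so every position keeps its own score.
These per-residue scores are used as weights that set the probability of each position being selected for mutation. The assumption is that residues in high-pLDDT regions are already stable, so low-pLDDT positions are more likely to be sampled.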
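The position-selection step can be sketched like this. The exact mapping from pLDDT to sampling probability is not given here, so the weight `100 - pLDDT` is purely an assumption, and the function names are hypothetical.

```python
from __future__ import annotations

import random
from statistics import fmean


def mean_plddt_per_residue(plddt_by_model: list[list[float]]) -> list[float]:
    """Average pLDDT across models at each position (not along the sequence)."""
    # zip(*...) groups the scores that the different models assign to one residue.
    return [fmean(scores) for scores in zip(*plddt_by_model)]


def sample_position(mean_plddt: list[float], rng: random.Random) -> int:
    """Pick a position to mutate; lower pLDDT gives a higher chance.

    The weight 100 - pLDDT is an assumed, illustrative choice.
    """
    weights = [100.0 - s for s in mean_plddt]
    if sum(weights) <= 0:  # every position fully confident: fall back to uniform
        weights = [1.0] * len(mean_plddt)
    return rng.choices(range(len(mean_plddt)), weights=weights, k=1)[0]
```

With five AlphaFold parameter sets, `plddt_by_model` would hold five lists, one per model, each as long as the sequence.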
Once a position has been chosen, it is mutated uniformly at random to one of the other amino acid types (cysteine excluded). If the mutation lowers the distogram loss, meaning the predicted structure matches the target more closely, the mutation is kept. After about 20,000 rounds of mutation, the distogram score converges.
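The mutate-and-keep loop can be sketched as below. In the real method, `loss_fn` would run the trimmed-down AlphaFold and return the distogram loss. Here a toy Hamming-distance loss against a made-up target sequence stands in for it, purely for illustration. All function names are hypothetical.

```python
from __future__ import annotations

import random
from typing import Callable

# The 20 standard amino acids minus cysteine, as in the text.
NON_CYS = "ADEFGHIKLMNPQRSTVWY"


def mutate(seq: str, pos: int, rng: random.Random) -> str:
    """Replace seq[pos] with a different non-Cys residue chosen uniformly."""
    options = [aa for aa in NON_CYS if aa != seq[pos]]
    return seq[:pos] + rng.choice(options) + seq[pos + 1:]


def design(
    seq: str,
    loss_fn: Callable[[str], float],
    pick_position: Callable[[str, random.Random], int],
    n_iters: int,
    rng: random.Random,
) -> tuple[str, float]:
    """Keep a mutation only if it lowers the loss, as described in the text."""
    best_loss = loss_fn(seq)
    for _ in range(n_iters):
        candidate = mutate(seq, pick_position(seq, rng), rng)
        cand_loss = loss_fn(candidate)
        if cand_loss < best_loss:
            seq, best_loss = candidate, cand_loss
    return seq, best_loss


# Toy stand-ins (hypothetical): a made-up target sequence and a uniform picker.
TOY_TARGET = "MKTAYIAKQR"


def toy_hamming_loss(s: str) -> float:
    return float(sum(a != b for a, b in zip(s, TOY_TARGET)))


def uniform_position(s: str, rng: random.Random) -> int:
    return rng.randrange(len(s))


if __name__ == "__main__":
    final, loss = design("A" * 10, toy_hamming_loss, uniform_position, 5000, random.Random(0))
    print(final, loss)
```

The acceptance rule follows the text and is greedy. A Metropolis-style MCMC sampler would also sometimes accept a worse mutation, with probability exp(-Δ/T), to escape local minima.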
2.3 Fast AlphaFold inference
To make the iterative search fast, the authors modified AlphaFold's standard prediction pipeline:
- Prediction uses only the single designed sequence
- Template search is disabled
- Recycling is not used
- The maximum number of MSA sequences is set to 1
- Prediction heads not needed for design are disabled
- The structure module is not run; the distance distribution (distogram) is computed directly from the pair representation
The result: on a consumer RTX 30-series GPU, one iteration takes about 5 seconds for a 100-residue protein.
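As a rough check, take the two figures in this article at face value: about 20,000 iterations and about 5 s per iteration for a 100-residue protein. One complete design run would then take on the order of a day on such a GPU. This is only an estimate; the real time depends on sequence length and hardware.

```latex
t_{\text{total}} \approx 20\,000 \times 5\ \text{s} = 1.0 \times 10^{5}\ \text{s} \approx 27.8\ \text{h}
```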
2.4 Evaluating the designs
Three structure prediction methods are used to evaluate the designed sequences:
- The standard AlphaFold pipeline
- trRosetta
- Rosetta fragment-based ab initio folding
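The results below are reported mainly as Cα-RMSD and TM-score. Both can be sketched for two equal-length Cα traces (same residue pairing, as in fixed-backbone design). This is an illustration, not the tools the authors used. `kabsch_superpose` finds the RMSD-optimal superposition. `tm_score_fixed` scores an already superposed pair, using the TM-score d0 formula with an assumed 0.5 Å floor for short chains. TM-align searches for the superposition that maximizes TM-score, so this value is a lower bound on what TM-align would report for the same pairing.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rotate and translate `mobile` (N, 3) onto `target` (N, 3), minimizing RMSD."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    h = mob_c.T @ tgt_c  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)


def ca_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between paired Cα coordinates."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def tm_score_fixed(a: np.ndarray, b: np.ndarray) -> float:
    """TM-score of an already superposed pair, normalized by len(b)."""
    n = len(b)
    d0 = 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8 if n > 21 else 0.5
    d0 = max(d0, 0.5)  # assumed floor for short chains
    dist = np.linalg.norm(a - b, axis=1)
    return float(np.mean(1.0 / (1.0 + (dist / d0) ** 2)))
```

A model that is an exact rigid-body copy of the target gives an RMSD near 0 Å and a TM-score near 1 after superposition.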
3. Design results
The authors first used Top7, an artificially (de novo) designed protein, as a test case.
For the initial parent sequence, the AlphaFold prediction had a TM-score of only 0.746 against the target. After iterative design, the new sequence shares only 27% identity with the native Top7 sequence. When the designed sequence was checked with AlphaFold, the overall RMSD was only 0.736 Å and the pLDDT was 91. trRosetta predicted a structure with a Cα-RMSD of 2.637 Å and a TM-score of 0.679. With the third method, Rosetta fragment-based ab initio folding, the best structure after 15,000 samples had a Cα-RMSD of 1.279 Å. Together these results suggest that the designed sequence is likely to adopt the same fold as the target structure.
After succeeding with Top7, the authors designed sequences for three proteins not in the training set: Peak6 (PDB ID 6MRS), Foldit (PDB ID 6MRR) and Ferredog-Diesel (PDB ID 6NUK). The initial parent sequences matched their targets with TM-scores between 0.596 and 0.7. After design, the AlphaFold-predicted structures had Cα-RMSDs within 1 Å and pLDDT scores above 85. Rosetta fragment-based ab initio predictions all had Cα-RMSDs below 3 Å. The designed sequences share less than 30% identity with the native sequences of the targets. Among the structure prediction tools, trRosetta gave the largest Cα-RMSDs, possibly because of the poor quality of its input MSA.
4. Discussion
Fixed-backbone design with a stripped-down AlphaFold2 is essentially a pLDDT-weighted form of MCMC sequence sampling. The designed sequences are then validated by structure prediction. The method uses no energy function at all, so its success suggests that AlphaFold has learned some energy-related structural information.
5. Code
No code is available.
This article was originally published on the WeChat official account DrugAI.
Original publication date: 2021-08-28