New Algorithms That Harnessed Protein-folding Power in 2022

Big pharma companies have been researching protein folding for a long time. Discoveries and innovations in the field can revolutionise the development of drugs and other biological advances. Recently, work on this problem also supported the development of COVID-19 vaccines.

Predicting how a protein folds involves a combination of complex algorithms. Recent models from big tech companies like Meta and Google have advanced the state of the art and, once open-sourced, sparked interest among researchers.

Here’s a list of some of the prominent protein fold prediction models that are highly accurate and compete with each other in terms of their methods and speed!

AlphaFold 2

Google’s DeepMind made a major breakthrough with AlphaFold, which uses a deep neural network to predict protein structures. In 2018, AlphaFold 1 was highly praised at CASP13 for its innovations, and with AlphaFold 2, DeepMind has increased speed and accuracy even further.

AlphaFold 2 took the top spot at CASP14 in 2020 and has since been widely regarded as the best protein structure prediction model.

DeepMind made the model open-source to encourage contributions and further innovation. In July 2022, DeepMind, in collaboration with EMBL’s European Bioinformatics Institute (EMBL-EBI), released predicted structures for nearly all catalogued proteins, expanding its earlier database more than 200-fold.

Check out the code for AlphaFold here.

ESMFold

Meta AI’s Evolutionary Scale Modeling (ESM) work has proved to be one of the strongest competitors and alternatives to AlphaFold 2. Much like AlphaFold, the model is open to the public.

ESMFold performs end-to-end, atomic-level protein structure prediction. It uses ESM-2, a transformer-based protein language model with up to 15 billion parameters. Because it is built on a language model, ESMFold stands apart from other protein fold prediction models chiefly in offering much faster inference while remaining competitive in accuracy.

ESMFold can produce precise protein structures from a single sequence as input because it leverages the internal representations of the language model. In tests on CASP14, the model scored 68, below AlphaFold 2’s score of 84.

To see the code, click here.

RoseTTAFold

Minkyung Baek of the Baker Lab developed ‘RoseTTAFold’, a deep learning tool for predicting protein structures. It is based on a three-track neural network and can give useful insight into a protein even when no experimentally determined structure is available, and it produces predictions quickly.

The three-track network simultaneously processes one-dimensional sequence information, two-dimensional information on the distances between amino acids, and three-dimensional structure. This lets the network reason collectively about the relationship between a protein’s amino-acid chain and its folded architecture.
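One way to picture the three tracks is as three arrays of different dimensionality for a protein of length L. The sketch below only illustrates those shapes; the class name, field names, and example sequence are hypothetical and are not taken from the RoseTTAFold code, which passes learned features between the tracks.

```python
from dataclasses import dataclass


@dataclass
class ThreeTrackState:
    """Hypothetical container for the three kinds of information."""
    seq_1d: list   # one entry per residue, length L
    pair_2d: list  # L x L residue-pair values (for example, distances)
    xyz_3d: list   # L x 3 coordinates, one point per residue


def empty_state(sequence):
    """Build zero-filled placeholders with the right shapes for a sequence."""
    length = len(sequence)
    return ThreeTrackState(
        seq_1d=list(sequence),
        pair_2d=[[0.0] * length for _ in range(length)],
        xyz_3d=[[0.0, 0.0, 0.0] for _ in range(length)],
    )


# Made-up eight-residue sequence, used only to show the shapes.
state = empty_state("MKTAYIAK")
print(len(state.seq_1d), len(state.pair_2d), len(state.xyz_3d[0]))
```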

According to several reports, RoseTTAFold was able to predict hundreds of new protein structures that were previously unknown. Researchers also expect that the software could help solve challenging X-ray crystallography and cryo-electron microscopy modelling problems.

Click here for the GitHub repository.

OmegaFold

In July, Chinese biotech firm Helixon released OmegaFold and joined the protein fold prediction race, beating its competitors in several areas. After the model outperformed RoseTTAFold and rivalled AlphaFold 2 in high-resolution protein structure prediction, the developers released the code to the public on GitHub.

Unlike AlphaFold and RoseTTAFold, which rely on multiple sequence alignments, the model makes predictions from single sequences, including divergent ones. It does so by combining a protein language model with a geometry-inspired transformer trained on protein structures.

OmegaFold is built on a protein language model, OmegaPLM, that can sense structural information encoded in amino-acid sequences. Because it needs only a single amino-acid sequence to predict structure and folds, the model can run ten times faster than RoseTTAFold.

Click here for the repository.

D-I-TASSER

The Zhang Lab at the University of Michigan developed Distance-guided Iterative Threading ASSEmbly Refinement, or D-I-TASSER, for high-accuracy protein fold and structure prediction. It integrates threading with deep learning. D-I-TASSER succeeds the lab’s older model, ‘I-TASSER’, and provides higher speed and accuracy.

Starting from a query sequence, inter-residue contact and distance maps are generated by two deep neural network predictors, DeepPotential and Attention Potential.
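To make the two terms concrete: a distance map holds the pairwise distances between residues, and a contact map marks the pairs that lie closer than some cutoff. The sketch below computes both from made-up coordinates. The function names, the toy coordinates, and the 8 Å cutoff (a commonly used convention) are assumptions for illustration; D-I-TASSER itself predicts these maps from sequence with neural networks rather than computing them from a known structure.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(dmap, cutoff=8.0):
    """Mark residue pairs closer than `cutoff`; the diagonal is never a contact."""
    return [
        [i != j and d < cutoff for j, d in enumerate(row)]
        for i, row in enumerate(dmap)
    ]


# Toy coordinates (in angstroms): four residues spaced 3.8 apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

dmap = distance_map(toy_coords)
cmap = contact_map(dmap)

for row in cmap:
    print("".join("#" if in_contact else "." for in_contact in row))
```

In this toy chain, residues up to two positions apart fall under the cutoff, while the two end residues do not.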

An optional companion server, D-I-TASSER-AF2, incorporates AlphaFold2 restraints and is generally more accurate than either model on its own.

Click here to visit the lab’s website.

IntFOLD

The IntFOLD server provides a unified resource for automatically predicting protein tertiary structures, with built-in estimates of model accuracy (EMA). It is a fully automated, high-performance tool for predicting protein structures from their amino acid sequences.

The server was tested at CASP and performed very well in the blind tests. Results are presented graphically, which also helps non-expert users by giving a visual summary of a complex set of data.

Click here to read IntFOLD’s research paper.

RaptorX

RaptorX offers protein secondary structure prediction and template-based tertiary structure modelling. With its template-based approach, the model can finish processing a sequence of 200 amino acids in around 35 minutes.

What sets RaptorX apart from other protein fold prediction models is how it aligns a target sequence with multiple distantly related template proteins, using a novel non-linear scoring function and a probabilistic-consistency algorithm.
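The article does not describe RaptorX’s scoring function or consistency algorithm, so the sketch below swaps in the classic Needleman-Wunsch dynamic program with a simple linear score, purely to illustrate what aligning a target sequence against candidate templates means. The function name, the scoring values, and the toy sequences are assumptions for illustration and are not RaptorX’s method.

```python
def global_alignment_score(target, template, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(target) + 1, len(template) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = target[i - 1] == template[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(
                diagonal,               # align the two residues
                score[i - 1][j] + gap,  # gap in the template
                score[i][j - 1] + gap,  # gap in the target
            )
    return score[-1][-1]


# Toy example: pick the best-scoring template for a short target.
target = "ACDE"
templates = ["ACE", "ACDF", "WWWW"]
best = max(templates, key=lambda t: global_alignment_score(target, t))
print(best)
```

A real template-based method would replace the fixed match and mismatch values with a much richer score and would combine several templates instead of choosing only one.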

Read more about RaptorX here.
