Using NovaFold AI to Streamline Structure Prediction with AlphaFold 2 | DNASTAR

NovaFold AI prediction software from DNASTAR is designed to streamline the entire AlphaFold 2 workflow, with no need to buy a specialized computer. In addition, both the prediction setup and downstream analysis take only a few minutes of hands-on time.

Before showing the simple step-by-step workflow, we want to explain how NovaFold AI fits into the DNASTAR Lasergene software line.

Lasergene software is offered in three packages that can be licensed separately or in combination. The flagship application of Lasergene Protein is Protean 3D, easy-to-use standalone software that is also used to set up predictions and view results from each of our four separately licensed protein prediction applications, known as the Nova Applications (Figure 2). These applications provide access to powerful prediction algorithms, including AlphaFold 2, I-TASSER, and SwarmDock.

The workflow described in this chapter uses NovaFold AI and Protean 3D.

Figure 2. Protean 3D is a standalone application that is also used to set up Nova Application predictions and analyze their results.

Using a simple guided workflow, it takes only a few minutes to install Lasergene and less than a minute to set up and begin running each protein structure prediction. Results are typically ready within a few minutes to a few hours, and can be viewed and analyzed in the same application used for the prediction setup.

In NovaFold AI, setting up the prediction and analyzing the results both take place on a standard Windows or Mac computer, eliminating the need for a specialized computer. Little disk space is required for these tasks, as the entire 2.5 TB AlphaFold 2 library is stored online with NovaCloud, using SSD-backed file share storage. The prediction itself also runs in the cloud. NovaCloud has a dedicated GPU, which is important during the AI inference and energy minimization phases of the prediction.

Table 2. Requirement comparison for open source AlphaFold 2 vs. NovaFold AI

Use case: Predicting the structure of a chimeric FliC-FliS fusion protein with NovaFold AI

Fusion proteins are artificial constructs that are commonly used to explore how different protein fragments interact with one another at the atomic level. Because they are not naturally occurring proteins, they are never included in the public AlphaFold 2 structure database described in Chapter 2 of this guide.

Fusion proteins are difficult to model with template-based structure prediction algorithms since there are no templates available that combine elements of two diverse structures. But unlike other algorithms, AlphaFold 2 can recognize folds from two different sources and combine them into a single distance matrix, allowing it to predict the composite structure.

However, we have described how open source AlphaFold 2 can be cumbersome and expensive to run due to specialized computer requirements. In this example, we’ll show how easy it is to model a fusion protein with this algorithm using NovaFold AI on a standard laptop.

In the following steps, we predict the structure of the chimeric FliC-FliS fusion protein (PDB ID: 4IWB) that features two pieces of bacterial flagella fused together in a novel way. The structure was solved by x-ray diffraction and added to the PDB database on 1/23/2013.

Step 1. Obtain the protein sequence

To obtain a FASTA sequence for this protein, we went to the 4IWB entry at the PDB website (Figure 2), clicked on the blue Download Files button and chose FASTA Sequence.
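A downloaded FASTA file can be checked before it is uploaded. The following is a minimal sketch, not part of the NovaFold AI workflow, of reading FASTA-formatted text into header and sequence pairs. The function name `parse_fasta` is our own, and the example record is a placeholder, not the actual 4IWB sequence.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record for illustration only; NOT the real 4IWB sequence.
example = """>EXAMPLE_1|Chain A|placeholder protein
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Printing the sequence length is a quick way to confirm that the file contains the expected number of residues and chains before submitting it.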

Figure 2. Part of the 4IWB entry at the Protein Data Bank website.

Step 2. Set up and run the prediction using NovaFold AI

From the Protean 3D Welcome screen (Figure 3), we clicked Structure Prediction and then New protein structure with NovaFold AI.

Figure 3. The Protean 3D Welcome screen showing the launch point for the NovaFold AI workflow.

The NovaFold AI wizard opened at the Sequences screen. We used the Add File button to upload the FASTA file (Figure 4).

Figure 4. The NovaFold AI Sequences screen after adding the FASTA file.

Clicking Next > took us to a screen where we could customize prediction options.

To keep this example fair, we wanted to make sure that the 4IWB structure could not be used as a template by the AlphaFold 2 algorithm. We therefore selected a template cutoff date of 1/22/2013, the day before the structure was submitted to PDB (Figure 5).
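The cutoff date logic can be sketched in a few lines of Python. This is an illustration only: the helper name `template_cutoff` is hypothetical, and the dates are the ones quoted in this example, read as month/day/year.

```python
from datetime import date, timedelta


def template_cutoff(release_date):
    """Return the day before a structure's release date."""
    return release_date - timedelta(days=1)


# Date on which 4IWB was added to the PDB, as stated in this example.
release = date(2013, 1, 23)
cutoff = template_cutoff(release)

print(cutoff.isoformat())
# The target structure is newer than the cutoff, so it cannot serve as a template.
print(release > cutoff)
```

The same approach applies to any benchmark where the known structure must be hidden from the prediction: take its release date and set the cutoff one day earlier.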

Figure 5. The NovaFold AI Options screen with the template cutoff date entered.

Finally, we clicked Submit to begin the prediction. This prediction took about 1.5 minutes to set up and 41 minutes to complete.

Step 3. Analyze the predicted structure

Once the structure was predicted, an active link appeared in the Predictions view (Figure 6).

Figure 6. The link for the most recent prediction appears at the top of the Protean 3D Predictions view.

We clicked this link to open the predicted structure in the Structure view (top left of Figure 7). Initially, the top 5 models are shown overlaid, with the Model Report (bottom left) showing their Local Distance Difference Test* (LDDT) scores.
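To illustrate how a set of models might be ordered by confidence, here is a minimal sketch that ranks models by their mean per-residue score. It assumes each model comes with a list of per-residue confidence values on a 0 to 100 scale. The model names and numbers below are hypothetical and are not taken from the 4IWB prediction, and this is not necessarily how NovaFold AI computes its Model Report.

```python
from statistics import mean


def rank_models(per_residue_scores):
    """Return (name, mean score) pairs sorted from highest to lowest mean."""
    averages = {name: mean(scores) for name, scores in per_residue_scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-residue confidence scores; for illustration only.
scores = {
    "model_a": [90.0, 80.0, 70.0],
    "model_b": [95.0, 85.0, 90.0],
    "model_c": [60.0, 70.0, 50.0],
}

ranking = rank_models(scores)
for name, average in ranking:
    print(f"{name}: {average:.1f}")
```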

Figure 7. The top 5 prediction models overlaid in Protean 3D.

Comparing the predicted model to the PDB structure

How did the Model 1 structure prediction compare to the structure that was determined using x-ray crystallography? To find out, we used Protean 3D to align the prediction with the PDB structure using the Structure > Align Structures > Structure Alignment command. We used the Style panel to color the NovaFold AI prediction orange and the PDB (x-ray diffraction) structure blue (Figure 8). Rotating the structure in the Structure view showed that the predicted and known structures were nearly indistinguishable when viewed from any angle.

Figure 8. The prediction made by NovaFold AI (orange) aligned with the x-ray diffraction-determined structure from PDB (blue).

For a more objective measure of accuracy, we can look at the root mean square deviation* (RMSD) values for the alignment between the top five predicted models and the known structure. There is no absolute rule for determining a match for aligned proteins, but a value under 2.0 Å (angstrom, 10⁻¹⁰ m) is generally considered to signify a very close match. In this example, the RMSD values for the five alignments were all under 0.7 Å, with the top-ranked model having an RMSD of just 0.379 Å (see the Details panel in Figure 7).

* RMSD is a measure of the average distance between the atoms (usually the backbone atoms) of superimposed proteins.
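As a rough illustration of the calculation, the sketch below computes RMSD for two sets of coordinates that have already been superimposed and whose atoms are listed in the same order. It does not perform the structural alignment itself, which Protean 3D handles in this workflow. The function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples, in the input units."""
    if len(coords_a) != len(coords_b):
        raise ValueError("Both structures must contain the same number of atoms")
    if not coords_a:
        raise ValueError("At least one atom is required")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in angstroms: every atom is offset by 0.3 along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reference = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3), (0.0, 1.0, 0.3)]

value = rmsd(predicted, reference)
print(f"RMSD = {value:.3f}")
# Compare against the 2.0 angstrom rule of thumb mentioned above.
print(value < 2.0)
```

Because the deviations are squared before averaging, a few badly placed atoms raise the RMSD more than many small deviations do.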

