AlphaFold and Phenix

You can use predicted models from AlphaFold and other prediction
software in Phenix. These models can be very helpful in structure
determination because they are often very accurate over much of their
length and come with accuracy estimates that let you remove
poorly predicted regions.

To use AlphaFold models in Phenix you can follow this overall procedure:

1. Get an AlphaFold model (or a model from the PDB) for each chain in your
structure. You can use the “AlphaFold in Colab” button in the
Phenix GUI to do this using the Google Colab notebook system.
(See also the documentation for running AlphaFold.)

2. Trim the model and break it into rigid domains. You can use
phenix.process_predicted_model to do this.

3. Dock your models (cryo-EM) or carry out molecular replacement
(crystallography) to place your models in the right places in your map
or unit cell. You can use
phenix.dock_predicted_model (cryo-EM)
or phenix.phaser (crystallography) to do this.

4. Fill in the missing parts of your models with loop fitting or iterative
model-building. You can do this
with phenix.rebuild_predicted_model
for cryo-EM and phenix.autobuild for crystallography.

5. Refine the rebuilt predicted models that you obtain. You can use
phenix.real_space_refine for cryo-EM models and
phenix.refine for crystallographic models.

6. Examine your resulting model in detail. Use the validation tools
that are part of phenix.real_space_refine and phenix.refine to
identify problem areas, and use manual model-building tools to fix them.
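
The six steps above can be summarised as a small lookup table. The sketch
below is plain Python and does not call Phenix. The program names are the
ones listed in the steps; the STEPS table, the programs_for function, and
the method labels "cryo-em" and "xray" are hypothetical names chosen for
illustration. Step 1 and the manual part of step 6 have no single command
and are left out.

```python
# Programs named in steps 2-5, keyed by experimental method.
# "cryo-em" and "xray" are illustrative labels, not Phenix keywords.
STEPS = [
    ("process", {"cryo-em": "phenix.process_predicted_model",
                 "xray": "phenix.process_predicted_model"}),
    ("place",   {"cryo-em": "phenix.dock_predicted_model",
                 "xray": "phenix.phaser"}),
    ("rebuild", {"cryo-em": "phenix.rebuild_predicted_model",
                 "xray": "phenix.autobuild"}),
    ("refine",  {"cryo-em": "phenix.real_space_refine",
                 "xray": "phenix.refine"}),
]


def programs_for(method):
    """Return (step, program) pairs in the order they are run."""
    if method not in ("cryo-em", "xray"):
        raise ValueError("method must be 'cryo-em' or 'xray'")
    return [(step, choices[method]) for step, choices in STEPS]


for step, program in programs_for("cryo-em"):
    print(f"{step:8s} {program}")
```

The table deliberately carries no command-line parameters; consult the
documentation of each program for the inputs it expects.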

If your structure has more than one chain, you will need to carry out some
additional steps. For a crystal structure, you’ll want to generate a
processed AlphaFold (or other) model for each chain and supply all of them
to phenix.phaser for molecular replacement, usually all at once.

For a cryo-EM structure, you can work with one chain at a time. You can
use the whole map for each chain, or if you have some idea of what chain
goes where, you can mask out or box the map so that it shows only one
chain and use that as your map. Boxing or masking the map can speed up
the process and improve the result considerably.

For complex structures with many chains or with chains that contain domains
with long linkers, docking can be very complicated and take a long
time. In these cases it may be especially helpful to box or mask the map
if you can do that. If you cannot, you might want to run the docking step
individually with each domain that you get from
phenix.process_predicted_model
and then examine where they went in the map. If the domains seem to correspond
to different molecules, you might want to mask out the part of the density
that corresponds to the molecule you don’t want to fit and re-try. You can
also try running phenix.dock_in_map with one domain at a time, asking it
to find multiple copies; you can then choose the copy that matches up with
the other domains you have placed.
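
One way to judge which copy "matches up" is a simple geometric check: the
end of one placed domain and the start of the next must be close enough
for the intervening linker to span. The sketch below is a plain-Python
illustration, not a Phenix feature. The function name, the input format,
and the figure of about 3.8 Å between consecutive C-alpha atoms (used as
an upper bound on how far a chain can stretch per residue) are
assumptions of the example.

```python
import math

CA_CA_MAX = 3.8  # approximate distance in Å between consecutive C-alpha atoms


def plausible_copies(anchor_xyz, anchor_resno, candidates, first_resno):
    """Keep candidate placements that the linker could physically reach.

    anchor_xyz   : C-alpha position of the last residue of the placed domain
    anchor_resno : residue number of that residue
    candidates   : C-alpha positions, one per candidate copy, of the first
                   residue of the domain being placed
    first_resno  : residue number of that first residue
    Returns (index, distance) pairs sorted by increasing distance.
    """
    max_span = (first_resno - anchor_resno) * CA_CA_MAX
    kept = []
    for i, xyz in enumerate(candidates):
        d = math.dist(anchor_xyz, xyz)
        if d <= max_span:
            kept.append((i, d))
    return sorted(kept, key=lambda pair: pair[1])


# Hypothetical case: domain 1 ends at residue 120, domain 2 starts at 131.
copies = [(60.0, 10.0, 5.0), (12.0, 20.0, 5.0), (95.0, 80.0, 40.0)]
result = plausible_copies((10.0, 10.0, 5.0), 120, copies, 131)
print(result)
```

A copy that passes this test is only geometrically possible, not
necessarily correct, so the fit to the map should still be inspected.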

For a cryo-EM structure, you can carry out steps 2-4 in a single run with
phenix.dock_and_rebuild, which links the processing, docking, and
rebuilding steps together.

Structure prediction software is now capable of generating models that are
highly accurate over some or all parts of the models. Importantly,
these predictions often come with reliable residue-by-residue estimates of
uncertainty.
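
For AlphaFold models, the per-residue estimate is the pLDDT score (0 to
100), which AlphaFold writes into the B-factor column of the coordinate
file. The sketch below converts a pLDDT value into an approximate
coordinate error and an equivalent B-value. The empirical error formula is
believed to be the one applied by phenix.process_predicted_model,
following Hiranuma et al. (2021); treat it as an assumption and check the
program documentation before relying on it. The function names are
illustrative.

```python
import math


def plddt_to_rmsd(plddt):
    """Estimated coordinate error (Å) from a pLDDT score on a 0-100 scale.

    Assumed empirical relation: rmsd = 1.5 * exp(4 * (0.7 - pLDDT/100)).
    """
    frac = min(max(plddt / 100.0, 0.0), 1.0)  # clamp to [0, 1]
    return 1.5 * math.exp(4.0 * (0.7 - frac))


def rmsd_to_b(rmsd):
    """Isotropic B-value (Å^2) for a given rms positional error (Å)."""
    return 8.0 * math.pi ** 2 * rmsd ** 2 / 3.0


for score in (95, 70, 50, 30):
    err = plddt_to_rmsd(score)
    print(f"pLDDT {score:3d}  error ~{err:5.2f} Å  B ~{rmsd_to_b(err):7.1f} Å^2")
```

Under this relation a pLDDT of 70 corresponds to an estimated error of
1.5 Å, and the estimated error grows rapidly as the score falls below that.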

Compact domains in these predicted models in which all residues have
high confidence are often very accurate throughout. However, separate
domains that each have high confidence but are connected by
lower-confidence residues sometimes have relative positions and
orientations that differ between the predicted and
experimentally determined structures.
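
A first look at how a prediction might divide can be had by listing the
runs of consecutive high-confidence residues and the low-confidence
stretches between them. The sketch below does only that. It is not the
domain-splitting algorithm in phenix.process_predicted_model, and the
cutoff of 70 and the minimum run length are assumed values for
illustration.

```python
def confident_segments(plddt_by_resno, cutoff=70.0, min_length=5):
    """Return (first, last) residue-number ranges of confident residues.

    plddt_by_resno : dict mapping residue number -> pLDDT (0-100)
    Runs shorter than min_length are discarded.
    """
    segments = []
    start = prev = None  # start of current run, last confident residue in it
    for resno in sorted(plddt_by_resno):
        confident = plddt_by_resno[resno] >= cutoff
        contiguous = prev is not None and resno == prev + 1
        if start is not None and not (confident and contiguous):
            # The current run ends at prev; keep it if it is long enough.
            if prev - start + 1 >= min_length:
                segments.append((start, prev))
            start = None
        if confident and start is None:
            start = resno
        prev = resno if confident else None
    if start is not None and prev - start + 1 >= min_length:
        segments.append((start, prev))
    return segments


# Hypothetical scores: two confident regions joined by a ten-residue linker.
scores = {i: 90.0 for i in range(1, 41)}
scores.update({i: 40.0 for i in range(41, 51)})
scores.update({i: 85.0 for i in range(51, 91)})
print(confident_segments(scores))  # [(1, 40), (51, 90)]
```

Runs that are contiguous in sequence are not necessarily separate domains
in space, so this listing is only a starting point for deciding where to
break the model.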

When using predicted models as a starting point for experimental structure
determination, it can be helpful to:

Remove low-confidence residues entirely

Break up the model into domains and allow the domains to have
different orientations

For a high-confidence predicted model, you might try using the predicted model
as-is first. For most predicted models, you may want to try removing
low-confidence residues, then additionally try breaking the model into
domains and placing the domains one at a time.
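
The first of these operations, removing low-confidence residues, can be
illustrated on a PDB-format file, where AlphaFold stores pLDDT in the
B-factor field (columns 61-66 of ATOM records). The sketch below is plain
Python, assumes well-formed fixed-column records, and uses an assumed
cutoff of 70; atom_line is a hypothetical helper that only builds test
input. In practice phenix.process_predicted_model is the tool to use.

```python
def trim_low_confidence(pdb_text, cutoff=70.0):
    """Drop ATOM/HETATM records whose B-factor field (pLDDT) is below cutoff."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            plddt = float(line[60:66])  # PDB columns 61-66
            if plddt < cutoff:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


def atom_line(serial, resno, plddt):
    """Build a minimal C-alpha ATOM record (illustrative helper only)."""
    return ("ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, resno, 0.0, 0.0, 0.0, 1.0, plddt))


# Three residues; the middle one falls below the cutoff and is removed.
model = "\n".join([atom_line(1, 1, 92.5),
                   atom_line(2, 2, 45.1),
                   atom_line(3, 3, 81.0),
                   "END"]) + "\n"
print(trim_low_confidence(model), end="")
```

Because the test is applied atom by atom, this only removes whole residues
when every atom of a residue carries the same pLDDT, as is the case in
AlphaFold output.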

An important feature of recent predicted models is that they generally have
very accurate sequence alignment. That means that the assignment of the
sequence to the high-confidence parts of the model is usually correct. This
can make a very big difference in completing the remainder of the structure
(the parts that were not predicted with high confidence) because you know
exactly which residues go in the gaps. Model-building of the remainder
can therefore often be completed with loop-fitting tools instead of
rebuilding everything.
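
Because residue numbering in the predicted model follows your sequence,
the residues missing from a trimmed, placed model can be read off
directly. The sketch below lists each gap together with the sequence that
has to be built into it. It assumes residue numbers start at 1 and match
positions in the sequence; the function name, inputs, and example
sequence are illustrative.

```python
def gaps_to_build(sequence, present_resnos):
    """Return (first, last, residues) for each stretch absent from the model.

    sequence       : full one-letter sequence of the chain
    present_resnos : residue numbers (1-based) present in the trimmed model
    """
    present = set(present_resnos)
    gaps = []
    start = None  # first residue of the gap currently being scanned
    for resno in range(1, len(sequence) + 1):
        if resno not in present:
            if start is None:
                start = resno
        elif start is not None:
            gaps.append((start, resno - 1, sequence[start - 1:resno - 1]))
            start = None
    if start is not None:
        gaps.append((start, len(sequence), sequence[start - 1:]))
    return gaps


# Hypothetical 20-residue chain with residues 3-8 and 13-20 in the model.
seq = "MKTAYIAKQRQISFVKSHFS"
kept = list(range(3, 9)) + list(range(13, 21))
print(gaps_to_build(seq, kept))  # [(1, 2, 'MK'), (9, 12, 'QRQI')]
```

Each tuple gives the residue range and the exact residues a loop-fitting
tool would need to place between the flanking high-confidence segments.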

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein
structure prediction with AlphaFold. Nature 596, 583–589 (2021).
doi.org/10.1038/s41586-021-03819-2

Hiranuma, N., Park, H., Baek, M. et al. Improved protein structure
refinement guided by deep learning based accuracy estimation.
Nat Commun 12, 1340 (2021).
doi.org/10.1038/s41467-021-21511-x
