Answer: AlphaFold online availability and use case

1. Heavy-duty methods such as AlphaFold2 (AF2) are not needed in every case. It is very unlikely that all 1500 of your sequences contain only domains of unknown function, and even if they do, successful structure prediction servers existed before AF2. So although what you want can be done, it is most likely overkill. You may not be aware that for most Pfam domains, structures are already available on the Pfam web site, and for many sequences within each group. For example, [**here**][1] are AF2 models for one of my favorite domains.

2. Tough to generalize. For a domain that is a completely independent folding unit, feeding only its sequence should be enough. I have done this for a very small domain (< 50 residues), and AF2 folds it identically whether it is submitted on its own or as part of the whole sequence. However, for domains that make contacts with the rest of the protein, the full sequence may be required. I think it is safe to start with just the domain sequence, or at most include 5-10 residues on each side. The latter suggestion is not because AF2 may need the extra residues, but because domain boundary assignments are often off. Beware that this way you may end up with floppy tails at each end of your sequence.

3. None that I know of. There is a Docker container, but it required more disk space than they advertise, and downloading all PDB structures using their scripts took more than 2 days. Their [**calculation**][2] is 2.2 TB initially to download everyth …
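
Regarding point 2, here is a minimal Python sketch of cutting a domain out of a full sequence with a few flanking residues on each side. The function name `padded_domain`, the 1-based inclusive coordinates, and the default padding of 10 are my own assumptions for illustration; they are not part of AF2 or Pfam tooling.

```python
from typing import Tuple


def padded_domain(sequence: str, start: int, end: int, pad: int = 10) -> Tuple[str, int, int]:
    """Return a domain subsequence plus up to `pad` flanking residues.

    `start` and `end` are 1-based, inclusive domain boundaries (an assumed
    convention, as in typical domain annotation tables). The padding is
    clamped at the sequence termini. Returns the subsequence together with
    the 1-based boundaries that were actually used.
    """
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("domain boundaries must satisfy 1 <= start <= end <= len(sequence)")
    if pad < 0:
        raise ValueError("pad must be non-negative")

    new_start = max(1, start - pad)            # do not run past the N-terminus
    new_end = min(len(sequence), end + pad)    # do not run past the C-terminus

    # Convert the 1-based inclusive range to a Python slice.
    return sequence[new_start - 1:new_end], new_start, new_end


if __name__ == "__main__":
    # Toy 20-residue sequence; the domain is assumed to span residues 8-12.
    full = "ACDEFGHIKLMNPQRSTVWY"
    sub, s, e = padded_domain(full, 8, 12, pad=5)
    # Write a FASTA record that records the boundaries actually used.
    print(f">toy_domain/{s}-{e}")
    print(sub)
```

Keeping the actual boundaries in the FASTA header makes it easy to trim the floppy tails from the predicted model afterwards.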
