NEWS FROM THE UPPSALA SOFTWARE FACTORY - 6

Making the most of your search model

Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden

In cases with high sequence homology, Molecular Replacement (MR) is usually a trivial exercise. But, as the structural homology gets lower and lower, solving the structure by means of MR becomes increasingly difficult. Conventional wisdom holds that an RMSD of about 1.5Å between the search model and the actual structure constitutes the limit at which MR can still be used. Fortunately, the MR programs get better, and crystallographers think of new and clever tricks, thereby pushing the limits of the method further and further. On the program end, Axel Brünger has introduced PC-refinement into X-PLOR [1], which is a useful tool to sort the rotation function boys from the rotation function men, and to improve the orientation of the entire molecule or of individual domains prior to the translation function. More recently, he has introduced a direct rotation function [2] which may help in cases where other approaches fail. Liang Tong has developed a locked rotation function [3], which uses knowledge of the rotational relationships between NCS-related molecules to increase the signal-to-noise ratio of the rotation function. More recently, he has also implemented a locked translation function [4]. Jorge Navaza, finally, has carefully redone the mathematics of the fast rotation and translation functions, and his package AMoRe [5] has solved numerous problems which could not be solved by other programs.
In this article we will address some of the tricks that the crystallographer can use in order to produce a more suitable search model in cases where the standard approach is not immediately successful. We shall also discuss some software tools that can be of help in this process.

* Which parts of the model are likely to be conserved?
MR is based on a comparison of the set of intra-molecular vectors (rotation function) and inter-molecular vectors (translation function) calculated from the search model on the one hand, and the Patterson function (or peaks) calculated from the data pertaining to the unknown structure on the other. Therefore, every atom (or rather: atom pair) in the search model that is roughly in the correct position will contribute to the signal, whereas every atom in a wrong position will, at best, contribute to the noise and, at worst, contribute to false signals. If the MR is not immediately successful, a critical appraisal of which parts of the search model are likely to be conserved is in order. The crystallographer can use a large set of heuristics and tools to aid in this process:

a sequence alignment of the known and unknown protein (and other related proteins) is very valuable. For instance, we know that insertions and deletions in the sequence are most likely to occur in loops, and not in the middle of a regular secondary structure element. Also, in the hydrophobic core of a protein, sidechain torsion angles are often conserved between related residues. For instance, in case of a Phe/Tyr substitution it's often a good guess to assume that the chi1 and chi2 angles will be conserved (in other words, this sidechain need not be cut back to the CB atom).
assessing the quality of the search model may provide further clues. If it is the result of a low-resolution structure determination, a quick analysis of temperature factors and the Ramachandran plot may reveal regions which are of poor quality. Any such regions are probably best left out of the search model.
in general, the overall fold and the hydrophobic core are much better conserved than, say, the conformation of loops or of sidechains that point into the solvent.

Parts of the model which are not likely to be conserved at all can be removed; if only the sidechain conformation is uncertain, residues can be cut back to the CB or CG/OG/SG atom. Temperature factors can either be set to a uniform value, or they can be retained. The latter is probably to be preferred in most cases, since it automatically downweights the contribution of regions with high temperature factors. An exception can be made if inspection of the temperature factors shows them to be completely unreliable (there are structures in the PDB which have been refined without temperature-factor restraints at resolutions as low as 3Å [6], the result being amusing to some, but not particularly useful). Another option is to multiply all temperature factors by a constant in order to reproduce the overall B-factor obtained from a Wilson plot, while retaining the pattern of low and high temperature factors within the molecule.
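The last option above (scaling all temperature factors by one constant) can be illustrated with a minimal sketch. It assumes that the "overall B-factor" of the model may be approximated by the unweighted mean of the atomic B-factors; the function name and the numbers are hypothetical and do not come from any of the programs discussed here.

```python
def rescale_b_factors(b_factors, wilson_b):
    """Multiply all B-factors by one constant so that their mean equals wilson_b.

    Assumption: the overall B of the model is taken to be the plain
    (unweighted) mean of the atomic B-factors.
    """
    if not b_factors:
        raise ValueError("no B-factors given")
    mean_b = sum(b_factors) / len(b_factors)
    if mean_b <= 0.0:
        raise ValueError("mean B-factor must be positive")
    scale = wilson_b / mean_b
    # A single multiplicative constant keeps the ratio between any two atoms,
    # so the pattern of low and high B-factors is retained.
    return [b * scale for b in b_factors]


# Hypothetical atomic B-factors (in A^2) and a hypothetical Wilson B.
model_b = [10.0, 20.0, 30.0, 60.0]
scaled_b = rescale_b_factors(model_b, wilson_b=45.0)
print(scaled_b)
```

Because only a constant factor is applied, an atom that had six times the B-factor of another atom before scaling still has six times that B-factor afterwards.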

* When the going gets tough ...
In the case of non-crystallographic symmetry (NCS), MR becomes more difficult because a monomeric search model constitutes less and less of the scattering matter in the asymmetric unit as there are more and more molecules in it. If one is lucky enough to have a tetrameric search model, and the data indicates that the unknown might form similar tetramers, the MR can of course be carried out with the intact tetramer. If the rotation function fails to give solutions, PC refinement of the individual monomers may be of help. If that fails too, a dimer can be used and only if that fails to give a solution as well should a single monomer be tried. If the search model is monomeric, but the unknown is not, the locked rotation and translation functions may be of use.
In the case of multiple-domain structures, MR calculations can be carried out with the intact molecule, with different subsets of the domains, and with individual domains. PC refinement may be necessary in these cases before attempting to solve the translation function.
Nowadays, different crystal forms of a protein are often obtained. Clearly, it is worthwhile to try and solve each of these separately. Usually, the one with the best data and the smallest degree of NCS can be solved first. Since the starting phases after MR are often very poor, it is worthwhile to solve one or more of the other crystal forms as well, so that multiple-crystal electron-density averaging can be used to improve the maps.

* Multiple models
One "trick" which we have found to work very well in cases where all attempts to solve the structure had failed is to use multiple, superimposed search models (see also [9]). Such models are often available:

there may be several related protein structures available (we have used this to solve the structure of cellular retinoic-acid-binding protein [7], for example);
there may be multiple copies of a single protein structure available (for example, when the structure was determined in different labs and/or in different spacegroups, or when the structure contained NCS, or when different complexes have been solved);
there may be a "family" of NMR structures, each of which individually may be too poor to crack the problem. We have used this to solve several structures (for instance acetyl-CoA-binding protein, ACBP [8][6]) which could not be solved by using any of the individual models.

This approach works surprisingly often, probably because it implicitly weights the well-conserved parts of the models more heavily than the more variable (or more poorly determined) parts. Nevertheless, in practice it is often necessary to combine the use of multiple models with the editing out of parts which show a large conformational spread. The approach that has worked for us on several occasions is to superimpose all available models, to remove any obvious outliers (in particular for NMR ensembles), to remove regions of large variability, and to cut back all sidechains to CB or CG. Also, we have always used uniform temperature factors since (a) their use would be tautological for multiple models, (b) Bs from different X-ray structures are not well comparable, and (c) NMR models don't have Bs associated with them at all.
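As an illustration of how regions of large variability might be located, the sketch below computes, for every residue position, the RMS deviation of the CA atoms from their mean position over all models. It assumes that the models have already been superimposed and that they contain equivalent residues in the same order. The function names, the toy coordinates and the 1.5 Å cutoff are hypothetical; this is not the algorithm used by LSQMAN or SEAMAN.

```python
import math


def positional_spread(models):
    """Per-position RMS deviation from the mean position over all models.

    models: list of models, each a list of (x, y, z) tuples, one per residue.
    Assumption: models are already superimposed and equally long.
    """
    n_models = len(models)
    n_pos = len(models[0])
    if any(len(m) != n_pos for m in models):
        raise ValueError("all models must have the same number of positions")
    spreads = []
    for i in range(n_pos):
        pts = [m[i] for m in models]
        centre = [sum(p[k] for p in pts) / n_models for k in range(3)]
        msd = sum(
            sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in pts
        ) / n_models
        spreads.append(math.sqrt(msd))
    return spreads


def variable_positions(models, cutoff=1.5):
    """Indices of positions whose spread exceeds cutoff (hypothetical value, in A)."""
    return [i for i, s in enumerate(positional_spread(models)) if s > cutoff]


# Three toy models of three CA atoms; only the last position varies a lot.
toy_models = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (3.8, 0.1, 0.0), (7.6, 3.0, 0.0)],
    [(-0.1, 0.0, 0.0), (3.8, -0.1, 0.0), (7.6, -3.0, 0.0)],
]
flagged = variable_positions(toy_models)
print(flagged)
```

Positions flagged in this way are candidates for removal from all models before the MR calculations; a suitable cutoff will depend on the ensemble at hand.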
When using multiple models, the contrast between correct solutions and incorrect ones is usually very low, so that it is very important to ascertain the correctness of solutions. Also, the correct solutions may be far down the list: for ACBP, rotation function solution 41, and translation function solution 21 turned out to be the correct ones ...
It is rather puzzling why it should be so difficult to solve MR problems when a 100% homologous NMR structure is available. In the case of ACBP the problem turned out to be a rather large RMSD of ~2.2Å on CA atoms. The differences, however, were not random: the helices had all undergone rigid-body translations along their axis compared to the NMR model.
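For reference, an RMSD on CA atoms of the kind quoted above is simply the root-mean-square distance between equivalent atoms. The sketch below assumes two equally long lists of CA coordinates in a common frame; no superposition is carried out, and the coordinates are invented for illustration only (two short segments, each displaced by 2.2 Å along its own axis).

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between equivalent atoms.

    Assumption: both lists are in the same frame (no superposition is done)
    and contain equivalent atoms in the same order.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented coordinates: one segment along x, one along y.
segment_x = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
segment_y = [(0.0, 5.0, 0.0), (0.0, 6.5, 0.0), (0.0, 8.0, 0.0)]
model_ca = segment_x + segment_y

# Each segment is shifted rigidly by 2.2 A along its own axis.
shifted_ca = (
    [(x + 2.2, y, z) for x, y, z in segment_x]
    + [(x, y + 2.2, z) for x, y, z in segment_y]
)
value = rmsd(model_ca, shifted_ca)
print(round(value, 3))
```

In this toy case every atom moves by the same distance, so the RMSD equals that distance even though the internal geometry of each segment is unchanged.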

* Tools
Various programs from Uppsala can be used in the process of producing search models:

(1) O [10]:

to superimpose multiple models
to delete zones of residues
to apply conservative substitutions
to do homology modelling (if one is really desperate; we know of no case where a homology model has been used successfully to solve an MR problem that could not be solved otherwise)

(2) MOLEMAN2 [11], a trivial but useful MOLEcule MANipulating program:

to do all sorts of temperature-factor manipulation
to create poly-Gly, poly-Ala or poly-Ser models
to calculate the distribution of intra-molecular distances (this can be used to find a suitable Patterson integration radius; often ~75% of the largest diameter of the molecule covers 90% of the vectors)
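The use of the intra-molecular distance distribution to pick a Patterson integration radius can be sketched as follows. This is only an illustration, assuming that the radius is chosen as the shortest distance that covers a given fraction (here 90%) of all atom pairs; it is not the MOLEMAN2 implementation, and the function names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def intramolecular_distances(coords):
    """Sorted list of all pairwise distances between the given atoms."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def radius_covering(coords, fraction=0.9):
    """Return (radius, largest_distance).

    radius is the shortest distance such that at least `fraction` of all
    intra-molecular vectors are no longer than it.
    """
    dists = intramolecular_distances(coords)
    if not dists:
        raise ValueError("need at least two atoms")
    # Small epsilon guards against floating-point round-up in fraction * n.
    needed = max(1, math.ceil(fraction * len(dists) - 1e-9))
    return dists[needed - 1], dists[-1]


# Toy "molecule": five atoms on a line, 1 A apart (hypothetical coordinates).
toy_atoms = [(float(i), 0.0, 0.0) for i in range(5)]
radius, largest = radius_covering(toy_atoms, fraction=0.9)
print(radius, largest, radius / largest)
```

For a real model the calculation scales with the square of the number of atoms, so restricting it to CA atoms may be a reasonable shortcut when only a rough radius is needed.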

(3) LSQMAN [6], a superpositioning and (NCS-) analysis program:

to superimpose multiple models
to find the "central" model and outliers
to find regions of large variability (e.g., in terms of torsion angles, which is often more informative than the use of RMSD values [6])

(4) SEAMAN [11], a program specifically written for SEArch-model MANipulation which handles multiple models as well as single models:

to delete loops, turns, general zones, atoms with high temperature factors
to apply some minimalist substitutions with sidechain-torsion angle conservation
to change zones to poly-Gly, poly-Ala or poly-Ser
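To make the poly-Ala conversion concrete, here is a minimal sketch that operates on PDB-format ATOM records. It assumes the standard fixed-column layout (atom name in columns 13-16, residue name in columns 18-20), keeps only the N, CA, C, O and CB atoms, and renames every non-glycine residue to ALA. The function name and the example records are hypothetical, and this is not how SEAMAN or MOLEMAN2 is implemented.

```python
BACKBONE_PLUS_CB = {"N", "CA", "C", "O", "CB"}


def to_poly_ala(pdb_lines):
    """Cut all residues back to CB and rename non-Gly residues to ALA.

    Assumption: lines follow the fixed-column PDB format; only ATOM records
    are edited, all other lines are passed through unchanged.
    """
    result = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            result.append(line)
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20]
        if atom_name not in BACKBONE_PLUS_CB:
            continue  # drop sidechain atoms beyond CB
        if res_name != "GLY":
            line = line[:17] + "ALA" + line[20:]
        result.append(line)
    return result


# Hypothetical records (coordinates invented for illustration).
example = [
    "ATOM      1  N   PHE A   1      11.104   6.134  -6.504  1.00 20.00",
    "ATOM      2  CA  PHE A   1      11.639   6.071  -5.147  1.00 20.00",
    "ATOM      3  CB  PHE A   1      12.400   7.350  -4.800  1.00 20.00",
    "ATOM      4  CG  PHE A   1      13.100   7.300  -3.470  1.00 20.00",
    "ATOM      5  CA  GLY A   2      10.900   4.800  -2.100  1.00 20.00",
]
poly_ala = to_poly_ala(example)
for record in poly_ala:
    print(record)
```

Restricting the kept atoms to N, CA, C and O would give a poly-Gly model in the same way; cutting back to CG/OG/SG would require a residue-specific list of atoms to keep.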

Other Uppsala tools that may be of use for MR work in general include:

MOLEMAN2: to apply rotation and translation function solutions to a model
XPLO2D [11]: to generate plots of 1D and 2D translation functions from X-PLOR
XPLO2D: to generate 2D contour plots from X-PLOR direct rotation function searches [2]
MAPMAN [12]: to convert CCP4 map and X-PLOR "3dmatrix" files from rotation and translation function calculations into O-style maps for viewing on the graphics (can also be used to extract planes or slices from a map for plotting with O2D [11])
PACMAN [11]: to do a quick "centre-of-gravity packing" check for possible solutions
PACMAN: to estimate how large a fraction of the cell can possibly accommodate the centre-of-gravity of the molecule, depending on its radius, the spacegroup and the cell dimensions (this can be used to find if spacegroup P213 is more or less likely than P23, for instance)
RAVE [13] [14]: programs for electron-density averaging (single and multiple domain; single and multiple crystal form) plus tools for improving NCS and inter-crystal operators, mask manipulation, etc.
auto_amore.csh: a C-shell script that uses the CCP4 version of AMoRe and PACMAN to do the MR calculations, saving the user a lot of tedious file editing, and making it easy to screen dozens (or hundreds) of rotation and translation function solutions. This script has recently been modified so that the rotation function can be carried out either with Fs or with Es.

* AVAILABILITY
All programs other than O are available to academic users free of charge (from the O ftp server). For more information about these programs, contact GJK. For more information about O, contact T. A. Jones (alwyn@xray.bmc.uu.se).

* REFERENCES

[1] Brünger, A.T. (1990). Acta Cryst. A46, 46-57.
[2] DeLano, W.L. & Brünger, A.T. (1995). Acta Cryst. D51, 740-748.
[3] Tong, L. & Rossmann, M.G. (1990). Acta Cryst. A46, 783-792.
[4] Tong, L. (1996). Acta Cryst. A52, in the press.
[5] Navaza, J. (1994). Acta Cryst. A50, 157-163.
[6] Kleywegt, G.J. (1996). Acta Cryst. D52, in the press.
[7] Kleywegt, G.J., Bergfors, T., Senn, H., Le Motte, P., Gsell, B., Shudo, K. & Jones, T.A. (1994). Structure 2, 1241-1258.
[8] Zou, J.Y., Kleywegt, G.J. & Jones, T.A. (1996). To be published.
[9] Read, R.J. (1990). Acta Cryst. A46, 900-912.
[10] Jones, T.A., Zou, J.Y., Cowan, S.W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110-119.
[11] Kleywegt, G.J. (1996). Unpublished programs.
[12] Kleywegt, G.J. & Jones, T.A. (1996). Acta Cryst. D52, in the press.
[13] Jones, T.A. (1992). In "Molecular Replacement", pp. 91-105, CCP4.
[14] Kleywegt, G.J. & Jones, T.A. (1994). In "From First Map to Final Model", pp. 59-66, CCP4.

Latest update at 12 February, 1998.
