NEWS FROM THE UPPSALA SOFTWARE FACTORY - 6
Making the most of your search model
Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden
In cases with high sequence homology, Molecular Replacement
(MR) is usually a trivial exercise. But, as the structural
homology gets lower and lower, solving the structure
by means of MR becomes increasingly difficult. Conventional
wisdom holds that an RMSD of about 1.5Å between
the search model and the actual structure constitutes
the limit at which MR can still be used. Fortunately,
the MR programs get better, and crystallographers think
of new and clever tricks, thereby pushing the limits
of the method further and further. On the program
end, Axel Brünger has introduced PC-refinement
into X-PLOR [1], which is a useful tool to sort the
rotation function boys from the rotation function men,
and to improve the orientation of the entire molecule
or of individual domains prior to the translation function.
More recently, he has introduced a direct rotation
function [2] which may help in cases where other approaches
fail. Liang Tong has developed a locked rotation function
[3], which uses knowledge of the rotational relationships
between NCS-related molecules to increase the signal-to-noise
ratio of the rotation function. More recently, he
has also implemented a locked translation function
[4]. Jorge Navaza, finally, has carefully redone the
mathematics of the fast rotation and translation functions,
and his package AMoRe [5] has solved numerous problems
which could not be solved by other programs.
In this article we will address some of the tricks that
the crystallographer can use in order to produce a
more suitable search model in cases where the standard
approach is not immediately successful. We shall
also discuss some software tools that can be of help
in this process.
* Which parts of the model are likely to be conserved?
MR is based on a comparison of the set of intra-molecular
vectors (rotation function) and inter-molecular vectors
(translation function) calculated from the search model
on the one hand, and the Patterson function (or peaks)
calculated from the data pertaining to the unknown
structure on the other. Therefore, every atom (or
rather: atom pair) in the search model that is roughly
in the correct position will contribute to the signal,
whereas every atom in a wrong position will, at best,
contribute to the noise and, at worst, contribute to
false signals. If the MR is not immediately successful,
a critical appraisal of which parts of the search model
are likely to be conserved is in order. The crystallographer
can use a large set of heuristics and tools to aid
in this process:
- a sequence alignment of the known
and unknown protein (and other related proteins) is
very valuable. For instance, we know that insertions
and deletions in the sequence are most likely to occur
in loops, and not in the middle of a regular secondary
structure element. Also, in the hydrophobic core of
a protein, sidechain torsion angles are often conserved
between related residues. For instance, in case of
a Phe/Tyr substitution it's often a good guess to assume
that the chi1 and chi2 angles will be conserved (in other
words, this sidechain need not be cut back to the CB
atom).
- assessing the quality of the search model may provide
further clues. If it is the result of a low-resolution
structure determination, a quick analysis of temperature
factors and the Ramachandran plot may reveal regions
which are of poor quality. Any such regions are probably
best left out of the search model.
- in general, the overall fold and the hydrophobic core
are much better conserved than, say, the conformation
of loops or of sidechains that point into the solvent.
Parts of the model which are not likely to be conserved
at all can be removed; if only the sidechain conformation
is uncertain, residues can be cut back to the CB or
CG/OG/SG atom. Temperature factors can either be set
to a uniform value, or they can be retained. The latter
is probably to be preferred in most cases, since it
automatically downweights the contribution of regions
with high temperature factors. An exception can be
made if inspection of the temperature factors shows
them to be completely unreliable (there are structures
in the PDB which have been refined without temperature-factor
restraints at resolutions as low as 3Å [6], the
result being amusing to some, but not particularly
useful). Another option is to multiply all temperature
factors by a constant in order to reproduce the overall
B-factor obtained from a Wilson plot, while retaining
the pattern of low and high temperature factors within
the molecule.
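
As a minimal sketch of two of the edits described above (cutting residues back to CB and rescaling the temperature factors), the following Python functions operate on PDB-format ATOM records held as a list of strings without trailing newlines. The function names, the set of atom names that is kept, and the use of the mean B as a stand-in for the overall B-factor are assumptions made for illustration only; they do not describe how any of the Uppsala programs work internally.

```python
# Atom names retained when a residue is cut back to CB (an assumed convention).
KEEP_TO_CB = {"N", "CA", "C", "O", "OXT", "CB"}


def cut_back_to_cb(pdb_lines, residues=None):
    """Drop all protein atoms beyond CB.

    `residues` is an optional set of (chain, residue number) pairs;
    None means that every residue is cut back. HETATM records are kept.
    """
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]))   # chain ID, residue number
            name = line[12:16].strip()           # atom name
            if (residues is None or key in residues) and name not in KEEP_TO_CB:
                continue
        out.append(line)
    return out


def scale_b_factors(pdb_lines, target_b):
    """Multiply all Bs by one constant so that their mean equals target_b.

    The pattern of low and high Bs within the molecule is retained.
    Returns the new lines and the scale factor that was applied.
    """
    atom_idx = [i for i, line in enumerate(pdb_lines)
                if line.startswith(("ATOM", "HETATM"))]
    b_values = [float(pdb_lines[i][60:66]) for i in atom_idx]  # B: columns 61-66
    scale = target_b / (sum(b_values) / len(b_values))
    out = list(pdb_lines)
    for i, b in zip(atom_idx, b_values):
        line = out[i].ljust(66)
        out[i] = line[:60] + f"{b * scale:6.2f}" + line[66:]
    return out, scale
```

A typical call would be `scaled, k = scale_b_factors(cut_back_to_cb(lines), wilson_b)`, where `wilson_b` is the value read off the Wilson plot of the unknown structure.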
* When the going gets tough ...
In the case of non-crystallographic symmetry (NCS),
MR becomes more difficult because a monomeric search
model constitutes less and less of the scattering matter
in the asymmetric unit as there are more and more molecules
in it. If one is lucky enough to have a tetrameric
search model, and the data indicates that the unknown
might form similar tetramers, the MR can of course
be carried out with the intact tetramer. If the rotation
function fails to give solutions, PC refinement of
the individual monomers may be of help. If that fails
too, a dimer can be used and only if that fails to
give a solution as well should a single monomer be
tried. If the search model is monomeric, but the unknown
is not, the locked rotation and translation functions
may be of use.
In the case of multiple-domain structures, MR calculations
can be carried out with the intact molecule, with different
subsets of the domains, and with individual domains.
PC refinement may be necessary in these cases before
attempting to solve the translation function.
Nowadays, different crystal forms of a protein are often
obtained. Clearly, it is worthwhile to try and
solve each of these separately. Usually, the one with
the best data and the smallest degree of NCS can be
solved first. Since the starting phases after MR are
often very poor, it is worthwhile to solve one or more
of the other crystal forms as well, so that multiple-crystal
electron-density averaging can be used to improve the
maps.
* Multiple models
One "trick" which we have found to work very
well in cases where all attempts to solve the structure
had failed is to use multiple, superimposed search
models (see also [9]). Such models are often available:
- there may be several related protein structures available
(we have used this to solve the structure of cellular
retinoic-acid-binding protein [7], for example);
- there may be multiple copies of a single protein structure
available (for example, when the structure was determined
in different labs and/or in different spacegroups,
or when the structure contained NCS, or when different
complexes have been solved);
- there may be a "family" of NMR structures,
each of which individually may be too poor to crack
the problem. We have used this to solve several structures
(for instance acetyl-CoA-binding protein, ACBP [8][6])
which could not be solved by using any of the individual
models.
This approach works surprisingly often, probably since
it implicitly weights the well-conserved parts of the
models higher than the more variable (or more poorly
determined) parts. Nevertheless, it is often necessary
in practice to combine the use of multiple models with
the editing out of parts which show a large conformational
spread. The approach that has worked for us on several
occasions is to superimpose all available models, remove
any obvious outliers (in particular for NMR ensembles),
to remove regions of large variability, and to cut
back all sidechains to CB or CG. Also, we have always
used uniform temperature factors since (a) their use
would be tautological for multiple models, (b) Bs
from different X-ray structures are not well comparable,
and (c) NMR models don't have Bs associated with them
at all.
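
The bookkeeping behind this recipe can be sketched in a few lines of Python. The sketch assumes that the models have already been superimposed (for instance with O or LSQMAN) and reduced to lists of equivalent CA positions; the function names, the definition of the "central" model as the one with the lowest mean RMSD to all others, and the spread measure are illustrative assumptions and not a description of what LSQMAN or SEAMAN actually compute.

```python
import math


def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) tuples, without refitting."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def central_model(models):
    """Return (index of the model with the lowest mean RMSD to the others,
    list of mean RMSDs). Models with a very high mean RMSD are candidate outliers."""
    n = len(models)
    mean_rmsd = [
        sum(rmsd(models[i], models[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return min(range(n), key=mean_rmsd.__getitem__), mean_rmsd


def variable_positions(models, cutoff):
    """Indices of positions whose RMS deviation from the ensemble-average
    position exceeds `cutoff` (same units as the coordinates)."""
    flagged = []
    for k in range(len(models[0])):
        pts = [m[k] for m in models]
        centroid = tuple(sum(c) / len(pts) for c in zip(*pts))
        spread = math.sqrt(sum(math.dist(p, centroid) ** 2 for p in pts) / len(pts))
        if spread > cutoff:
            flagged.append(k)
    return flagged
```

Models whose mean RMSD is far above that of the central model can be dropped first, after which `variable_positions` gives a list of residues to consider deleting from all remaining models. The cutoff is a matter of judgement and has to be chosen per case.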
When using multiple models, the contrast between correct
solutions and incorrect ones is usually very low, so
that it is very important to ascertain the correctness
of solutions. Also, the correct solutions may be far
down the list: for ACBP, rotation function solution
41, and translation function solution 21 turned out
to be the correct ones ...
It is rather puzzling why it should be so difficult
to solve MR problems when a 100% homologous NMR structure
is available. In the case of ACBP the problem turned
out to be a rather large RMSD of ~2.2Å on CA
atoms. The differences, however, were not random:
the helices had all undergone rigid-body translations
along their axis compared to the NMR model.
* Tools
Various programs from Uppsala can be used in the process
of producing search models:
(1) O [10]:
- to superimpose multiple models
- to delete zones of residues
- to apply conservative substitutions
- to do homology modelling (if one is really desperate;
we know of no case where a homology model has been
used successfully to solve an MR problem that could
not be solved otherwise)
(2) MOLEMAN2 [11], a trivial but useful MOLEcule MANipulating
program:
- to do all sorts of temperature-factor manipulation
- to create poly-Gly, poly-Ala or poly-Ser models
- to calculate the distribution of intra-molecular distances
(this can be used to find a suitable Patterson integration
radius; often ~75% of the largest diameter of the molecule
covers 90% of the vectors)
(3) LSQMAN [6], a superpositioning and (NCS-) analysis
program:
- to superimpose multiple models
- to find the "central" model and outliers
- to find regions of large variability (e.g., in terms
of torsion angles, which is often more informative
than the use of RMSD values [6])
(4) SEAMAN [11], a program specifically written for
SEArch-model MANipulation which handles multiple models
as well as single models:
- to delete loops, turns, general zones, atoms with
high temperature factors
- to apply some minimalist substitutions with sidechain-torsion
angle conservation
- to change zones to poly-Gly, poly-Ala or poly-Ser
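
The use of the intra-molecular distance distribution to pick a Patterson integration radius (mentioned under MOLEMAN2 above) can be illustrated with a short Python sketch. It assumes the coordinates are available as a list of (x, y, z) tuples; the function name and the 90% default are chosen here for illustration and do not reflect MOLEMAN2's actual implementation or output.

```python
import math


def radius_covering(coords, fraction=0.9):
    """Return (radius, largest_distance).

    `radius` is the smallest interatomic distance r such that at least
    `fraction` of all atom pairs are separated by no more than r.
    """
    # All N*(N-1)/2 pair distances; for large models, use CA atoms only.
    dists = sorted(
        math.dist(a, b)
        for i, a in enumerate(coords)
        for b in coords[i + 1:]
    )
    idx = max(0, math.ceil(fraction * len(dists)) - 1)
    return dists[idx], dists[-1]
```

The ratio of the two returned values can be compared with the rule of thumb quoted above (roughly 75% of the largest diameter covering 90% of the vectors); for elongated or multi-domain molecules the ratio may differ considerably.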
Other Uppsala tools that may be of use for MR work in
general include:
- MOLEMAN2: to apply rotation and translation function
solutions to a model
- XPLO2D [11]: to generate plots of 1D and 2D translation
functions from X-PLOR
- XPLO2D: to generate 2D contour plots from X-PLOR direct
rotation function searches [2]
- MAPMAN [12]: to convert CCP4 map and X-PLOR "3dmatrix"
files from rotation and translation function calculations
into O-style maps for viewing on the graphics (can
also be used to extract planes or slices from a map
for plotting with O2D [11])
- PACMAN [11]: to do a quick "centre-of-gravity
packing" check for possible solutions
- PACMAN: to estimate how large a fraction of the cell
can possibly accommodate the centre-of-gravity of the
molecule, depending on its radius, the spacegroup and
the cell dimensions (this can be used to find if spacegroup
P2(1)3 is more or less likely than P23, for instance)
- RAVE [13] [14]: programs for electron-density averaging
(single and multiple domain; single and multiple crystal
form) plus tools for improving NCS and inter-crystal
operators, mask manipulation, etc.
- auto_amore.csh: a C-shell script that uses the CCP4
version of AMoRe and PACMAN to do the MR calculations,
saving the user a lot of tedious file editing, and
making it easy to screen dozens (or hundreds) of rotation
and translation function solutions. This script has
recently been modified so that the rotation function
can be carried out either with Fs or with Es.
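
To give an idea of what a centre-of-gravity packing check involves, here is a minimal Python sketch. It is not PACMAN's algorithm, which is not described here; the function names, the representation of symmetry operators as (rotation matrix, translation vector) pairs in fractional coordinates, and the restriction to neighbouring unit cells are all assumptions made for illustration.

```python
import math
from itertools import product

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix (a along x, b in the xy plane); angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    v = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [[a, b * cg, c * cb],
            [0.0, b * sg, c * (ca - cb * cg) / sg],
            [0.0, 0.0, c * v / sg]]


def closest_copy_distance(cog_frac, symops, cell):
    """Shortest distance between a centre of gravity (fractional coordinates)
    and any of its symmetry- or lattice-related copies."""
    m = orthogonalization_matrix(*cell)
    best = math.inf
    for rot, trans in symops:
        mate = [sum(rot[i][j] * cog_frac[j] for j in range(3)) + trans[i]
                for i in range(3)]
        # Difference vector, brought back to within half a cell edge.
        d0 = [mate[i] - cog_frac[i] for i in range(3)]
        d0 = [d - round(d) for d in d0]
        same_molecule = rot == IDENTITY and all(abs(d) < 1e-9 for d in d0)
        # Only the 27 neighbouring cells are searched (adequate unless the
        # cell is very skewed).
        for shift in product((-1, 0, 1), repeat=3):
            if same_molecule and shift == (0, 0, 0):
                continue  # this is the molecule itself
            d_frac = [d0[i] + shift[i] for i in range(3)]
            d_orth = [sum(m[i][j] * d_frac[j] for j in range(3)) for i in range(3)]
            best = min(best, math.sqrt(sum(x * x for x in d_orth)))
    return best
```

A solution for which this distance is much smaller than the diameter of the molecule is likely to give overlapping symmetry mates and can be given low priority. Where exactly to draw the line depends on the shape of the molecule and is left to the user.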
* AVAILABILITY
All programs other than O are available to academic
users free of charge (from the O ftp server). For
more information about these programs, contact GJK.
For more information about O, contact T. A. Jones
(alwyn@xray.bmc.uu.se).
* REFERENCES
- [1] Brünger, A.T. (1990). Acta Cryst. A46, 46-57.
- [2] DeLano, W.L. and Brünger, A.T. (1995). Acta
Cryst. D51, 740-748.
- [3] Tong, L., & Rossmann, M.G. (1990). Acta Cryst.
A46, 783-792.
- [4] Tong, L. (1996). Acta Cryst. A52, in the press.
- [5] Navaza, J. (1994). Acta Cryst. A50, 157-163.
- [6] Kleywegt, G.J. (1996). Acta Cryst. D52, in the press.
- [7] Kleywegt, G.J., Bergfors, T., Senn, H., Le Motte,
P., Gsell, B., Shudo, K. & Jones, T.A. (1994).
Structure 2, 1241-1258.
- [8] Zou, J.Y., Kleywegt, G.J., & Jones, T.A. (1996).
To be published.
- [9] Read, R.J. (1990). Acta Cryst. A46, 900-912.
- [10] Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard,
M. (1991). Acta Cryst. A47, 110-119.
- [11] Kleywegt, G.J. (1996). Unpublished programs.
- [12] Kleywegt, G.J. & Jones, T.A. (1996). Acta Cryst.
D52, in the press.
- [13] Jones, T.A. (1992). In "Molecular Replacement",
pp. 91-105, CCP4.
- [14] Kleywegt, G.J. and Jones, T.A. (1994). In "From
First Map to Final Model", pp. 59-66, CCP4.
Latest update at 12 February, 1998.