OOPS-a-daisy
Rebuilding a protein structure into an electron density
map is a tedious chore. Assuming that the chain has
been traced correctly, there still remains the danger
of smaller, local errors in the structure [1], such
as a poor fit of the model to the data and poor stereochemistry.
When rebuilding a model, for example with O [2], for
each residue in turn one has to consider if the residue
fits the density, if it has reasonable side-chain geometry,
if it has favourable phi/psi angles, if it has atoms
with unusually high temperature factors, if the peptide
is close to planar and has its oxygen atom pointing
in the right direction, etc.
Collecting and integrating all the necessary information
is time-consuming and prone to oversights, especially
if one has to rebuild several hundred residues. In
addition, once the model is reasonably well refined,
most residues will be fine. For most rebuilds it may
therefore be better to spend the bulk of one's time
inspecting only a small fraction of the residues, namely
the ten percent (or so) that are bad or suspect. In order
to speed up and facilitate the rebuilding process,
we have written a program, called OOPS, which automates
most of the work involved in managing all quality-related
information. To prevent re-invention of a number of
wheels, OOPS performs only a few error checks itself;
most of the quality indicators it checks should be
calculated with O first. In addition, it is fairly
simple to include one's own criteria. The most interesting
part of the program's output is a set of O macros which
will take the user on a journey along all bad or suspect
residues. For each such residue, the macros will tell
the user what is wrong with the residue. In this fashion,
rebuilding can be sped up by a factor of two to five,
depending on the quality of the current model.
DATABLOCKS
The input to OOPS consists of a number of O datablocks
plus some input from the user. For some checks, a
PDB file is required as well. In addition, output
on a per-residue basis from other programs can be used
(if it comes in the form of an O datablock).
So, what's an O datablock? O stores all its data in
the internal database in the form of datablocks [3].
These are one-dimensional arrays (vectors) of integer,
real, six-character string (character*6) or text values.
For example,
the residue names of a molecule are stored in character
datablocks:
    M9A_RESIDUE_NAME C 159 (1x,5a)
     A5 A6 A7 A8 A9
     ...
     A160 A161 A162 A301
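For readers who want to process datablocks outside O, the Python sketch below reads one datablock in the plain-text layout shown above. It assumes a header line of the form NAME TYPE COUNT (FORMAT) followed by whitespace-separated values, handles only integer, real and character datablocks (the type code I for integers is an assumption; only C and R appear in the text), and does not interpret the Fortran format. The names Datablock and parse_datablock and the TOY_RESIDUE_RSC datablock are hypothetical, and the layout has not been checked against O's own datablock I/O.

```python
from dataclasses import dataclass
from typing import List, Union

Value = Union[int, float, str]

# C (character*6) and R (real) appear in the text; I (integer) is an assumption.
# Text datablocks are not handled by this sketch.
_CONVERTERS = {"I": int, "R": float, "C": str}


@dataclass
class Datablock:
    name: str            # e.g. M9A_RESIDUE_RSC
    dtype: str           # type code: I, R or C
    count: int           # number of entries announced in the header
    fmt: str             # Fortran format, kept verbatim but not interpreted
    values: List[Value]  # the entries, in file order


def parse_datablock(text: str) -> Datablock:
    """Parse one datablock laid out as a header line 'NAME TYPE COUNT FORMAT'
    followed by whitespace-separated values (assumed layout)."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    name, dtype, count, fmt = lines[0].split(maxsplit=3)
    dtype = dtype.upper()
    if dtype not in _CONVERTERS:
        raise ValueError(f"unsupported datablock type {dtype!r}")
    convert = _CONVERTERS[dtype]
    values = [convert(tok) for ln in lines[1:] for tok in ln.split()]
    if len(values) != int(count):
        raise ValueError(f"{name}: header says {count} values, found {len(values)}")
    return Datablock(name, dtype, int(count), fmt.strip(), values)


# Usage with a small, hypothetical four-residue datablock:
block = parse_datablock("""
TOY_RESIDUE_RSC R 4 (9(x,f7.4))
 1.7662 0.8072 0.0000 0.4455
""")
print(block.name, block.values)  # -> TOY_RESIDUE_RSC [1.7662, 0.8072, 0.0, 0.4455]
```

The count check is a cheap guard against truncated files; a mismatch raises rather than silently misaligning values with residues.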
O contains several utilities for assessing protein quality on a per-residue basis (see below). For instance, the "RSC_fit" (Rotamer-Side-Chain fit) command calculates for every residue (except Gly and Ala) the RMS distance between its side-chain atoms and the corresponding atoms of the rotamer that is most similar to it [3,4]. If this number is greater than ~1.5 Å, the residue has a side-chain conformation which differs significantly from all of the known rotamers (derived from a database of well-refined, high-resolution structures) for that residue type. This means that the residue merits closer scrutiny [4]: either its conformation is "real", or it is in error (e.g. due to a rebuilding error or, more seriously, over-fitting of the data at low resolution). The results of the calculations are stored in a datablock, one real number per residue:
    M9A_RESIDUE_RSC R 159 (9(x,f7.4))
     1.7662 0.8072 0.0000 0.4455 0.8154 0.1031 0.4413 0.0000 0.5567
     0.1751 0.3660 0.0000 0.3071 0.4685 1.7608 0.0000 1.0607 0.3184
     0.1695 0.3179 0.1571 0.3836 0.0000 0.2108 1.5435 1.5919 0.7658
     ...
     0.8385 0.0000 0.2209 0.1567 0.8266 1.3797 0.7248 0.1913 0.2502
     1.3568 0.4881 1.4166 0.9104 0.0000 0.0000
OOPS can handle up to ten user-defined criteria. The only requirements are that the data is in the form of a datablock with one (integer or real) number per residue, and that the distinction between good and bad residues is of the type: "the residue is bad, if the value of X is less (or greater) than some cut-off". Another utility program available to O users (called ODBMAN) can be used to extract O datablocks from the output of other programs such as X-PLOR [5] and PROCHECK [6]. Such datablocks can then be used in conjunction with OOPS. Examples are the number of bad contacts which can be extracted from PROCHECK output, and the "conformational energy" of the residues, which can be calculated with X-PLOR. One could also use OOPS in NMR work, for example by providing it with the number of constraint/restraint violations per residue, or even simply the number of NOEs per residue (if this number is low, the residue is not well defined by the data; this information could then be used to decide to replace "chimeric" side-chain conformations by rotamers).
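The rule "the residue is bad if the value of X is less (or greater) than some cut-off" amounts to a per-residue comparison. The Python sketch below applies several such criteria to datablocks that share the residue order of the residue-name datablock, and collects the faults per residue, loosely in the spirit of OOPS's list of bad residues. The names Criterion and collect_faults, and the RS-fit values in the example, are invented for illustration; only the residue names, the first five RSC values and the ~1.5 Å and ~0.6 cut-offs come from the text. Note that zero entries in the RSC datablock (presumably residues without a computed value, such as Gly and Ala) are harmless for a greater-than test but would need to be skipped by a less-than test.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Criterion:
    label: str                # text reported for offending residues
    values: Sequence[float]   # one value per residue, in residue-name order
    cutoff: float
    bad_if_greater: bool      # True: bad if value > cutoff; False: bad if value < cutoff

    def is_bad(self, value: float) -> bool:
        return value > self.cutoff if self.bad_if_greater else value < self.cutoff


def collect_faults(residues: Sequence[str],
                   criteria: Sequence[Criterion]) -> Dict[str, List[str]]:
    """Map each residue that fails at least one criterion to its list of faults,
    keeping the residues in their original (chain) order."""
    faults: Dict[str, List[str]] = {}
    for crit in criteria:
        if len(crit.values) != len(residues):
            raise ValueError(f"{crit.label}: {len(crit.values)} values "
                             f"for {len(residues)} residues")
        for res, val in zip(residues, crit.values):
            if crit.is_bad(val):
                faults.setdefault(res, []).append(f"{crit.label} = {val}")
    return {res: faults[res] for res in residues if res in faults}


# Residue names and RSC values: first five entries of the datablocks above.
residues = ["A5", "A6", "A7", "A8", "A9"]
rsc = [1.7662, 0.8072, 0.0000, 0.4455, 0.8154]
# Hypothetical RS-fit values, invented for this example.
rsfit = [0.41, 0.85, 0.90, 0.72, 0.55]

faults = collect_faults(residues, [
    Criterion("Bad RSC", rsc, 1.5, bad_if_greater=True),
    Criterion("Bad RS-fit (all atoms)", rsfit, 0.6, bad_if_greater=False),
])
for res, items in faults.items():
    print(res, "; ".join(items))
```

Only A5 (both criteria) and A9 (RS-fit only) are reported; the other residues pass both tests and never appear in the output, which is the point of concentrating on the suspect fraction.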
QUALITY CHECKS
At present, OOPS can check the following quality indicators
(in addition to the ten user-definable criteria; note
that some of the criteria are specific for and limited
to amino-acid residues):
(1) Bad pep-flips [2] (a measure of the distance between a peptide oxygen orientation and those encountered in the database). Typically, residues with values exceeding ~2.5 Å should be inspected more closely (either they are wrong, or they have an unusual peptide orientation for a reason)
(2) Bad RS-fit values [1,2] (the correlation between calculated and 2Fo-Fc density for any or all atoms in a residue); values lower than ~0.6 indicate poor density. RS-fit values may be checked for all atoms, main-chain atoms alone and side-chain atoms alone
(3) Bad RSC values (see above)
(4) Mask errors (i.e., if one uses real-space averaging, this checks which atoms are not covered by the current mask, given a certain radius)
(5) Too high and too low temperature factors and occupancies
(6) Bad phi/psi angle combinations [7]
(7) Poor peptide planarity (by calculating the improper twist angle C(i) - Ca(i) - N(i+1) - O(i), which is 0° for planar peptide groups)
(8) Poor Ca chirality [6] (by calculating the improper twist angle Ca(i) - N(i) - C(i) - CB(i), which should be ~33.9° for non-Gly, non-Pro residues)
(9) Bad QualWat values [8] (water molecules only). This is a combined measure of quality, incorporating the occupancy Q and temperature factor B of the water oxygen atom and the resolution D of the data: QualWat = 100 * Q * EXP(-B/(4D^2)). This quantity is zero for absent water molecules (Q = 0) and 100 for perfect ones (Q = 1, B = 0)
(10) Bad contacts (requires output from PROCHECK [6])
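Checks (7) to (9) reduce to simple geometry and arithmetic. The Python sketch below computes a dihedral (torsion) angle from four atomic positions, which yields the improper angles C(i) - Ca(i) - N(i+1) - O(i) and Ca(i) - N(i) - C(i) - CB(i) when the atoms are passed in that order, and evaluates the QualWat expression. It uses the common IUPAC-style sign convention, assumed (not verified) to match the one used by O and PROCHECK; the function names are hypothetical, and no tolerance for "bad" values is encoded because the text gives none beyond the ideal values.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> list:
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> list:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180.
    Positive when, looking along p1->p2, p0 must turn clockwise to eclipse p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def peptide_improper(c_i: Vec, ca_i: Vec, n_next: Vec, o_i: Vec) -> float:
    """Improper C(i)-Ca(i)-N(i+1)-O(i); 0 degrees for a planar peptide."""
    return dihedral(c_i, ca_i, n_next, o_i)


def ca_improper(ca: Vec, n: Vec, c: Vec, cb: Vec) -> float:
    """Improper Ca(i)-N(i)-C(i)-CB(i); ~33.9 degrees expected per the text
    for non-Gly, non-Pro residues."""
    return dihedral(ca, n, c, cb)


def qualwat(occupancy: float, b_factor: float, resolution: float) -> float:
    """QualWat = 100 * Q * exp(-B / (4 * D**2)), with Q the occupancy,
    B the temperature factor (A**2) and D the resolution (A)."""
    return 100.0 * occupancy * math.exp(-b_factor / (4.0 * resolution ** 2))


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # -> 90.0
print(round(qualwat(1.0, 16.0, 2.0), 1))                               # -> 36.8
```

The QualWat example shows how quickly the score drops: a fully occupied water with B = 16 Å² at 2 Å resolution already scores 100/e, about 37.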
OUTPUT
The output of OOPS consists of:
(1) statistics for most of the used quality indicators, e.g.:
    ***************************************************************************
     Analysis of Pep-flip values (>0)
    ***************************************************************************
     Number of values .................... 154
     Average value ....................... 0.837
     Standard deviation .................. 0.584
     Minimum value observed .............. 0.166
     Maximum value observed .............. 3.033
     Nr <  0.0000              :    0 (  0.00 %; Cum   0.00 %)
     Nr >= 0.0000 and < 0.5000 :   47 ( 30.52 %; Cum  30.52 %)
     Nr >= 0.5000 and < 1.0000 :   74 ( 48.05 %; Cum  78.57 %)
     Nr >= 1.0000 and < 1.5000 :   14 (  9.09 %; Cum  87.66 %)
     Nr >= 1.5000 and < 2.0000 :    8 (  5.19 %; Cum  92.86 %)
     Nr >= 2.0000 and < 2.5000 :    7 (  4.55 %; Cum  97.40 %)
     Nr >= 2.5000 and < 3.0000 :    3 (  1.95 %; Cum  99.35 %)
     Nr >= 3.0000 and < 3.5000 :    1 (  0.65 %; Cum 100.00 %)
     Nr >= 3.5000              :    0 (  0.00 %; Cum 100.00 %)
(2) plot files for some of the criteria (as a function of residue number)
(3) a list of potentially bad residues, plus their faults, e.g.:
    OOPS - (GLU A5)
     Bad RS-fit (all atoms) = 0.4076500
     Bad RS-fit (main chain) = 0.5228300
     Bad RS-fit (side chain) = 0.2270200
     Mask too tight
     Too high temperature factor = 140.1200
     Bad contact(s); count = 1
(4) a list of the violation counts for each criterion, e.g.:
     Bad pep-flip            : (   4)
     Bad RS-fit (all atoms)  : (   8)
     Bad RS-fit (main chain) : (   8)
     Bad RS-fit (side chain) : (  11)
     Bad RSC                 : (  17)
(5) a set of O macros, which can be generated in several ways (see below). A typical macro looks like this:
    centre_zone M6A A5 ;
    print Residue GLU A5
    print Bad RS-fit (all atoms) = 0.408
    print Bad RS-fit (main chain) = 0.523
    print Bad RS-fit (side chain) = 0.227
    print Mask too tight
    print Too high temperature factor = 140.12
    print Bad contact(s); count = 1
    @avemap
    obj sph sphere 10 end
    bell
    print Hit or type "@oops/a6" for next baddy
    menu @oops/a6 on on_off
    menu @oops/a5 off on_off
By using chained macros for only the suspect residues, the crystallographer is guided quickly through the trouble spots and the interesting bits of the structure. If, on the other hand, one generates a macro for every residue, the macros become a handy source of information: to check the quality of, say, residue Trp A412, all one has to do is execute the OOPS macro pertaining to this residue (which may have the same name as the residue), and up come the bad aspects (if any) of that tryptophan.
Using OOPS requires some O datablocks to be prepared in advance (although an O macro is available that does most of that work for the user as well). Running the program takes only a few minutes, and the resulting speed-up of the rebuilding process is well worth this small effort. Also, OOPS makes it less likely that residues with serious errors are overlooked, and may therefore help improve the quality of the structure.
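The statistics of output item (1) and the chained macros of item (5) can be mimicked in a few lines of Python. In the sketch below, value_stats bins values at user-supplied edges and reports counts with percentages and cumulative percentages, as in the pep-flip table; the use of the population standard deviation is an assumption, since the text does not say which one OOPS reports. build_macros writes one macro per suspect residue, each pointing to the next; its lines are copied from the example macro (with the print prompt reworded), have not been checked against O's command reference, and the macro naming scheme oops/<residue> simply follows that example. Both function names, and the second suspect residue in the usage example, are hypothetical.

```python
import bisect
import statistics
from typing import Dict, List, Sequence, Tuple


def value_stats(values: Sequence[float], edges: Sequence[float]):
    """Summary statistics plus a histogram with bins
    < e0, [e0, e1), ..., >= e_last; returns (summary, rows)."""
    summary = {
        "count": len(values),
        "mean": statistics.mean(values),
        # Population SD is an assumption; the text does not say which OOPS uses.
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1  # bin index = number of edges <= v
    labels = ([f"< {edges[0]}"]
              + [f">= {lo} and < {hi}" for lo, hi in zip(edges, edges[1:])]
              + [f">= {edges[-1]}"])
    rows, cum = [], 0
    for label, n in zip(labels, counts):
        cum += n
        rows.append((label, n, 100.0 * n / len(values), 100.0 * cum / len(values)))
    return summary, rows


def build_macros(mol: str,
                 suspects: Dict[str, Tuple[str, List[str]]]) -> Dict[str, str]:
    """One macro per suspect residue (in dict order), chained to the next one.
    suspects maps residue name -> (residue type, list of fault texts)."""
    names = list(suspects)
    macros: Dict[str, str] = {}
    for i, res in enumerate(names):
        restype, faults = suspects[res]
        this = f"oops/{res.lower()}"
        lines = [f"centre_zone {mol} {res} ;", f"print Residue {restype} {res}"]
        lines += [f"print {fault}" for fault in faults]
        lines += ["@avemap", "obj sph sphere 10 end", "bell"]
        if i + 1 < len(names):
            nxt = f"oops/{names[i + 1].lower()}"
            lines += [f'print Type "@{nxt}" for next baddy', f"menu @{nxt} on on_off"]
        else:
            lines.append("print No more suspect residues")
        lines.append(f"menu @{this} off on_off")  # remove this macro from the menu
        macros[this] = "\n".join(lines)
    return macros


summary, rows = value_stats([0.1, 0.6, 0.7, 3.2], [0.0, 0.5, 1.0])
for label, n, pct, cum in rows:
    print(f"Nr {label} : {n} ({pct:.2f} %; Cum {cum:.2f} %)")

macros = build_macros("M6A", {
    "A5": ("GLU", ["Mask too tight", "Too high temperature factor = 140.12"]),
    "A6": ("LYS", ["Bad RSC = 1.80"]),  # hypothetical second suspect
})
print(macros["oops/a5"])
```

Writing each macro to its own file under the given name would let O run them in sequence; the last macro in the chain simply reports that no suspects remain.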
AVAILABILITY
OOPS is one in a series of "O-dalisques",
i.e. programs that work in conjunction with O. OOPS
runs on SGI, ESV and DEC ALPHA/OSF1 workstations.
For more information, contact GJK (E-mail: "gerard@xray.bmc.uu.se").
REFERENCES
[1] C.I. Brändén & T.A. Jones, Nature
343 (1990), 687-689.
[2] T.A. Jones, J.Y. Zou, S.W. Cowan & M. Kjeldgaard,
Acta Cryst. A47 (1991), 110-119.
[3] T.A. Jones & M. Kjeldgaard, "O - the manual",
Uppsala (1993).
[4] J.Y. Zou & S.L. Mowbray, "An evaluation
of the use of databases in protein structure refinement",
submitted.
[5] A.T. Brünger, "X-PLOR. A system for crystallography
and NMR", New Haven (1992).
[6] R.A. Laskowski, M.W. MacArthur, D.S. Moss &
J.M. Thornton, J. Appl. Cryst. 26 (1993), 283-291.
[7] C. Ramakrishnan & G.N. Ramachandran, Biophys.
J. 5 (1965), 909-933.
[8] E. Arnold & M.G. Rossmann, J. Mol. Biol. 211
(1990), 763-801.