Dictionaries for Heteros
The most frequently asked question on the X-PLOR and, to a lesser extent, the O-info mailing lists is probably: "Could somebody send me topology/parameter/dictionary files for compound X?", where compound X is some hetero entity. In general, generating such dictionary files is a cumbersome, time-consuming and (as we have found out more than once) error-prone undertaking. In order to simplify the process, we have collected a large set of hetero-entities from the January 1995 release of the Protein Data Bank (PDB), and implemented some tools which automate much of the dictionary-generation process and, in our own experience, drastically reduce the number of errors in the resulting dictionaries.
* COLLECTION OF HETERO-COMPOUNDS
In our opinion, the best place to start looking for
coordinates for a hetero-compound is the CSD small-molecule
crystallographic database. If such a search yields
no useful clues, the next-best thing is to check if
anyone else has previously used the compound in a macromolecular
refinement. In that case, the PDB is the most suitable
place to look. In order to make the search as simple
as possible, we have written a script which automatically
scans the PDB and finds all unique hetero-compound
names. This list is subsequently edited to remove
obvious duplicates, and the edited list is fed into
a program. This program scans all PDB entries again,
in order of decreasing resolution. As soon as one
of the hetero-compounds from the list is encountered,
the coordinates are stored. Once the scan is complete,
every hetero-compound is translated to put its centre-of-gravity
at the origin, all occupancies are set to 1.0 and all
temperature factors to 20.0 Å². The coordinates
and some more information (PDB entry from which it
was taken, resolution, a list of other PDB files which
contain the same compound, etc.) are then written to
one large file. Our present collection (generated
using the January 1995 release of the PDB) contains
more than 700 (mostly unique) hetero-compounds. As
an example, the entry for all-trans-retinoic acid (which
we shall use here throughout) looks as follows:
COMPND RETINOIC ACID
REMARK Extracted from PDB file 1fem.pdb
REMARK Formula C20 H28 O2
REMARK Nr of non-hydrogen atoms 22
REMARK Residue type REA
REMARK Residue name 621
REMARK 2 RESOLUTION. 1.9 ANGSTROMS. 1FEM 36
REMARK Compound also present in : 1EPB 1CBR
HETATM 1 C1 REA 621 -3.034 1.835 -2.850 1.00 20.00 1FEM
HETATM 2 C2 REA 621 -3.924 1.728 -4.090 1.00 20.00 1FEM
...
HETATM 20 C20 REA 621 2.020 -1.190 4.576 1.00 20.00 1FEM
HETATM 21 O1 REA 621 6.229 -1.851 4.290 1.00 20.00 1FEM
HETATM 22 O2 REA 621 4.389 -0.721 5.672 1.00 20.00 1FEM
Using an editor, or a Unix tool as simple as grep, one can quickly find out if the compound one is looking for occurs in the file. If the compound is new to crystallography, one may have to resort to other methods to come up with a set of coordinates (e.g., using quantum-chemical or molecular mechanics calculations, or "mutating" a similar compound).
* X-PLOR DICTIONARIES
One of our utility programs, XPLO2D, contains an option
which can be used to generate appropriate dictionaries
for X-PLOR. Given a PDB file containing the coordinates
of a hetero-compound, it generates four new files:
* a topology file (defining atom types and masses, bonds, impropers [chiral carbons, flat groups and bonds], possible dihedrals, hydrogen-bond acceptors and possible donors). This file usually needs to be edited, for instance to add charges and the masses of implicit hydrogen atoms. For all-trans-retinoic acid this file looks as follows:
Remarks rea.top
Remarks Created by XPLO2D V. 950419/1.2.3 at Fri May 5 22:24:52 1995 for user
Remarks Auto-generated by XPLO2D from file ./rea.pdb
Remarks You *MUST* check/edit MASSes and CHARges !!!
Remarks Check DONOrs and ACCEptors
Remarks Verify IMPRopers yourself
Remarks UNcomment any DIHEdrals you want to use
set echo=false end
edit masses yourself !!!
MASS CX1 12.01100 ! ADD 1.008 for each H
...
MASS OX10 15.99940 ! ADD 1.008 for each H
autogenerate angles=true end
RESIdue REA
GROUp
ATOM C1 TYPE CX1 CHARge 0.0 END
...
ATOM O2 TYPE OX10 CHARge 0.0 END
BOND C1 C2
...
BOND C15 O2
edit these DIHEdrals if necessary
! DIHEdral C17 C1 C2 C3 ! fixed dihedral ??? 94.80
...
! DIHEdral C12 C13 C14 C15 ! fixed dihedral ??? -179.98
edit these IMPRopers if necessary
IMPRoper C1 C2 C6 C16 ! chirality or flatness improper 49.73
...
IMPRoper C15 C14 O1 O2 ! chirality or flatness improper 5.01
edit these DONOrs and ACCEptors if necessary
! DONOr H? O1 ! only true if -OHx (x>0)
ACCEptor O1 C15
...
END
* a parameter file (defining target values and force constants for bonds, etc.). The target values are simply the averages of the observed values. The force constants are set to the same value for all bonds, angles and impropers (the defaults being in the same ball-park as those of the Engh & Huber force field). For example:
Remarks rea.par
Remarks Created by XPLO2D V. 950419/1.2.3 at Fri May 5 22:24:52 1995 for user
Remarks Auto-generated by XPLO2D from file ./rea.pdb
Remarks Parameters for residue type REA
set echo=false end
edit if necessary
BOND CX1 CX2 1000.0 1.530 ! Nobs = 1
...
BOND CX7 OX10 1000.0 1.390 ! Nobs = 1
edit if necessary
ANGLe CX2 CX1 CX4 500.0 112.79 ! Nobs = 1
...
ANGLe CX1 CX3 CX2 500.0 54.52 ! Nobs = 1
insert DIHEdrals yourself if necessary
suggested weight = 300.0 for Engh & Huber compatibility
edit if necessary
...
IMPRoper CX7 CX3 OX9 OX10 750.0 0 5.007 ! Nobs = 1
edit if necessary
NONBonded CX1 0.1200 3.7418 0.1000 3.3854
...
NONBonded OX10 0.1591 2.8509 0.1591 2.8509
set echo=true end
* an X-PLOR input file which, when executed, will energy-minimise the structure of the compound, and print a list of violations afterwards. This should always be done prior to inclusion of the compound into the crystallographic refinement process, since the resulting structure reveals what X-PLOR will try to make the compound look like once it is included. If, for instance, a dihedral angle was given a target value of 180° whereas it should have been 0°, this will show up immediately after the energy minimisation. Hence, this is a quick and easy way to prevent the frustration of finding that X-PLOR has "ruined" your compound after a 4,000 K slow-cool which took two weeks to run. The input file may look as follows:
Remarks rea_min.inp
Remarks Created by XPLO2D V. 950419/1.2.3 at Fri May 5 22:24:53 1995 for user
Remarks Auto-generated by XPLO2D from file ./rea.pdb
Remarks Energy-minimisation input file for residue type REA
topology
@rea.top
end
parameters
@rea.par
nbonds
atom cdie shift eps=8.0 e14fac=0.4
cutnb=7.5 ctonnb=6.0 ctofnb=6.5
nbxmod=5 vswitch
end
end
segment name=1FEM
chain
coordinates @./rea.pdb_clean
end
end
coordinates @./rea.pdb_clean
minimise powell
nstep=250 drop=40.0
end
write coordinates output=rea_min.pdb end
vector ident (store9) (not hydrogen)
constraints interaction (store9) (store9) end
print threshold=0.02 bonds
print threshold=3.0 angles
print threshold=10.0 dihedrals
print threshold=3.0 impropers
stop
* a "clean" PDB file suitable for use by X-PLOR in the energy-minimisation procedure.
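The statement that the parameter-file target values are simply the averages of the observed values can be illustrated with a short Python sketch for the BOND lines. It is not XPLO2D's code: the function name is hypothetical, the bond list and atom types are supplied by hand, and the default force constant of 1000.0 is simply copied from the example file above. The type CX2 for atom C2 is an assumption, since that part of the topology listing is elided.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


def bond_parameters(coords: Dict[str, Vec],
                    types: Dict[str, str],
                    bonds: List[Tuple[str, str]],
                    force_constant: float = 1000.0) -> List[str]:
    """Average the observed bond lengths per pair of atom types."""
    observed = defaultdict(list)
    for a, b in bonds:
        # The order of the two types is irrelevant, so sort them.
        key = tuple(sorted((types[a], types[b])))
        observed[key].append(math.dist(coords[a], coords[b]))
    lines = []
    for (t1, t2), lengths in sorted(observed.items()):
        mean = sum(lengths) / len(lengths)
        lines.append(f"BOND {t1} {t2} {force_constant:.1f} {mean:.3f}"
                     f" ! Nobs = {len(lengths)}")
    return lines


# C1 and C2 of retinoic acid, coordinates taken from the entry shown earlier.
coords = {"C1": (-3.034, 1.835, -2.850), "C2": (-3.924, 1.728, -4.090)}
types = {"C1": "CX1", "C2": "CX2"}  # CX2 for C2 is assumed
for line in bond_parameters(coords, types, [("C1", "C2")]):
    print(line)  # BOND CX1 CX2 1000.0 1.530 ! Nobs = 1
```

With a single observation per type pair, as in the retinoic acid example, the "average" is just the observed value; averaging only matters once several bonds share the same pair of atom types.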
* O DICTIONARIES
Once a set of coordinates has been obtained for a hetero-compound,
it can be read into O and moved into place with the
Move_zone command. Subsequently, the Tor_general command
can often be used to adjust some of the free torsion
angles. Finally, the RSR_rigid command can be invoked
to optimise the fit of the compound to the density
with real-space rigid-body refinement.
Another one of our utility programs (MOLEMAN) can be
used to generate four of the five types of dictionary
file that may be needed for the display and manipulation
of a hetero-compound inside O. The only dictionary
that cannot be generated in this fashion is that required
for regularisation. On the other hand, regularisation
can be done rapidly in X-PLOR, and if one uses sensible
manipulation commands in O (e.g., Move_zone, Tor_residue,
RSR_rigid, but not Move_atom) it will rarely be necessary
to regularise a hetero-compound. The four types of
dictionary that can be generated automatically involve:
* Connectivity. In order for O to draw the correct bonds (e.g., no bonds between hydrogen atoms), a connectivity entry is sometimes needed (although in most cases the defaults in O will do the job). For retinoic acid, such an (automatically generated) entry would look as follows:
REA
ATOM C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
ATOM C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
ATOM O1 O2
CONNECT - C1 C2 C3 C4 C5 C6 C1 C16 C2 +
CONNECT C1 C17
CONNECT C5 C18
CONNECT C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 O1
CONNECT C9 C19
CONNECT C13 C20
CONNECT C15 O2
* Real-space fit. To include a compound in real-space fit calculations, a list of all its atoms can be provided as an O datablock. For instance:
rsfit_REA T 2 70
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14
C15 C16 C17 C18 C19 C20 O1 O2
* Real-space refinement. A similar datablock is needed to include the compound in some real-space refinement calculations (RSR_zone):
RSR_dict_REA T 2 70
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14
C15 C16 C17 C18 C19 C20 O1 O2
* Torsions. Some torsion-angle manipulations in O require that the angles and the affected atoms are defined. For instance:
RESIDUE REA
TORSION TOR1 -108. C1 C6 C7 C8 C8 C9 C10 C11 C12 C13 C14 C15 C19 C20 \
O1 O2
TORSION TOR2 -108. C8 C7 C6 C1 C1 C2 C3 C4 C5 C16 C17 C18
TORSION TOR3 6. C19 C9 C10 C11 C11 C12 C13 C14 C15 C20 O1 O2
TORSION TOR4 6. C11 C10 C9 C19 C1 C2 C3 C4 C5 C6 C7 C8 C16 C17 C18 \
C19
TORSION TOR5 141. C13 C14 C15 O1 O1 O2
This appears to be a trivial exercise, but it is not.
Consider all-trans-retinoic acid with 22 atoms and
22 bonds. This yields more than 40 dihedral angles,
but only 5 torsion entries (two of which are wrong
and only appear because the torsion angle of 6° falls
outside the tolerance for fixed dihedrals; it probably
should have been restrained more tightly in the refinement).
The trick is to throw away every dihedral which appears
to be strongly restrained, dihedrals inside rings,
etc. MOLEMAN uses the following criteria to reduce
the number of torsions:
- any dihedral which is equal to -180°, 0° or +180°
(with a tolerance of 5°) is rejected as (probably)
being strongly restrained to its current value (e.g.,
in conjugated systems);
- any dihedral K-I-J-L whose rotation affects atom "K"
or "I" is rejected as (probably) being part
of a ring system;
- if the number of atoms affected by the torsion is
greater than or equal to the total number of atoms
minus 4, the torsion is rejected. In this case (for
example, a torsion involving a carboxylate) it makes
more sense to use the torsion defined the other way
around (i.e., use torsion L-J-I-K which affects only
1 or 2 atoms, instead of K-I-J-L);
- if a torsion is around the same bond as a previous
torsion, and it affects the same atoms as that previous
torsion, it is rejected as being a simple permutation
(this happens, for instance, for carboxylates and for
aliphatic tails sprouting from a ring, in which case
there are two equivalent ways to define the torsion
of the tail relative to the ring).
Note that the torsions found for retinoic acid (except
the two unwanted ones mentioned earlier), are indeed
the only intuitively reasonable ones: the tail relative
to the ring, the ring relative to the tail, and the
carboxylate relative to the tail.
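The four rejection criteria listed above can be sketched in Python as follows. This is a reading of the rules as stated in the text, not MOLEMAN's actual implementation. The function names are hypothetical, the bond list is supplied explicitly, and "a previous torsion" is interpreted here as a previously accepted one.

```python
import math
from typing import Dict, List, Set, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


def affected_atoms(adj: Dict[str, Set[str]], i: str, j: str) -> Set[str]:
    """Atoms reached from J without crossing the I-J bond (J itself excluded)."""
    seen, stack = {j}, [j]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if cur == j and nxt == i:
                continue  # do not cross the bond being rotated
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(j)
    return seen


def free_torsions(coords: Dict[str, Vec], bonds: List[Tuple[str, str]],
                  tolerance: float = 5.0) -> list:
    """Return (K, I, J, L, angle, affected atoms) for every torsion kept."""
    adj: Dict[str, Set[str]] = {name: set() for name in coords}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    accepted, seen_rotations = [], set()
    for a, b in bonds:
        for i, j in ((a, b), (b, a)):
            moved = affected_atoms(adj, i, j)
            for k in sorted(adj[i] - {j}):
                for m in sorted(adj[j] - {i}):
                    if k == m:
                        continue
                    angle = dihedral(coords[k], coords[i], coords[j], coords[m])
                    if min(abs(angle), 180.0 - abs(angle)) <= tolerance:
                        continue  # 1: close to 0 or +/-180, probably restrained
                    if k in moved or i in moved:
                        continue  # 2: rotation moves K or I, so a ring
                    if len(moved) >= len(coords) - 4:
                        continue  # 3: moves (almost) the whole molecule
                    key = (frozenset((i, j)), frozenset(moved))
                    if key in seen_rotations:
                        continue  # 4: permutation of an accepted torsion
                    seen_rotations.add(key)
                    accepted.append((k, i, j, m, angle, sorted(moved)))
    return accepted
```

Each accepted tuple carries the same information as a TORSION line in the O datablock: the four defining atoms, the current angle, and the atoms that move.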
* TYING IT ALL UP
We have also written a simple script which will take
the residue name of any compound occurring in our hetero-compound
collection file, and automatically generate the complete
set of X-PLOR and O dictionaries as well as the "clean"
PDB file which can be imported directly into both of
these programs.
* AVAILABILITY
The hetero-compound collection is freely available to
anyone interested, as is the script that generates
a set of O and X-PLOR dictionaries for any compound
in this collection. The programs XPLO2D and MOLEMAN
are available to academic users free of charge. For
more information, contact GJK (E-mail: "gerard@xray.bmc.uu.se").