Uppsala Software Factory Tutorial - Biomolecular Morphing
This page describes how to visualise structural differences between
molecules (e.g., related by NCS, or in different crystal forms, mutants,
complexes, etc.), using LSQMAN (version 7.0 or newer) and O (or any other
molecular graphics program).
References:
- Biomolecular morphing: Kleywegt, G.J. (1998). Unpublished results.
- LSQMAN: Kleywegt, G.J. (1996). Use of non-crystallographic
symmetry in protein structure refinement. Acta Cryst D52, 842-857.
- O: Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991). Improved
methods for building protein models in electron density maps and the location
of errors in these models. Acta Cryst A47, 110-119.
If you use molecular morphs generated with LSQMAN in your work
(web-site, publications, etc.), please CITE the LSQMAN reference !
Also, if your morphs are publicly available on the web, please e-mail
me the URL and a brief description so I can add them to the gallery.
Biomolecular Morphing Gallery
Here are some example morphs:
- RBP (434 kB). This animated
GIF shows the transition from the open to the closed (ligand-bound)
form of ribose-binding protein. It was created using PDB entries 1URP
and 2DRI. Only the CA-trace is shown, and the CA-CA bonds have been
colour-coded according to the magnitude of the change in "torsion"
around them. The individual images were created in O.
© G J Kleywegt, 1998
- CBH I (421 kB). This animated
GIF shows the transition of the tyrosine loop in cellobiohydrolase I.
It includes the CA-trace and all side chains. It was created using PDB
entries 1CEL and 7CEL.
© G J Kleywegt, 1998
- L2C (148 kB). This animated
GIF shows the "transition" from lysozyme (8LYZ) to CRABP type II (1CBS).
Individual frames created with RasMol. Nonsense, but funny !
(Smaller version (57 kB).)
© G J Kleywegt, 1998
- HIC-Up (64 kB). Animated logo
for the HIC-Up site.
It was generated as a Cartesian morph from hetero compound RE9
(in PDB entry 1CBQ) to REA (1CBS). The two hetero compounds have
the same number of heavy atoms, with the same names. Created with O.
© G J Kleywegt, 1998
- SBNet (34 kB). Animated logo
for the SBNet site.
It was generated as a Cartesian morph from an ideal seven-residue
alpha-helix to a seven-residue beta-strand. Created with O.
© G J Kleywegt, 1998
- RBP (101 kB). A different
view of the closure of RBP, this time showing the protein as
a cartoon. The images were created with MSI's WebLab ViewerLite
on a Mac, copied to the clipboard, converted to GIF format with
Clip2Gif, and animated with GIFBuilder.
© G J Kleywegt, 1998
- HIC-Up (25 kB). A different
version of the HIC-Up morph, namely of the soft molecular surface.
These images were created with MSI's WebLab ViewerLite etc., in the
same way as the previous one.
© G J Kleywegt, 1998
1 - Designing your morph
Before you start, spend a few minutes thinking about what
you want to visualise.
First of all, what is "morphing" ? It is the name of a graphics technique
in which one image is gradually changed into another. This is done by
generating intermediate images that are "interpolations" between the start
and end image. If the whole series of images is shown in rapid succession,
it looks like a fluid movie. You may have seen morphs on the web, e.g. a picture
of one American president that slowly changes into a picture of another
president.
"Classical" morphing is typically done on images that consist of pixels.
However, if you want to visualise a large-scale conformational change
in a protein molecule, morphing from a picture of one conformation to
that of the other will rarely produce the desired result. For "biomolecular
morphing", the major requirement is that the transition is smooth
and feels natural. In other words, we want to see/visualise changes
in torsions around chemical bonds or around pseudo-bonds (such as CA-CA
bonds in a CA-trace of a protein molecule). This requires a little
more work than simply running a general pixel-based
morphing program.
The program LSQMAN contains an option to do biomolecular morphing, i.e.
to do "structural interpolation" between two conformational states of
a molecule, and in such a way that the result looks chemically reasonable
(which is not the same as "physically realistic" !!!).
What do you need for biomolecular morphing ?
- two copies of a molecule in different conformational states
(e.g., with and without ligand, open or closed conformation,
active site loop in different conformations, etc.), in PDB
format (the two molecules may be in the same file)
- the program LSQMAN (available
free of charge for academics and not-for-profit institutions)
- a program to visualise the various models (e.g. O, RasMol, MolScript, ...)
- a way to capture images of all individual models (either from within the
visualisation program, or using a separate image-capture program
such as SGI's snapshot, or GNU's GIMP)
- a program to convert the images into GIF files, and put them together
into an animated GIF file or into a movie (MPEG, QUICKTIME); e.g.
one could use xv and the Macintosh program GIFBuilder, or just GIMP
How do you want to morph ? In LSQMAN, there are essentially three
different ways to do the morphing:
- a set of "central atoms" in internal coordinate space (e.g., for
proteins this would be the CA-trace; for nucleic acids perhaps
the P-trace)
- (proteins only) the CA-trace plus all side chains in internal coordinate
space
- any set of atoms in Cartesian space
Of these methods, Cartesian morphing is the simplest (and most similar to
traditional morphing): for every atom to be morphed, the start and end
coordinates are retrieved, and the intermediate models are generated
simply by interpolating the coordinates (i.e., every atom moves in a
straight line from its starting position to its final position).
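The straight-line interpolation just described can be sketched in a few lines of Python. This is a minimal illustration, not LSQMAN's code: the function name `cartesian_morph` is hypothetical, and the sketch assumes that the two models have already been superimposed and list the same atoms in the same order (LSQMAN does the atom matching itself). With 12 models, start and end included, model k lies at fraction k/11 of the way from start to end.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cartesian_morph(start: Sequence[Vec3], end: Sequence[Vec3],
                    n_models: int) -> List[List[Vec3]]:
    """Return n_models coordinate sets, including the start and end models.

    Hypothetical sketch, not LSQMAN code: every atom moves along the straight
    line from its start to its end position. Assumes the two models are
    superimposed and list the same atoms in the same order.
    """
    if len(start) != len(end):
        raise ValueError("start and end must contain the same atoms")
    if n_models < 2:
        raise ValueError("need at least the start and end models")
    models = []
    for k in range(n_models):
        f = k / (n_models - 1)  # 0.0 for the first model, 1.0 for the last
        models.append([
            (p[0] + f * (q[0] - p[0]),
             p[1] + f * (q[1] - p[1]),
             p[2] + f * (q[2] - p[2]))
            for p, q in zip(start, end)
        ])
    return models


if __name__ == "__main__":
    # A single atom moving 11 A along x, sampled as 12 models
    for model in cartesian_morph([(0.0, 0.0, 0.0)], [(11.0, 0.0, 0.0)], 12):
        print("%8.3f %8.3f %8.3f" % model[0])
```

In this toy case the atom advances 1 Å per model. Nothing in the scheme keeps bond lengths or helices intact, which is where the geometric distortions listed among the disadvantages come from.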
Advantages of morphing in Cartesian coordinate space:
- works for any type of atom
- can be used to morph translational motions (e.g., "diffusion" of a substrate
into an active site)
- works even if the molecule consists of multiple fragments (e.g., a protein
plus a ligand)
Disadvantages:
- the two molecules must have been superimposed previously
- the geometry of the molecule will be distorted much of the time (e.g.,
helices may not look like nice helices during the morphing; some bonds
may become broken; artefactual bonds may be formed transiently)
- the motion will not look "natural"
Morphing in internal coordinate space is strongly preferred (but
not always possible; see below). Here, a CA-trace is represented not
by its Cartesian coordinates, but by:
- the distance to the previous CA atom
- the angle it makes with the previous two CA atoms
- the torsion it makes with the previous three CA atoms
Then, to morph, LSQMAN compares these internal coordinates in the start and
end structure, and interpolates to effect the changes smoothly. For each
intermediate model, the internal coordinates are converted back to Cartesian
coordinates (this means that the first atom will always end up at the origin
(0,0,0) !).
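To make the internal-coordinate idea concrete, here is a minimal Python sketch for a single trace of central atoms (e.g., CA atoms). It converts Cartesian coordinates into distances, angles and torsions, interpolates those, and rebuilds Cartesian coordinates with the standard "natural extension reference frame" (NeRF) placement. Because the rebuild has to start somewhere, the first atom of every model lands at the origin. This is an illustration under stated assumptions, not LSQMAN's algorithm: all function names are hypothetical, the trace is assumed to have at least three atoms with no three consecutive atoms collinear, and each torsion is interpolated the shorter way round the circle (a choice made for this sketch; LSQMAN's own handling of large torsion changes is described in its manual).

```python
import math

# Hypothetical sketch of internal-coordinate morphing for ONE trace of central
# atoms (e.g. CA atoms); not LSQMAN's implementation. Assumes >= 3 atoms and
# no three consecutive atoms on a straight line.


def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def wrap(angle):
    """Map an angle (radians) into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def bond_angle(a, b, c):
    """Angle a-b-c (radians) at atom b."""
    cos_t = dot(unit(sub(a, b)), unit(sub(c, b)))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def torsion(p0, p1, p2, p3):
    """Signed torsion p0-p1-p2-p3 (radians, IUPAC sign convention)."""
    b1 = unit(sub(p2, p1))
    b0, b2 = sub(p0, p1), sub(p3, p2)
    v = sub(b0, scale(b1, dot(b0, b1)))  # b0 projected perpendicular to b1
    w = sub(b2, scale(b1, dot(b2, b1)))  # b2 projected perpendicular to b1
    return math.atan2(dot(cross(b1, v), w), dot(v, w))


def to_internal(xyz):
    """Distances (i-1,i), angles (i-2,i-1,i) and torsions (i-3..i)."""
    n = len(xyz)
    dists = [math.dist(xyz[i - 1], xyz[i]) for i in range(1, n)]
    angles = [bond_angle(xyz[i - 2], xyz[i - 1], xyz[i]) for i in range(2, n)]
    tors = [torsion(*xyz[i - 3:i + 1]) for i in range(3, n)]
    return dists, angles, tors


def place(a, b, c, r, theta, phi):
    """NeRF: atom d with bond c-d = r, angle b-c-d = theta, torsion a-b-c-d = phi."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    d2 = (-r * math.cos(theta), r * math.sin(theta) * math.cos(phi),
          r * math.sin(theta) * math.sin(phi))
    return add(c, add(scale(bc, d2[0]), add(scale(m, d2[1]), scale(n, d2[2]))))


def from_internal(dists, angles, tors):
    """Rebuild Cartesian coordinates: atom 0 at the origin, atom 1 on +x."""
    xyz = [(0.0, 0.0, 0.0), (dists[0], 0.0, 0.0)]
    a = angles[0]
    xyz.append(add(xyz[1], scale((-math.cos(a), math.sin(a), 0.0), dists[1])))
    for i in range(3, len(dists) + 1):
        xyz.append(place(xyz[i - 3], xyz[i - 2], xyz[i - 1],
                         dists[i - 1], angles[i - 2], tors[i - 3]))
    return xyz


def morph_internal(start, end, n_models):
    """Interpolate internal coordinates; torsions take the shorter way round."""
    (d0, a0, t0), (d1, a1, t1) = to_internal(start), to_internal(end)
    models = []
    for k in range(n_models):
        f = k / (n_models - 1)
        d = [x + f * (y - x) for x, y in zip(d0, d1)]
        ang = [x + f * (y - x) for x, y in zip(a0, a1)]
        tor = [x + f * wrap(y - x) for x, y in zip(t0, t1)]
        models.append(from_internal(d, ang, tor))
    return models
```

Since only internal coordinates are kept, every rebuilt model has its first atom at the origin and its second atom on the +x axis, so the models still have to be superimposed afterwards.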
Morphing a protein's CA-trace plus all side chains proceeds similarly to
central-atom morphing, but care is taken that the internal coordinates
of the side-chain atoms involve "natural" torsions (CA-CB, CB-CG, etc.).
Advantages of morphing in internal coordinate space:
- gives an impression of very natural motion !
- no need to superimpose the molecules (since distances, angles and
torsions are independent of orientation)
Disadvantages:
- the major limitation is that if a very large change occurs in one (or more)
of the torsions along the CA-trace (except at the termini), all protein
residues C-terminal of each large change will "ride along" during this
change (giving the impression that the protein first unfolds, and then
refolds into the final conformation ...). If such changes occur (you
will notice it immediately when you visualise all models), you have to
resort to Cartesian space morphing (not always, though; there is a fix
in LSQMAN which sometimes works; see the manual). As an example, try
to morph from molecule "A" to molecule "B" in PDB entry 1SHF, or the
lipase from its open-lid to its closed-lid form (PDB entries 1CRL and
1TRH), in internal CA space !
- ring-closures are not necessarily maintained (e.g., a six-membered ring
might "open up" during part of the morphing)
- the morphed models must all be superimposed (can be done quickly with
LSQMAN)
- cannot be used to morph translations
- does not always work well if the molecule consists of multiple fragments
(e.g., a protein plus a ligand)
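Why does one large torsion change make the rest of the chain "ride along" so dramatically? In an internal-coordinate representation, changing a single torsion rotates everything downstream of that (pseudo-)bond rigidly about the bond axis. An atom at perpendicular distance r from that axis then moves along a chord of length d = 2 r sin(|Δφ|/2). A quick numerical check in Python (the function name is only illustrative):

```python
import math


def chord_displacement(r: float, delta_deg: float) -> float:
    """Distance (A) moved by an atom at perpendicular distance r (A) from a
    bond axis when the torsion about that bond changes by delta_deg degrees.
    Illustrative helper: d = 2 r sin(|delta| / 2)."""
    return 2.0 * r * math.sin(math.radians(abs(delta_deg)) / 2.0)


if __name__ == "__main__":
    for r in (5.0, 15.0, 30.0):
        print(f"r = {r:4.1f} A, 180 deg flip -> {chord_displacement(r, 180.0):4.1f} A")
```

A 180° flip moves a residue 30 Å from the axis by 60 Å, and halfway through the morph (a 90° change) it has already moved about 42 Å from where it started.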
All in all, the perception of natural motion makes internal coordinate
morphing superior for most purposes, except when one or more very large
changes occur.
If you want, LSQMAN can try to avoid large changes in the central atom
torsion angles, but this sometimes in turn leads to broken bonds and
distorted molecules. It's a trade-off.
What do you want to morph ? Now that you know about the ins and outs
of morphing in internal and Cartesian coordinate space, you can decide
which parts of your molecule you want to morph:
- to visualise changes in the "central atom" conformation (e.g., CA atoms
for proteins, P atoms for DNA/RNA), first try internal morphing of these
atoms (e.g., to show domain motion, or loop movements)
- for proteins, if side chains are important (e.g., ligand binding), you
will first want to try internal coordinate morphing of the CA-trace plus
the side chains (NOTE: you may only want to include side chains that
are close to the ligand - this can easily be done with
MOLEMAN2)
- if nothing else works (or if you want to visualise translational motion of
a rigid body, e.g. a ligand moving into a cavity), use Cartesian space
morphing
- if you really want to impress your boss, you can combine different morphs
(with some file editing), e.g. domain closure upon ligand binding as a
CA-trace morph, and the ligand moving into its binding site as a simple
Cartesian space morph !
- morphs can also be combined in series, e.g. if you have crystal structures
of three conformational states of a protein, you can first morph from
the first to the second, and then from the second to the third.
- if you have nothing better to do, you can try some silly morphs, e.g.
from a protein A to another protein B (must have roughly the same number
of residues; see the lysozyme to CRABP2 morph in the Gallery), from
an all-L to an all-D protein (use MOLEMAN2 to mirror your favourite
protein to obtain the all-D conformation), a Cartesian space morph
of a protein to a two-fold related copy of itself (the protein will
shrink, collapse onto the two-fold axis, and then grow again), from an
alpha-helix to a beta-strand, etc. etc.
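The series and combined morphs in the list above mostly come down to bookkeeping on the per-model files. Below is a hypothetical Python sketch of that bookkeeping. It assumes each morph is held as a list of models, each model is a list of PDB ATOM/HETATM lines, and all models are already in the same coordinate frame. The function names are made up for illustration, and atoms are not renumbered.

```python
from typing import List

Model = List[str]  # the ATOM/HETATM lines of one intermediate model


def chain_morphs(*segments: List[Model]) -> List[Model]:
    """Play morphs one after another (e.g. state 1 -> 2, then 2 -> 3).

    Each segment is assumed to start with the model the previous segment
    ended with, so that duplicate model is dropped to avoid a stalled frame.
    """
    frames: List[Model] = []
    for seg in segments:
        frames.extend(seg[1:] if frames else seg)
    return frames


def combine_morphs(first: List[Model], second: List[Model]) -> List[Model]:
    """Merge two morphs model by model (e.g. a CA-trace domain closure plus a
    Cartesian ligand morph); both need the same number of models and must
    share one coordinate frame."""
    if len(first) != len(second):
        raise ValueError("both morphs need the same number of models")
    return [a + b for a, b in zip(first, second)]
```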
2 - Generating the intermediate models
First, prepare one or two PDB files with the molecules you want to use as
the start and end point of the morph. For example, if you want to morph
ribose-binding protein from the open to the closed (ribose-bound) conformation,
you could use MOLEMAN2 on PDB entry 2DRI
to prepare a PDB file that contains only the main-chain atoms plus all intact
residues that have at least one atom within 8 Å of any ribose atom
(NOTE: you only need to do this for one of the two molecules, since only
atoms that the two molecules have in common will be used in the morphing process !):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MOLEMAN2 > re /nfs/pdb/full/2dri.pdb
MOLEMAN2 > sel none
MOLEMAN2 > sel or residue rip
MOLEMAN2 > select distance 0.0 8.0
MOLEMAN2 > sel by_residue
MOLEMAN2 > sel or class main
MOLEMAN2 > sel and type prot
MOLEMAN2 > wr 2dri.pdb pdb selected
MOLEMAN2 > quit
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
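If MOLEMAN2 is not at hand, the selection logic of the example above can be approximated in plain Python. As read from the command sequence, that logic is: take the ligand atoms, take everything within 8 Å of them, expand to whole residues, add the main chain, and keep only protein. This is a sketch of that logic, not MOLEMAN2's implementation. It reads fixed PDB columns, treats ATOM records as "protein" and residue name RIP as the ribose, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Iterator, List

MAIN_CHAIN = {"N", "CA", "C", "O"}


def parse_atoms(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield ATOM/HETATM records, read from the fixed PDB columns."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield {
                "record": line[0:6].strip(),
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                # chain ID + residue number + insertion code identify a residue
                "resid": (line[21], line[22:27]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
                "line": line,
            }


def select_site(lines: Iterable[str], ligand: str = "RIP",
                cutoff: float = 8.0) -> List[str]:
    """Main-chain atoms plus complete residues with any atom within cutoff (A)
    of any ligand atom; only ATOM records (assumed to be protein) are kept."""
    atoms = list(parse_atoms(lines))
    lig = [a["xyz"] for a in atoms if a["resname"] == ligand]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    near = {a["resid"] for a in protein
            if any(math.dist(a["xyz"], p) <= cutoff for p in lig)}
    return [a["line"] for a in protein
            if a["resid"] in near or a["name"] in MAIN_CHAIN]
```

For example, passing an open file handle for `2dri.pdb` to `select_site` returns the selected lines, ready to be written to a new PDB file.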
Now you are ready to generate the models and to superimpose them with LSQMAN
(see the LSQMAN manual for details on the
commands and parameters for this program).
- Cartesian space morphing. You must read the two molecules, superimpose
them, make sure that ambiguous side-chain atoms are as close as possible
(e.g., CD1/CE1 versus CD2/CE2 in Phe and Tyr), select the atom types you
want to include, do the morphing, and execute the macro that LSQMAN produces
to superimpose all morphed models:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
read m1 /nfs/pdb/full/1urp.pdb
read m2 ./2dri.pdb
at ca
! superimpose them on residues a100 to a230
ex m1 a100-230 m2 a1
! improve operator if necessary
set dist 1.0
im m1 a* m2 a*
apply m1 m2
! fix ambiguous atom names
nomen m1
fix m1 a1-999 m2 a1 strict seq rmsd
! select atoms to use (here: CA only)
at ca
set reset
! generate 12 models (including the start and end models)
morph m1 a1-999 m2 a1 12 cart cart y a100-230 999
! superimpose them all on the first model
at ca
@cart.lsqmac
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
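The "fix ambiguous atom names" step matters for Cartesian morphing because the two models may label the two halves of a symmetric group, such as a Phe ring, the other way round. Interpolating such atoms would send CD1 to where CD2 was and vice versa, so the two atoms would coincide halfway through the morph even if the ring did not move at all. Below is a minimal Python sketch of the idea, not LSQMAN's `nomen`/`fix` code. After superposition, it swaps the symmetry-equivalent names in one model whenever that brings those atoms closer to the other model. The table of pairs and the function name are assumptions of this sketch.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

# Symmetry-equivalent atom pairs (an assumption of this sketch; see the LSQMAN
# manual for the residue types the program itself handles).
SWAP_PAIRS = {
    "PHE": [("CD1", "CD2"), ("CE1", "CE2")],
    "TYR": [("CD1", "CD2"), ("CE1", "CE2")],
    "ASP": [("OD1", "OD2")],
    "GLU": [("OE1", "OE2")],
    "ARG": [("NH1", "NH2")],
}


def fix_names(resname: str, mobile: Dict[str, Vec3],
              target: Dict[str, Vec3]) -> Dict[str, Vec3]:
    """Return a copy of one residue of `mobile` (atom name -> xyz), with the
    ambiguous names swapped if that lowers the summed squared distance to
    the same residue in `target`. Both models must already be superimposed."""
    pairs = SWAP_PAIRS.get(resname.upper(), [])
    names = [n for pair in pairs for n in pair]
    if not pairs or not all(n in mobile and n in target for n in names):
        return dict(mobile)
    swapped = dict(mobile)
    for a, b in pairs:
        swapped[a], swapped[b] = mobile[b], mobile[a]

    def cost(model: Dict[str, Vec3]) -> float:
        return sum(math.dist(model[n], target[n]) ** 2 for n in names)

    return swapped if cost(swapped) < cost(mobile) else dict(mobile)
```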
- Internal coordinate space morphing of a "central atom" trace. You must
read the two molecules, select the atom type you want to morph, do the morphing,
and execute the macro that LSQMAN produces to superimpose all morphed models:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
read m1 /nfs/pdb/full/1urp.pdb
read m2 ./2dri.pdb
morph m1 a1-999 m2 a1 12 morphy internal x a100-230 999
at ca
@morphy.lsqmac
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
- Internal coordinate space morphing of a CA-trace plus side chains (proteins
only). You must read the two molecules, make sure that ambiguous side-chain
torsions have similar values, set the atom type to TRACE, do the morphing, and
execute the macro that LSQMAN produces to superimpose all morphed models:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
read m1 /nfs/pdb/full/1urp.pdb
read m2 ./2dri.pdb
! fix side-chain torsions for Phe, Tyr, Asp, Glu, and Arg
nomen m1
fix m1 a1-999 m2 a1 strict seq torsion
! select atom type TRACE
at trace
morph m1 a1-999 m2 a1 12 trace int z a100-230 999
at ca
@trace.lsqmac
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
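The torsion-based fix in the last example tackles the same ambiguity from the internal-coordinate side. Relabelling the two halves of a symmetric group changes the corresponding side-chain torsion by 180°, so a ring that hardly moves could still appear to flip over during the morph. Below is a minimal Python sketch that picks the equivalent torsion value closest to the other model. In a real structure this corresponds to swapping the atom names. The function names and the residue table are assumptions of this sketch, not LSQMAN's internals.

```python
def wrap180(angle: float) -> float:
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


# Side-chain torsions with two-fold symmetry (assumed standard definitions):
# Phe/Tyr chi2, Asp chi2, Glu chi3, Arg chi5 (CD-NE-CZ-NH1).
SYMMETRIC_TORSIONS = {"PHE": "chi2", "TYR": "chi2", "ASP": "chi2",
                      "GLU": "chi3", "ARG": "chi5"}


def match_symmetric_torsion(mobile_deg: float, target_deg: float) -> float:
    """Return whichever of mobile_deg and mobile_deg + 180 (degrees) is
    closer to target_deg, i.e. the labelling that minimises the change."""
    same = wrap180(mobile_deg)
    flipped = wrap180(mobile_deg + 180.0)
    if abs(wrap180(flipped - target_deg)) < abs(wrap180(same - target_deg)):
        return flipped
    return same


def fix_residue(resname: str, chis: dict, target_chis: dict) -> dict:
    """Adjust the symmetric torsion of one residue (torsion name -> degrees)."""
    out = dict(chis)
    chi = SYMMETRIC_TORSIONS.get(resname.upper())
    if chi in out and chi in target_chis:
        out[chi] = match_symmetric_torsion(out[chi], target_chis[chi])
    return out
```

For a Phe with chi2 = 95° in one model and -80° in the other, the sketch relabels the first to -85°, so the morph changes chi2 by 5° instead of 175°.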
By the way: did you notice how fast the morphing is ? In a few seconds
all the intermediate models have been generated.
3 - Admiring the results
If you are an O user, you are in luck. LSQMAN has produced two O macros
for you. Start a new O session, then execute the first macro (which
will be called, e.g., morphy_read.omac). When it is done, you should have
all models drawn for you. Find a good viewpoint, and then do not
execute the second macro. Well, do it anyway and you will see what I mean.
When O executes a macro, it will not update the display until the macro
is finished - i.e., you see nothing much happening. Now try this: select
the commands in the O macro and paste them into your O command window.
"Oooooohhhhh ! Cool ! Great ! Wow ! Gerard, please can I send you
money ? I want to have your baby !"
What do the colours mean ? By default, the B-factor of each atom holds
the distance it travels (Cartesian) or the magnitude of the torsion
change around one of its bonds (the largest one). The objects are then
colour-ramped from low B-factor (blue; little or no motion) to high
(red; loads of motion).
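You can reproduce this kind of colouring yourself, for instance for RasMol or another viewer that colours by B-factor. The Python sketch below computes how far each atom travels between the start and end model, writes that distance into the B-factor field (columns 61-66 of a PDB ATOM/HETATM record), and maps it onto a blue-to-red colour ramp. The function names are hypothetical, and the two files are assumed to be superimposed and to list the same atoms in the same order. The hue ramp is just one possible choice, not necessarily the one O uses.

```python
import colorsys
import math
from typing import List, Sequence, Tuple


def xyz(pdb_line: str) -> Tuple[float, float, float]:
    """Coordinates from columns 31-54 of an ATOM/HETATM record."""
    return float(pdb_line[30:38]), float(pdb_line[38:46]), float(pdb_line[46:54])


def set_bfactor(pdb_line: str, value: float) -> str:
    """Write value into the B-factor field (columns 61-66); the line is
    returned without a trailing newline."""
    padded = pdb_line.rstrip("\n").ljust(80)
    return (padded[:60] + f"{value:6.2f}" + padded[66:]).rstrip()


def displacement_bfactors(start: Sequence[str], end: Sequence[str]) -> List[str]:
    """Copy of the start model with each atom's B-factor set to the distance
    (A) it travels to the end model. Assumes superimposed models that list
    the same atoms in the same order."""
    return [set_bfactor(a, math.dist(xyz(a), xyz(b))) for a, b in zip(start, end)]


def ramp(value: float, vmin: float, vmax: float) -> Tuple[float, float, float]:
    """RGB on a hue ramp: blue at vmin (little motion), red at vmax (lots)."""
    frac = 0.0 if vmax <= vmin else min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    return colorsys.hsv_to_rgb((1.0 - frac) * 2.0 / 3.0, 1.0, 1.0)
```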
If you want to make an animated GIF file, here is how one can do it
(using O on an SGI):
- start a new O session
- execute the XXX_read.omac macro
- do any embellishments you wish to make to all objects (e.g.,
sketch_stick your CA-trace; change the colours of your models)
- start up snapshot (on SGIs) and take pictures of each of the
models in turn
- convert the RGB files from snapshot into GIF files (e.g., with xv)
- build an animated GIF, e.g. with the Macintosh program GIFBuilder
- publish the animated GIF on your web site (don't forget to cite LSQMAN;
a link to this page would also be nice)
- send an e-mail to Gerard with the URL of your image(s) and a brief description,
for inclusion in the Gallery (see above)
NOTE: the image capture and conversion steps can also be carried out
with the GNU program GIMP.
NOTE: a simpler and faster method is to use RasMol, from which you
can save your images in GIF format directly (the lysozyme to CRABP2
morph in the Gallery was generated in this fashion). Colour your
molecule by B-factor if you want to show where the major changes occur.
NOTE: MSI's WebLab ViewerLite (free) also works nicely (see the gallery).
One problem with both RasMol and WebLab is that you can't define the
view to be the same for all images (I'm too lazy to read the manual ;-).
So you may have to experiment with rotating your first molecule (e.g.,
with MOLEMAN2) until you get a reasonable
view.
HINT: since LSQMAN produces coordinates for all intermediate steps,
you can use any and all programs that can visualise different aspects
of molecular structure for your morph ! For example, you could use
MolScript and render the images, or Grasp, or display a cavity as it
gets filled by an approaching ligand, etc.
Important note: the results of morphing may look so smooth, appealing,
and natural that one almost cannot help but "believe" in them. This is dangerous !
Morphing does not claim to generate a physically realistic path from one
conformation to another ! However, it can be useful as a visualisation and
analysis aid (not to mention teaching and communication of results). Seeing a
protein change conformation in "real time" is much more informative than looking
at the superimposed images of the before and after states (which just give
you a terrible headache ;-). Also, the colour-coding
focusses your attention on the "hot spots" of the transition, and may help you
better understand the changes that take place.
NOTE: after having done all this, I found the
"Database of Molecular
Movements" on the web (thanks, AltaVista). They use a different
way to morph, namely Cartesian space morphing followed by
energy minimisation of each intermediate in X-PLOR (see the reprint
of the Nucleic Acids Research paper by Gerstein and Krebs).
The classic adenylate kinase movie (Vonrhein et al., 1995) should
live here.
Latest update: 17 December 1998.