Uppsala Software Factory Tutorial - Biomolecular Morphing

This page describes how to visualise structural differences between molecules (e.g., related by NCS, or in different crystal forms, mutants, complexes, etc.), using LSQMAN (version 7.0 or newer) and O (or any other molecular graphics program).

References:

Biomolecular Morphing: G.J. Kleywegt (1998). Unpublished results.
LSQMAN: Kleywegt, G.J. (1996). Use of non-crystallographic symmetry in protein structure refinement. Acta Cryst D52, 842-857.
O: Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst A47, 110-119.

If you use molecular morphs generated with LSQMAN in your work (web-site, publications, etc.), please CITE the LSQMAN reference ! Also, if your morphs are publicly available on the web, please E-mail me the URL and a brief description so I can add them to the gallery.

Biomolecular Morphing Gallery

Here are some example morphs:

RBP (434 kB). This animated GIF shows the transition from the open to the closed (ligand-bound) form of ribose-binding protein. It was created using PDB entries 1URP and 2DRI. Only the CA-trace is shown, and the CA-CA bonds have been colour-coded according to the magnitude of the change in "torsion" around them. The individual images were created in O.
© G J Kleywegt, 1998
CBH I (421 kB). This animated GIF shows the transition of the tyrosine loop in cellobiohydrolase I. It includes the CA-trace and all side chains. It was created using PDB entries 1CEL and 7CEL.
© G J Kleywegt, 1998
L2C (148 kB). This animated GIF shows the "transition" from lysozyme (8LYZ) to CRABP type II (1CBS). Individual frames created with RasMol. Nonsense, but funny ! (Smaller version (57 kB).)
© G J Kleywegt, 1998
HIC-Up (64 kB). Animated logo for the HIC-Up site. It was generated as a Cartesian morph from hetero compound RE9 (in PDB entry 1CBQ) to REA (1CBS). The two hetero compounds have the same number of heavy atoms, with the same names. Created with O.
© G J Kleywegt, 1998
SBNet (34 kB). Animated logo for the SBNet site. It was generated as a Cartesian morph from an ideal seven-residue alpha-helix to a seven-residue beta-strand. Created with O.
© G J Kleywegt, 1998
RBP (101 kB). A different view of the closure of RBP, this time showing the protein as a cartoon. The images were created with MSI's WebLab ViewerLite on a Mac, copied to the clipboard, converted to GIF format with Clip2Gif, and animated with GIFBuilder.
© G J Kleywegt, 1998
HIC-Up (25 kB). A different version of the HIC-Up morph, namely of the soft molecular surface. These images were created with MSI's WebLab ViewerLite etc. like the previous one.
© G J Kleywegt, 1998

1 - Designing your morph

Before you start, you want to spend a few minutes thinking about what you want to visualise.

First of all, what is "morphing" ? It is the name of a graphics technique in which one image is gradually changed into another. This is done by generating intermediate images that are "interpolations" between the start and end image. If the whole series of images is shown in rapid succession, it looks like a fluid movie. You may have seen morphs on the web, e.g. a picture of one American president that slowly changes into a picture of another president.

"Classical" morphing is typically done on images that consist of pixels. However, if you want to visualise a large-scale conformational change in a protein molecule, morphing from a picture of one conformation to that of the other will rarely produce the desired result. For "biomolecular morphing", the major requirement is that the transition is smooth and feels natural. In other words, we want to see/visualise changes in torsions around chemical bonds or around pseudo-bonds (such as CA-CA bonds in a CA-trace of a protein molecule). To do this requires a little bit more work than the straightforward use of some general pixel-based morphing program.

The program LSQMAN contains an option to do biomolecular morphing, i.e. to do "structural interpolation" between two conformational states of a molecule, and in such a way that the result looks chemically reasonable (which is not the same as "physically realistic" !!!).

What do you need for biomolecular morphing ?

two copies of a molecule in different conformational states (e.g., with and without ligand, open or closed conformation, active site loop in different conformations, etc.), in PDB format (the two molecules may be in the same file)
the program LSQMAN (available free of charge for academics and not-for-profit institutions)
a program to visualise the various models (e.g. O, RASMOL, MOLSCRIPT, ...)
a way to capture images of all individual models (either from within the visualisation program, or using a separate image-capture program such as SGI's snapshot, or GNU's GIMP)
a program to convert the images into GIF files, and put them together into an animated GIF file or into a movie (MPEG, QUICKTIME); e.g. one could use xv and the Macintosh program GIFBuilder, or just GIMP

How do you want to morph ? In LSQMAN, there are essentially three different ways to do the morphing:

a set of "central atoms" in internal coordinate space (e.g., for proteins this would be the CA-trace; for nucleic acids perhaps the P-trace)
(proteins only) the CA-trace plus all side chains in internal coordinate space
any set of atoms in Cartesian space

Of these methods, Cartesian morphing is the simplest (and most similar to traditional morphing): for every atom to be morphed, the start and end coordinates are retrieved, and the intermediate models are generated simply by interpolating the coordinates (i.e., every atom moves in a straight line from its starting position to its final position).
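To make the straight-line interpolation concrete, here is a minimal Python sketch. It is not LSQMAN's actual code: the function name cartesian_morph and the toy coordinates are made up for illustration, and it assumes that the two atom lists are already matched one-to-one and superimposed.

```python
def cartesian_morph(start, end, n_models):
    """Return n_models coordinate sets, including the start and the end.

    start and end are equally long lists of (x, y, z) tuples; atom i in
    start is assumed to be the same atom as atom i in end.
    """
    if len(start) != len(end):
        raise ValueError("start and end must contain the same atoms")
    if n_models < 2:
        raise ValueError("need at least two models (start and end)")
    models = []
    for k in range(n_models):
        t = k / (n_models - 1)  # 0.0 for the first model, 1.0 for the last
        models.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ])
    return models


# Toy example: the second atom swings from the x-axis to the y-axis.
start = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
end = [(0.0, 0.0, 0.0), (0.0, 3.8, 0.0)]
models = cartesian_morph(start, end, 3)
```

In the middle model of this toy example the two atoms are only about 2.7 A apart rather than 3.8 A: straight-line interpolation does not preserve distances between atoms.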

Advantages of morphing in Cartesian coordinate space:

works for any type of atom
can be used to morph translational motions (e.g., "diffusion" of a substrate into an active site)
works even if the molecule consists of multiple fragments (e.g., a protein plus a ligand)

Disadvantages:

the two molecules must have been superimposed previously
the geometry of the molecule will be distorted much of the time (e.g., helices may not look like nice helices during the morphing; some bonds may become broken; artefactual bonds may be formed transiently)
the motion will not look "natural"

Morphing in internal coordinate space is strongly preferred (but not always possible; see below). Here, a CA-trace is represented not by its Cartesian coordinates, but by:

the distance to the previous CA atom
the angle to the CA atom before that
the torsion with respect to the CA atom before that

Then, to morph, LSQMAN compares these internal coordinates in the start and end structure, and interpolates to effect the changes smoothly. For each intermediate model, the internal coordinates are converted back to Cartesian coordinates (this means that the first atom will always end up at the origin (0,0,0) !).
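The round trip from Cartesian to internal coordinates and back can be sketched in Python as follows. This is only an illustration under stated assumptions, not LSQMAN's implementation: all function names are hypothetical, the torsions are interpolated along the shorter way around the circle (LSQMAN may well handle this differently), and the helix-like and strand-like angles are rough illustrative values.

```python
import math


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def unit(p):
    n = math.sqrt(dot(p, p))
    return (p[0] / n, p[1] / n, p[2] / n)


def to_internal(coords):
    """One (distance, angle, torsion) triple per atom after the first.
    Angles are in radians; values that are undefined near the start are 0."""
    internal = []
    for i in range(1, len(coords)):
        dist = math.dist(coords[i], coords[i - 1])
        ang = tor = 0.0
        if i >= 2:
            u = unit(sub(coords[i - 2], coords[i - 1]))
            v = unit(sub(coords[i], coords[i - 1]))
            ang = math.acos(max(-1.0, min(1.0, dot(u, v))))
        if i >= 3:
            b1 = sub(coords[i - 2], coords[i - 3])
            b2 = sub(coords[i - 1], coords[i - 2])
            b3 = sub(coords[i], coords[i - 1])
            n1, n2 = cross(b1, b2), cross(b2, b3)
            tor = math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2))
        internal.append((dist, ang, tor))
    return internal


def from_internal(internal):
    """Rebuild Cartesian coordinates; the first atom lands on the origin."""
    coords = [(0.0, 0.0, 0.0)]
    for i, (dist, ang, tor) in enumerate(internal, start=1):
        if i == 1:
            coords.append((dist, 0.0, 0.0))
        elif i == 2:
            x1 = coords[1][0]
            coords.append((x1 - dist * math.cos(ang), dist * math.sin(ang), 0.0))
        else:
            a, b, c = coords[-3], coords[-2], coords[-1]
            bc = unit(sub(c, b))
            n = unit(cross(sub(b, a), bc))  # fails if a, b, c are collinear
            m = cross(n, bc)
            x = -dist * math.cos(ang)
            y = dist * math.sin(ang) * math.cos(tor)
            z = dist * math.sin(ang) * math.sin(tor)
            coords.append(tuple(c[k] + x * bc[k] + y * m[k] + z * n[k]
                                for k in range(3)))
    return coords


def internal_morph(start, end, n_models):
    """Interpolate two equally long traces in internal coordinate space."""
    a, b = to_internal(start), to_internal(end)
    models = []
    for step in range(n_models):
        t = step / (n_models - 1)
        mixed = []
        for (d0, a0, t0), (d1, a1, t1) in zip(a, b):
            # Assumption: torsions take the shorter way around the circle.
            delta = (t1 - t0 + math.pi) % (2.0 * math.pi) - math.pi
            mixed.append((d0 + t * (d1 - d0), a0 + t * (a1 - a0), t0 + t * delta))
        models.append(from_internal(mixed))
    return models


# Toy seven-atom CA-traces with rough, illustrative geometry.
helix = from_internal([(3.8, math.radians(91), math.radians(50))] * 6)
strand = from_internal([(3.8, math.radians(125), math.radians(-170))] * 6)
models = internal_morph(helix, strand, 12)
```

Because only distances, angles and torsions are interpolated, every intermediate trace in this sketch keeps its 3.8 A CA-CA distances, and every rebuilt model starts at the origin.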

Morphing a protein's CA-trace plus all side chains proceeds similarly to central-atom morphing, but care is taken that the internal coordinates of the side-chain atoms involve "natural" torsions (CA-CB; CB-CG, etc.).

Advantages of morphing in internal coordinate space:

gives an impression of very natural motion !
no need to superimpose the molecules (since distances, angles and torsions are independent of orientation)

Disadvantages:

the major limitation is that if a very large change occurs in one (or more) of the torsions along the CA-trace (except at the termini), all protein residues C-terminal of each large change will "ride along" during this change (giving the impression that the protein first unfolds, and then refolds into the final conformation ...). If such changes occur (you will notice it immediately when you visualise all models), you have to resort to Cartesian space morphing (not always, though; there is a fix in LSQMAN which sometimes works; see the manual). As an example, try to morph from molecule "A" to "B" in PDB entry 1SHF, or from the lipase with open to closed lid (PDB entries 1CRL and 1TRH) in internal CA space !
ring-closures are not necessarily maintained (e.g., a six-membered ring might "open up" during part of the morphing)
the morphed models must all be superimposed (can be done quickly with LSQMAN)
cannot be used to morph translations
does not always work well if the molecule consists of multiple fragments (e.g., a protein plus a ligand)

All in all, the perception of natural motion makes internal coordinate morphing superior for most purposes, except when one or more very large changes occur.

If you want, LSQMAN can try to avoid large changes in the central atom torsion angles, but this sometimes in turn leads to broken bonds and distorted molecules. It's a trade-off.

What do you want to morph ? Now that you know about the ins and outs of morphing in internal and Cartesian coordinate space, you can decide which parts of your molecule you want to morph:

to visualise changes in the "central atom" conformation (e.g., CA atoms for proteins, P atoms for DNA/RNA), first try internal morphing of these atoms (e.g., to show domain motion, or loop movements)
for proteins, if side chains are important (e.g., ligand binding), you will first want to try internal coordinate morphing of the CA-trace plus the side chains (NOTE: you may only want to include side chains that are close to the ligand - this can easily be done with MOLEMAN2)
if nothing else works (or if you want to visualise translational motion of a rigid body, e.g. a ligand moving into a cavity), use Cartesian space morphing
if you really want to impress your boss, you can combine different morphs (with some file editing), e.g. domain closure upon ligand binding as a CA-trace morph, and the ligand moving into its binding site as a simple Cartesian space morph !
morphs can also be combined in series, e.g. if you have crystal structures of three conformational states of a protein, you can first morph from the first to the second, and then from the second to the third.
if you have nothing better to do, you can try some silly morphs, e.g. from a protein A to another protein B (must have roughly the same number of residues; see the lysozyme to CRABP2 morph in the Gallery), from an all-L to an all-D protein (use MOLEMAN2 to mirror your favourite protein to obtain the all-D conformation), a Cartesian space morph of a protein to a two-fold related copy of itself (the protein will shrink, collapse onto the origin, and then grow again), from an alpha-helix to a beta-strand, etc. etc.

2 - Generating the intermediate models

First, prepare one or two PDB files with the molecules you want to use as the start and end point of the morph. For example, if you want to morph ribose-binding protein from the open to the closed (ribose-bound) conformation, you could use MOLEMAN2 on PDB entry 2DRI to prepare a PDB file which only contains the main-chain atoms plus all intact residues that have at least one atom within 8 Å of any ribose atom (NOTE: you only need to do this for one of the two molecules, since only atoms that both have in common will be used in the morphing process !):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 MOLEMAN2 > re /nfs/pdb/full/2dri.pdb
 MOLEMAN2 > sel none
 MOLEMAN2 > sel or residue rip
 MOLEMAN2 > select distance 0.0 8.0
 MOLEMAN2 > sel by_residue
 MOLEMAN2 > sel or class main
 MOLEMAN2 > sel and type prot
 MOLEMAN2 > wr 2dri.pdb pdb selected
 MOLEMAN2 > quit
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
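
For readers without MOLEMAN2 at hand, the same kind of selection can be approximated with a few lines of Python. This is a hedged sketch, not a re-implementation of MOLEMAN2: the function names are hypothetical, it relies on the fixed-column layout of ATOM/HETATM records, and it simply treats every HETATM record as "not protein" (which is wrong for modified residues).

```python
import math

MAIN_CHAIN = {"N", "CA", "C", "O"}


def parse_atoms(pdb_lines):
    """Extract the fields we need from fixed-column ATOM/HETATM records."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "line": line,
                "hetero": line.startswith("HETATM"),
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                # chain identifier, residue number plus insertion code
                "residue": (line[21], line[22:27]),
                "xyz": (float(line[30:38]), float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms


def select_near_ligand(pdb_lines, ligand="RIP", cutoff=8.0):
    """Keep main-chain atoms plus whole residues near the ligand.

    A residue counts as "near" if at least one of its atoms lies within
    cutoff (in Angstrom) of any atom of the residue named ligand.
    Only ATOM records are returned.
    """
    atoms = parse_atoms(pdb_lines)
    ligand_xyz = [a["xyz"] for a in atoms if a["resname"] == ligand]
    near = {a["residue"] for a in atoms
            if any(math.dist(a["xyz"], xyz) <= cutoff for xyz in ligand_xyz)}
    return [a["line"] for a in atoms
            if not a["hetero"]
            and (a["name"] in MAIN_CHAIN or a["residue"] in near)]
```

One would call select_near_ligand with the lines of the PDB file (for instance an open file object) and write the returned lines to a new file.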

Now you are ready to generate the models and to superimpose them with LSQMAN (see the LSQMAN manual for details on the commands and parameters for this program).

Cartesian space morphing. You must read the two molecules, superimpose them, make sure that ambiguous side-chain atoms are as close as possible (e.g., CD1/CE1 versus CD2/CE2 in Phe and Tyr), select the atom types you want to include, do the morphing, and execute the macro that LSQMAN produces to superimpose all morphed models:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
read m1 /nfs/pdb/full/1urp.pdb
read m2 ./2dri.pdb
at ca
! superimpose them on residues a100 to a230
ex m1 a100-230 m2 a1
! improve operator if necessary
set dist 1.0
im m1 a* m2 a*
apply m1 m2
! fix ambiguous atom names
nomen m1
fix m1 a1-999 m2 a1 strict seq rmsd
! select atoms to use (here: CA only)
at ca
set reset
! generate 12 morphs (includes start and end)
morph m1 a1-999 m2 a1 12 cart cart y a100-230 999
! superimpose them all on the first model
at ca
@cart.lsqmac
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Internal coordinate space morphing of a "central atom" trace. You must read the two molecules, select the atom type you want to morph, do the morphing, and execute the macro that LSQMAN produces to superimpose all morphed models:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
read m1 /nfs/pdb/full/1urp.pdb
read m2 ./2dri.pdb
morph m1 a1-999 m2 a1 12 morphy internal x a100-230 999
at ca
@morphy.lsqmac
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Internal coordinate space morphing of a CA-trace plus side chains (proteins only). You must read the two molecules, make sure that ambiguous side-chain torsions have similar values, set the atom type to TRACE, do the morphing, and execute the macro that LSQMAN produces to superimpose all morphed models:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
read m1 /nfs/pdb/full/1urp.pdb
read m2 ./2dri.pdb
! fix side-chain torsions for Phe, Tyr, Asp, Glu, and Arg
nomen m1
fix m1 a1-999 m2 a1 strict seq torsion
! select atom type TRACE
at trace
morph m1 a1-999 m2 a1 12 trace int z a100-230 999
at ca
@trace.lsqmac
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

By the way: did you notice how fast the morphing is ? In a few seconds all the intermediate models have been generated.

3 - Admiring the results

If you are an O user, you are in luck. LSQMAN has produced two O macros for you. Start a new O session, then execute the first macro (which will be called, e.g., morphy_read.omac). When it is done, you should have all models drawn for you. Find a good viewpoint, and then do not execute the second macro. Well, do it anyway and you will see what I mean. When O executes a macro, it will not update the display until the macro is finished - i.e., you see nothing much happening. Now try this: select the commands in the O macro and paste them into your O command window. "Oooooohhhhh ! Cool ! Great ! Wow ! Gerard, please can I send you money ? I want to have your baby !"

What do the colours mean ? By default, the B-factor of each atom holds the distance it travels (Cartesian) or the magnitude of the torsion change around one of its bonds (the largest one). The objects are then colour-ramped from low B-factor (blue; little or no motion) to high (red; loads of motion).
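
If you display the models in a program that does not ramp colours for you, the idea is easy to mimic. The Python sketch below maps a B-factor onto a blue-to-red ramp by sweeping the hue. It is an assumption-laden illustration: the function name ramp_colour is made up, and the exact ramp that O uses may well differ from this one.

```python
import colorsys


def ramp_colour(b, b_min, b_max):
    """Map a B-factor to an (r, g, b) triple with components in 0..1.

    b_min maps to blue (little or no motion), b_max to red (loads of
    motion); values outside the range are clamped.
    """
    if b_max <= b_min:
        return (0.0, 0.0, 1.0)  # degenerate range: everything blue
    frac = min(1.0, max(0.0, (b - b_min) / (b_max - b_min)))
    hue = (1.0 - frac) * 240.0 / 360.0  # 240 degrees = blue, 0 = red
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


low = ramp_colour(0.0, 0.0, 10.0)    # close to pure blue
high = ramp_colour(10.0, 0.0, 10.0)  # close to pure red
```

With a hue sweep, intermediate values pass through cyan, green and yellow, which makes the "hot spots" stand out more than a direct blue-to-red blend would.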

If you want to make an animated GIF file, here is how one can do it (using O on an SGI):

start a new O session
execute the XXX_read.omac macro
do any embellishments you wish to make to all objects (e.g., sketch_stick your CA-trace; change the colours of your models)
start up snapshot (on SGIs) and take pictures of each of the models in turn
convert the RGB files from snapshot into GIF files (e.g., with xv)
build an animated GIF, e.g. with the Macintosh program GIFBuilder
publish the animated GIF on your web site (don't forget to cite LSQMAN; a link to this page would also be nice)
send an E-mail to Gerard with the URL of your image(s), and a brief description, for inclusion in the Gallery (see above)

NOTE: the image capture and conversion steps can also be carried out with the GNU program GIMP.

NOTE: a simpler and faster method is to use RasMol, from which you can save your images in GIF format directly (the lysozyme to CRABP2 morph in the Gallery was generated in this fashion). Colour your molecule by B-factor if you want to show where the major changes occur.

NOTE: MSI's WebLab ViewerLite (free) also works nicely (see the gallery). One problem with both RasMol and WebLab is that you can't define the view to be the same for all images (I'm too lazy to read the manual ;-). So you may have to experiment with rotating your first molecule (e.g., with MOLEMAN2) until you get a reasonable view.

HINT: since LSQMAN produces coordinates for all intermediate steps, you can use any and all programs that can visualise different aspects of molecular structure for your morph ! For example, you could use MolScript and render the images, or Grasp, or display a cavity as it gets filled by an approaching ligand, etc.

Important note: the results of morphing may look so smooth, appealing, and natural that one almost cannot help but "believe" in them. This is dangerous ! Morphing does not claim to generate a physically realistic path from one conformation to another ! However, it can be useful as a visualisation and analysis aid (not to mention teaching and communication of results). Seeing a protein change conformation in "real time" is much more informative than looking at the superimposed images of the before and after states (which just give you a terrible headache ;-). Also, the colour-coding focusses your attention on the "hot spots" of the transition, and may help you better understand the changes that take place.

NOTE: after having done all this, I found the "Database of Molecular Movements" on the web (thanks, AltaVista). They use a different way to morph, namely a Cartesian space morphing followed by energy minimisation in X-PLOR of each intermediate (reprint of the Nucleic Acids Research paper by Gerstein and Krebs).

The classic adenylate kinase movie (Vonrhein et al., 1995) should live here.

Latest update at 17 December, 1998.