The application of hierarchical cluster analysis to the selection of isomorphous crystals
The amount of diffraction data that can be obtained from a single crystal of a biological macromolecule is limited by radiation damage, which causes a resolution-dependent reduction in diffraction intensity. This limitation cannot be overcome: the measurement conditions can be optimised [1], but there is an absolute limit to the resolution and data quality obtainable from a single crystal. Furthermore, less data can be obtained from a small crystal before significant radiation damage occurs, because the diffracting volume is smaller. When several crystals of the same type are available, the result of a crystallographic study can be substantially improved by using a multicrystal data collection strategy. However, this gain is only realised if the partial data sets come from crystals that are structurally identical. Unfortunately, frozen single crystals of the same biological macromolecule are often observed to have relatively low structural identity (that is, they are not isomorphous). To obtain a real gain from multicrystal data sets, we therefore need to be able to select isomorphous crystals appropriately.
In this study, we evaluated the possibility of using hierarchical cluster analysis as a tool for identifying isomorphous data sets. About one hundred multicrystal data sets were collected with the goal of solving the structures of four test proteins using the weak anomalous signal from sulphur atoms. The results of the hierarchical cluster analysis (based on the intensity correlation coefficients for complete data sets) are shown in Figure 26. The data sets constituting the principal clusters were merged together and analysed. The results show a clear improvement in the quality of multicrystal anomalous difference data sets created on the basis of cluster analysis, compared with blindly merging all data sets. As an illustration of the impact, Figure 27 shows electron-density maps for thaumatin.
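The clustering step described above can be sketched in a few lines. This is a minimal illustration and not the procedure used in the study: the distance 1 - CC, the average-linkage rule, the cut-off value, the function names and the toy intensities are all assumptions made for the example. Each data set is represented as a list of intensities measured for the same reflections in the same order.

```python
from math import sqrt


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: raises ZeroDivisionError if a data set has constant intensities.
    return sxy / sqrt(sxx * syy)


def cluster_datasets(datasets, max_distance):
    """Agglomerative clustering of data sets (hypothetical helper).

    Assumptions: distance = 1 - CC, average linkage, and merging stops once
    the closest pair of clusters is further apart than max_distance.
    Returns a sorted list of clusters, each a sorted list of data set indices.
    """
    n = len(datasets)
    dist = {
        (i, j): 1.0 - pearson_cc(datasets[i], datasets[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

    def linkage(a, b):
        # Average pairwise distance between members of clusters a and b.
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > max_distance:
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Toy intensities for five data sets on five common reflections (made up).
example_sets = [
    [10, 20, 30, 40, 50],
    [11, 19, 31, 39, 52],
    [9, 21, 29, 41, 49],
    [50, 10, 40, 20, 30],
    [52, 9, 41, 19, 31],
]

clusters = cluster_datasets(example_sets, max_distance=0.1)
print(clusters)
```

With these toy values the first three data sets correlate strongly with one another, as do the last two, so the sketch separates them into two groups. Only the data sets within one group would then be merged.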
Our study demonstrates the importance of hierarchical cluster analysis in understanding non-isomorphism of single crystals of biological macromolecules. In particular, hierarchical cluster analysis can help in the selection of sets of isomorphous crystals. The results illustrate that for SAD data collection, where the anomalous signal is small (~1%), it is of great importance to use a protocol that helps to merge data sets from different crystals appropriately. In difficult cases, this method could permit structure solution when individual data sets are insufficient.
This research shows that combining multicrystal data collection techniques with advanced statistical data analysis methods has clear potential to extend the applicability of sulphur-SAD phasing to more complex structures. The methodology could be further extended to aid other applications such as heavy-atom derivative phasing or, possibly, resolution enhancement for poorly ordered systems. The development of automated and reproducible sample handling techniques that provide better control over the sample state throughout the experiment is likely to become another important component of multicrystal data collection methods.
Principal publication and authors
R. Giordano (a,b), R.M.F. Leal (a,c), G.P. Bourenkov (d), S. McSweeney (a) and A.N. Popov (a), Acta Cryst. D68, 649-658 (2012).
(a) ESRF
(b) Present address: SLS, PSI, Villigen (Switzerland)
(c) Present address: ILL, Grenoble (France)
(d) EMBL Hamburg Outstation, c/o DESY, Hamburg (Germany)
References
[1] G.P. Bourenkov and A.N. Popov, Acta Cryst. D66, 409-419 (2010).