June 2022 ESRFnews
ARTIFICIAL INTELLIGENCE
SHUTTERSTOCK/GORODENKOFF
of an artificial-intelligence system known as AlphaFold. "It allowed us to solve the structure in a few minutes," he says.

Artificial intelligence (AI), and the deep learning of AlphaFold in particular, is poised to transform structural biology. In the last couple of years, it has proved hugely successful at predicting protein structures from their amino-acid sequences alone. Problems that were once insurmountable or, at least, surmountable only with tremendous human effort now suddenly appear tractable. "We're amazed by what AlphaFold is capable of doing," says Daniele de Sanctis, an ESRF scientist who also took part in the project. Machine learning is driving the next leap in structural biology, adding another powerful string to its bow. The ESRF has been quick to embrace the AI revolution.
Jovine's ESRF study has been one of the first academic collaborations with the team behind AlphaFold. Meanwhile, staff scientists at ESRF beamlines, as well as at the ESRF cryogenic electron microscopy (cryo-EM) facility, are adopting AI to turbo-boost their routine workflows. Yet AI's coming of age raises as many questions as it answers. Have the fundamental problems at the heart of structural biology finally gone away? What will be the role for experimentation in a future that is increasingly dominated by accurate computer predictions? Will structural biologists still need synchrotrons?

Scientists need protein structures because structure determines a protein's function. Protein structures therefore tell scientists how life works; by extension, they also reveal which molecules or drugs are likely to bind to particular proteins in such a way as to combat disease and make people healthier. Historically, most structures have been determined using macromolecular crystallography (MX), primarily at synchrotron light sources. The technique involves recording the X-rays that diffract on passage through a crystallised protein, due to interactions with the electrons surrounding the protein's individual atoms. Because atoms of different elements have different numbers of electrons, they diffract X-rays by varying amounts. To determine the locations of those atoms in a protein, then, a researcher's first step is to analyse the intensity and distribution of spots in an X-ray diffraction pattern.
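As a rough illustration of how spot intensities depend on the atoms in a crystal, the toy sketch below sums the waves scattered by a few atoms in a one-dimensional unit cell. It is a simplification under stated assumptions, not a description of any real MX software: each atom's scattering strength is taken to be simply its electron count, the atom positions are made up, and the function names `structure_factor` and `intensity` are hypothetical.

```python
import cmath

def structure_factor(h, atoms):
    """Sum the waves scattered by every atom for reflection index h.

    atoms: list of (electron_count, fractional_position) pairs.
    Assumption: scattering strength equals the electron count.
    """
    return sum(z * cmath.exp(2j * cmath.pi * h * x) for z, x in atoms)

def intensity(h, atoms):
    """Spot intensity: the squared magnitude of the summed wave."""
    return abs(structure_factor(h, atoms)) ** 2

# Hypothetical 1D "crystals" that differ in a single atom:
# carbon (6 electrons) at position 0.60 is swapped for sulfur (16 electrons).
light = [(6, 0.10), (7, 0.35), (6, 0.60)]
heavy = [(6, 0.10), (7, 0.35), (16, 0.60)]

for h in range(1, 4):
    print(h, round(intensity(h, light), 1), round(intensity(h, heavy), 1))
```

Swapping one light atom for a heavier one changes the spot intensities, which is why the pattern carries information about which atoms sit where. Note that `intensity` keeps only the magnitude of the complex sum returned by `structure_factor`.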
The problem with MX has always been that it misses an important property of the diffracted X-rays: their phase, which cannot be recorded by an X-ray detector. To resolve this "phase problem", scientists have two main options. One, known as anomalous diffraction, involves substituting heavy atoms for the existing atoms at key locations in the protein's structure, so that the diffraction contributions from those locations become more apparent and can be used to calculate preliminary phases. The other way of taking the guesswork out of phase, known as molecular replacement, involves comparing the experimental data with that from a similar model or template protein structure in the Protein Data Bank (assuming, that is, that such a protein exists). "This is how 90% of structures are solved nowadays," says Jovine.

In an ideal world, none of this would be necessary.
Instead, it would be possible to calculate, from basic principles, how the various amino-acid chains translated from an mRNA sequence spontaneously fold into a certain protein structure. But while the amino-acid sequences are almost always known, the complex physics of their folding is not. The protein folding problem is considered one of the most fundamental in biology.

For more than three decades, scientists have enlisted computers to help predict protein structures. Although accuracy improved slowly to begin with, it never made much headway, according to the Critical Assessment of
By combining AI and experimental data, the researchers were able to get the best of both worlds