June 2022 ESRFnews 17
ARTIFICIAL INTELLIGENCE
Jon Cartwright
ARTIFICIAL INTELLIGENCE AT THE ESRF
It is not just structural biology at the ESRF that is reaping the benefits of AI. In the past two years, for example:
• ESRF scientist Alexandra Pacureanu and colleagues at Harvard Medical School in Boston, US, and elsewhere trained a neural network to assist in reconstructing the nervous system of the fruit fly from X-ray holographic nanotomography data (Nat. Neurosci. 23 1637).
• A team of researchers from the University of Malta and the ESRF used machine learning to automatically segment volumetric images of Egyptian mummies recorded with propagation phase-contrast synchrotron microtomography (PLoS ONE 16 e0260707).
• A team from the Southern Federal University in Rostov-on-Don, Russia, KU Leuven in Belgium and the ESRF found that machine learning could predict ligands and the distances between them from incomplete X-ray absorption spectroscopy data of a ruthenium-based catalyst (J. Phys. Chem. C 125 27844).
• A team led by ESRF scientists used a neural network to predict dislocations in a crystal from its 3D coherent diffraction pattern (npj Comput. Mater. 7 115).
the ramifications for their field, Jovine saw immediate potential. For some time he and his colleagues had been studying two human proteins known to counteract the bacteria behind gastrointestinal infections and urinary tract infections, the latter being the most common bacterial infection in women. The proteins are glycoprotein 2 (GP2), which is produced in the pancreas and the intestine, and uromodulin (UMOD), the most abundant protein in our urine. The researchers knew that UMOD and GP2 carry bacteria-catching sugar chains that remain unmodified as the proteins are secreted from cells, but they did not know how. They managed to record cryo-EM data from native human UMOD at the SciLifeLab cryo-EM facility in Stockholm, as well as diffraction data from crystals of GP2 at the ESRF MX beamlines ID23-1, ID30B and ID30A-3. However, they could neither fully interpret the UMOD cryo-EM map nor phase the GP2 MX data, because no structure of a similar protein existed to serve as a model. "That's when we contacted DeepMind," says Jovine.
A helping hand
The task was not straightforward. AlphaFold was trained to predict how chains of amino acids fold, so it could not predict the structures of sugars. Nonetheless, Jovine and colleagues, including those at Nanyang Technological University in Singapore, SciLifeLab in Stockholm, Sweden, and Lille University in France, were able to use protein models generated by AlphaFold to interpret the UMOD cryo-EM density map and to solve the structure of GP2 by molecular replacement (using software known as Phaser). By combining AI and experimental data in that way, the researchers were able to "get the best of both worlds and obtain the missing sugar information", says Jovine. The resulting structures of UMOD and GP2, published in March this year, each revealed a protein crevice in which the bases of the critical sugar chains were anchored, preventing them from being modified during protein secretion (Nat. Struct. Mol. Biol. 29 190). According to Jovine, it may in future be possible to incorporate this knowledge into the development of drugs to treat bacterial infections.
Like many other structural biologists, Jovine believes it is incorrect to say that AlphaFold has solved the protein-folding problem. The AI system does not predict a protein's structure by modelling the physical interactions that govern how amino acids fold inside cells; instead, it learns how they are likely to fold from thousands of experimentally determined protein structures held in public repositories such as the Protein Data Bank (see "Unfolding AlphaFold", left). Nevertheless, Jovine admits that in this case, in practical terms, the end result was "highly similar".
De Sanctis points out that experimental data will always be needed to verify and complete AI predictions. "What AI does enable us to do is push our horizons," he says. "In the future, experimental data will be able to go further. With the help of AI, we'll be able to study more complex systems and increase throughput, because part of that initial bottleneck is removed." ESRF scientist Gordon Leonard puts it more simply. "Structural biology is not dead," he says. "In fact, AlphaFold means more structural biology!"
Certainly the ESRF is gearing up for more. Max Nanao, scientist in charge of the ESRF micro-focus MX beamline ID23-2, is working with colleagues to make AlphaFold a routine option for molecular replacement. Users who want the service, which is free of charge, need only supply an ID for the amino-acid sequence of their protein. If a hit is found for that ID among the nearly one million AlphaFold structure predictions in the databases of the European Bioinformatics Institute, run by the European Molecular Biology Laboratory near Cambridge, UK, the model is downloaded. It can then be used as it is, or broken up into domains that can be searched again in public databases of experimental data.
That is not all. AlphaFold predictions are likely to become a feature of the image-processing workflow at CM01, the ESRF's cryo-EM beamline, refining electron-density maps and filling in missing data within protein complexes. In September, Annalisa Pastore, ESRF director of life sciences research, is organising a workshop to explore the theme of AlphaFold integration more generally (see p26).
Meanwhile, forms of AI have recently made inroads into ESRF science in various areas outside structural biology (see "Artificial intelligence at the ESRF" above), including via ENGAGE, the EU-backed doctoral training programme in machine learning in which the ESRF is a partner. No-one yet knows for sure what the eventual impact of all this machine learning will be. But one thing is clear: if the intelligence is artificial, the progress is not.
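Nanao's molecular-replacement service, as described above, starts from a sequence ID, fetches the matching AlphaFold prediction and either uses it whole or breaks it into domains. The Python below is a minimal sketch of those last two steps under assumptions that are ours, not the ESRF's: models are addressed by UniProt accession, the download URL follows the public AlphaFold DB file-naming pattern (whose version suffix changes between database releases), per-residue confidence (pLDDT) is read from the B-factor column of the PDB file, and the domain ranges and the 70 cut-off are illustrative values. The helper names are hypothetical; this is not the beamline's actual pipeline.

```python
# Hypothetical sketch, not the ESRF pipeline.
# Assumed AlphaFold DB file pattern; check the current version suffix before use.
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{acc}-F1-model_v{version}.pdb"


def alphafold_model_url(uniprot_acc: str, version: int = 4) -> str:
    """Build the download URL for a predicted model (hypothetical helper)."""
    return AFDB_URL.format(acc=uniprot_acc.strip().upper(), version=version)


def split_into_domains(
    pdb_text: str,
    domains: dict[str, tuple[int, int]],
    min_plddt: float = 70.0,
) -> dict[str, list[str]]:
    """Return the ATOM records of each domain, dropping low-confidence residues.

    `domains` maps a domain name to an inclusive (first, last) residue range.
    AlphaFold stores per-residue pLDDT in the B-factor field (columns 61-66);
    residues below `min_plddt` are skipped, a common precaution because
    poorly predicted regions can mislead a molecular-replacement search.
    """
    pieces: dict[str, list[str]] = {name: [] for name in domains}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # ignore headers, TER, END and so on
        res_seq = int(line[22:26])  # residue number, columns 23-26
        plddt = float(line[60:66])  # pLDDT held in the B-factor field
        if plddt < min_plddt:
            continue
        for name, (first, last) in domains.items():
            if first <= res_seq <= last:
                pieces[name].append(line)
    return pieces
```

Each returned list could then be written to its own file and tried as a separate search model. Splitting by fixed residue ranges is the simplest possible choice; in practice, domain boundaries are often chosen from the prediction's own confidence estimates, which this sketch does not attempt.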
Machine learning is driving the next leap in structural biology