March 2024 ESRFnews
MACHINE LEARNING
spectroscopy, for instance. This can be an incredibly
versatile technique, providing insights into the makeup
of all kinds of samples – from historic paintings, to
polluted soil, to new catalysts, and much else besides.
But interpreting individual spectra can be as much an
expert task as classifying fingerprints – and one that not
all users will be up to. Machine-learning models can be trained to
extract information automatically, such as atomic bond
lengths, coordination numbers, charges and so on. “The
goal is to democratise the analysis of X-ray spectra to non-
expert users,” says ESRF software development engineer
Marius Retegan, who hosted a microsymposium on the
topic at this year’s User Meeting.
This type of automated tool for spectroscopy is still
in its infancy, as experimental data are not yet
consistently stored in the standardised formats needed
for training. Still, spectroscopy users may already have
resorted to machine learning without realising it.
PyMCA – often regarded as the Swiss Army knife for
scanning spectroscopy data analysis – has supported
users for more than 15 years, and relies on unsupervised
machine learning.
The impact of machine learning will be greatest for
the next generation of users. As part of the ENGAGE
programme, the ESRF has three PhD students who are
honing skills in computational physics. One of these is
Matteo Masto, who is now in his second year developing
deep-learning algorithms for coherent diffraction
imaging, helping to retrieve lost phase information
as well as to fill in those empty pixels that can be artefacts of
even the best X-ray detectors. “More and more people,
me included, now are trying to employ deep-learning
methods for the phase problem, and it seems to show
promising results,” he says. “Besides this, there is a lot
more coming in the future for many other applications,
such as de-noising, super-resolution and particle-defect
identification and classification.”
The benefits of machine learning may not always be
felt directly. Nicolas Leclercq, the ESRF head of
accelerator control, believes a variant known as reinforcement
learning – which learns on the fly by adjusting
parameters, and therefore does not need well-labelled
training data to begin with – could one day improve
the optimisation of the ESRF storage ring. In the
ESRF vacuum group, Emmanuel Burtin and Anthony
Meunier have been using machine learning to identify
sudden pressure rises, which are a proxy for various
events in the storage ring – valves mistakenly opening or
closing, air leaks, electron-beam perturbations, and so on.
Classifying each of these events used to take a few minutes
when it was done manually; now an algorithm can do it in
less than a second. It can even expose new classes of event,
and reclassify swathes of past events accordingly – all in
all helping to make accelerator control more efficient
(ESRF Highlights 2023, p174).
Finally, of course, there is the freely available generative
AI. Chatbots provide a quick, if not always reliable,
means to research scientific topics, for example, or to
help compose papers and other documents in foreign
languages. More broadly, Favre-Nicolin anticipates
a time when users have recourse to virtual beamline
assistants to plug gaps in their experience, encouraging
them to pursue more adventurous lines of enquiry.
“They might ask, ‘How can I do this experiment? Can you
advise me on parameters?’” he says. “It’s bound to happen
relatively soon.”
Figure 2 Painstaking manual segmentation of ESRF tomographic data reveals the
vasculature of a human kidney for the Human Organ Atlas project. (Colours
correspond to four different artery branches.) It also provides valuable training data
for deep-learning algorithms that will be able to do the same job much faster (bioRxiv
doi: 10.1101/2023.03.28.534566).
Figure 3 In 2022, the deep-learning tool AlphaFold, combined with
experimental data from the ESRF and SciLifeLab in Stockholm, Sweden,
enabled researchers to determine the structures of two human proteins,
GP2 and UMOD (pictured). The proteins are known to counteract the
bacteria behind gastrointestinal and urinary tract infections
(Nat. Struct. Mol. Biol. 29 190).
“More and more people are trying to employ deep-learning methods for the phase problem”
Jon Cartwright