March 2024 ESRFnews
unannotated volumes, selecting those it has the most to
learn from. The human researchers review its output,
applying corrections to be integrated into the next
training loop. That loop runs to completion, then another
follows, and so on. According to Cadiou, training a
model to segment one volume with more classical deep-
learning algorithms can take a week or two due to the
manual image-annotation process, whereas this active
learning procedure gives results of sufficient fidelity in
about a day. “The speed-up is all the more required when
we’re working with in situ or operando data, as there are
then numerous volumes to analyse conjointly,” he says.
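The loop Cadiou describes follows the general pattern of active learning: the model scores unannotated data by how uncertain it is, a human corrects the most uncertain cases, and the model is retrained. The sketch below is a minimal illustration of that pattern, not the ESRF pipeline. It assumes uncertainty sampling by mean per-voxel entropy, which is one common selection criterion; the article does not say which criterion is used. The functions `predict`, `review` and `retrain` are hypothetical stand-ins for the segmentation model, the human reviewer and the training step.

```python
import math
from typing import Callable, Dict, List


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a voxel's foreground probability; peaks at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_most_informative(predictions: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k volumes with the highest mean per-voxel entropy,
    i.e. those the model is least sure about (an assumed criterion)."""
    scores = {
        volume: sum(binary_entropy(p) for p in probs) / max(len(probs), 1)
        for volume, probs in predictions.items()
    }
    return sorted(scores, key=lambda v: scores[v], reverse=True)[:k]


def active_learning_loop(
    unlabelled: List[str],
    predict: Callable[[str], List[float]],          # hypothetical model call
    review: Callable[[str, List[float]], List[int]],  # hypothetical human step
    retrain: Callable[[Dict[str, List[int]]], None],  # hypothetical training step
    rounds: int,
    k: int,
) -> Dict[str, List[int]]:
    """Predict, pick uncertain volumes, have a human correct them,
    retrain on everything labelled so far, and repeat."""
    remaining = list(unlabelled)  # leave the caller's list untouched
    labelled: Dict[str, List[int]] = {}
    for _ in range(rounds):
        if not remaining:
            break
        predictions = {v: predict(v) for v in remaining}
        for volume in select_most_informative(predictions, k):
            # The reviewer turns the model's output into corrected labels.
            labelled[volume] = review(volume, predictions[volume])
            remaining.remove(volume)
        retrain(labelled)
    return labelled


if __name__ == "__main__":
    probs = {"vol_a": [0.5, 0.5], "vol_b": [0.99, 0.01], "vol_c": [0.7, 0.4]}
    print(select_most_informative(probs, 2))  # ['vol_a', 'vol_c']
```

The confidently predicted `vol_b` is skipped, so human effort goes only to volumes where a correction is likely to change the model. This targeting is the source of the speed-up over annotating every volume by hand.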
Many other ESRF users are turning to machine
learning to assist in segmentation, especially when the
raw images contain the unprecedented levels of detail
provided by the new Extremely Brilliant Source. Backed
by a grant from the European Research Council, ESRF
scientist Alexandra Pacureanu is using automated
segmentation to resolve neural circuits in mammalian
brains from data taken at the ID16A nano-imaging beamline.
Meanwhile, drawing on hierarchical phase-contrast
tomography (HiP-CT) data taken at the ESRF’s
flagship BM18 beamline, researchers involved in the
Human Organ Atlas (HOA) project, cofunded by the
Chan Zuckerberg Initiative, are relying on automated
segmentation to identify various anatomical structures
inside organs: particularly blood vessels, but also
airways in the lungs, the glomeruli (or filtering units)
of the kidneys, and parts of the brain (figure 2).
Training, training, training
HiP-CT data expose both the challenge and the potential of
machine learning in segmentation. Given that the technique can
deliver images of entire organs with a resolution down
to the single cell in regions of interest, the data volumes
are massive, often a terabyte or more, requiring hefty
processing power. In addition, features such as blood
vessels are genuinely hierarchical – meaning that segmentation
has to be performed over disparate length
scales – and vary greatly from person to person. Perhaps
the trickiest problem is the sheer novelty of the imaging
technique: there are simply no data already available
that a machine-learning algorithm can draw on to train
itself. “This is something that is often glossed over, but
machine learning can only be as good as the data used to
train it,” says HOA scientist Claire Walsh at University
College London in the UK. “And making these data is
a huge undertaking. We have two experts labelling each
dataset, and a third to go over the combined labels and
mark areas that need improvement.”
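Walsh’s two-expert-plus-reviewer workflow needs some way to merge the two label sets and point the third expert at the places where they disagree. The article does not describe how the HOA does this. The sketch below is a hypothetical illustration on a single 2D slice. It assumes a consensus rule of "keep only voxels both experts marked" and uses the Dice coefficient, a widely used overlap score, to summarise agreement. The function name `combine_annotations` is invented for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]  # one 2D slice: 1 = structure of interest, 0 = background


def combine_annotations(a: Mask, b: Mask) -> Tuple[Mask, List[Tuple[int, int]], float]:
    """Merge two experts' binary masks.

    Returns (consensus, disagreements, dice):
      consensus     - voxels both experts marked (an assumed merging rule)
      disagreements - (row, col) positions where the experts differ,
                      i.e. candidates for the third reviewer to check
      dice          - Dice coefficient, 2|A∩B| / (|A| + |B|)
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    consensus: Mask = []
    disagreements: List[Tuple[int, int]] = []
    overlap = marked = 0
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        row = []
        for j, (x, y) in enumerate(zip(row_a, row_b)):
            both = 1 if (x and y) else 0
            row.append(both)
            overlap += both
            marked += x + y
            if x != y:
                disagreements.append((i, j))
        consensus.append(row)
    dice = 2 * overlap / marked if marked else 1.0  # two empty masks agree fully
    return consensus, disagreements, dice


if __name__ == "__main__":
    expert_1 = [[1, 1, 0], [0, 1, 0]]
    expert_2 = [[1, 0, 0], [0, 1, 1]]
    merged, to_check, score = combine_annotations(expert_1, expert_2)
    print(merged)           # [[1, 0, 0], [0, 1, 0]]
    print(to_check)         # [(0, 1), (1, 2)]
    print(round(score, 3))  # 0.667
```

In a real volume the disagreement list would be grouped into regions rather than reported voxel by voxel. The point is only that the reviewer's attention can be directed to where the two labellings diverge.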
The HOA has an open data policy. This opens
another avenue for machine learning: its datasets –
which include entire organs, either healthy or afflicted by
various diseases – can be mined by independent research
groups, using their own algorithms and driven by their
own research goals. Indeed, automated mining of open
data is behind what is arguably the most scientifically
influential deep-learning product of recent years:
AlphaFold, developed by DeepMind, a research
laboratory based in London, UK, and owned by
Google’s parent company. Trained on experimental,
largely synchrotron-derived data in the Protein Data
Bank, AlphaFold has succeeded where humans could
not, predicting protein structures from their amino acid
sequences with incredible accuracy. AlphaFold
predictions can, in turn, boost the experimental
determination of new structures (figure 3).
In other areas of synchrotron science, machine
learning is not so much about breaking new ground
as about making existing ground more accessible. Take X-ray
MACHINE LEARNING
[Figure 1: (a) X-ray reflectivity [a.u.] versus momentum transfer [1/Å], measurement and ML result for films of 75.3, 97.2, 147.5, 199.3, 299.3, 398.0 and 499.6 Å; (b) reached thickness [Å] versus target thickness [Å], with the theoretical line target = reached.]
Figure 1 Measured data compared with deep-learning predictions for a crystal-growth experiment at ID10. (a) The algorithm predicts the relationship between X-ray momentum transfer and reflectivity oscillations, which are a measure of properties such as thickness and surface roughness. (b) The algorithm predicts when to stop the in situ molecular beam deposition for a certain desired film thickness.
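The caption’s remark that reflectivity oscillations measure thickness rests on a textbook relation. For a single homogeneous film, the interference (Kiessig) fringes are spaced by roughly Δq ≈ 2π/d in momentum transfer, where d is the film thickness. The sketch below applies that relation directly. It is not the deep-learning model used at ID10: it ignores refraction near the critical angle and damping by roughness, and the function name `thickness_from_fringes` is hypothetical.

```python
import math
from typing import Sequence


def thickness_from_fringes(q_minima: Sequence[float]) -> float:
    """Estimate film thickness (Å) from successive fringe minima (1/Å).

    Uses d ≈ 2π / Δq, where Δq is the mean spacing between neighbouring
    minima. Valid only well above the critical angle, for a single layer.
    """
    if len(q_minima) < 2:
        raise ValueError("need at least two fringe positions")
    q = sorted(q_minima)
    mean_spacing = (q[-1] - q[0]) / (len(q) - 1)
    return 2 * math.pi / mean_spacing


if __name__ == "__main__":
    # Synthetic minima for a 100 Å film: spacing 2π/100 ≈ 0.0628 1/Å.
    spacing = 2 * math.pi / 100.0
    minima = [0.05 + n * spacing for n in range(5)]
    print(round(thickness_from_fringes(minima), 6))  # 100.0
```

On this relation, a 100 Å film gives fringes about 0.063 Å⁻¹ apart, while a film near 500 Å gives fringes about 0.013 Å⁻¹ apart. Thicker films therefore show denser oscillations over the same momentum-transfer range.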