ENABLING TECHNOLOGIES
170 HIGHLIGHTS 2023
EWOKS brings automated and reproducible data processing to ESRF beamlines
The ESRF has developed a major new tool to automate experiments and data processing, with the aim of increasing experimental throughput. The ESRF Workflow System (EWOKS) supports fully automated workflows on the beamlines and also brings the power of these workflows to users for local use. EWOKS therefore aims to make the processing of data more efficient, accessible and reproducible.
The ESRF's Extremely Brilliant Source (ESRF-EBS), launched in August 2020, has resulted in an increase in experiments demanding higher resolution and faster data collection. The automation of beamlines and data processing is therefore becoming increasingly crucial in order to speed up experiment throughput, data collection and processing (e.g., on-the-fly feedback on the quality of the collected data, data conversion or reduction, etc.). Automation is not just a matter of convenience; it is essential in order to cope with the high throughput of EBS and to help scientists make informed decisions during the experiment, thereby increasing the chances of collecting scientifically relevant results.
Workflows are data-processing pipelines made of a succession of steps executing specific tasks. Figure 134 shows an example of a workflow. These tasks are written by data-analysis experts, drawing on their knowledge and experience, but can be reused and rearranged by non-expert users to create data-processing pipelines tailored to the experiment and/or beamline. Therefore, workflows combine robustness and flexibility, making them ideal candidates to run automated data processing.
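The idea of a workflow as reusable tasks connected into a pipeline can be illustrated with a minimal Python sketch. This is not the EWOKS API: the task functions, the `workflow` dictionary layout and the `run_workflow` helper are all hypothetical names chosen for illustration, and the example assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each takes named inputs and returns a dict of named outputs.
def subtract_dark(raw, dark):
    return {"corrected": [r - d for r, d in zip(raw, dark)]}

def normalise(corrected, flat):
    return {"normalised": [c / f for c, f in zip(corrected, flat)]}

def mean_intensity(normalised):
    return {"mean": sum(normalised) / len(normalised)}

TASKS = {
    "subtract_dark": subtract_dark,
    "normalise": normalise,
    "mean_intensity": mean_intensity,
}

# Hypothetical workflow description: nodes reference tasks by name,
# links map an output of one node onto an input of another.
workflow = {
    "nodes": {
        "dark": {
            "task": "subtract_dark",
            "inputs": {"raw": [12.0, 22.0, 32.0], "dark": [2.0, 2.0, 2.0]},
        },
        "flat": {"task": "normalise", "inputs": {"flat": [10.0, 10.0, 10.0]}},
        "stats": {"task": "mean_intensity", "inputs": {}},
    },
    "links": [
        {"source": "dark", "target": "flat", "map": {"corrected": "corrected"}},
        {"source": "flat", "target": "stats", "map": {"normalised": "normalised"}},
    ],
}

def run_workflow(workflow, tasks):
    """Run every node once, in an order that respects the links."""
    nodes, links = workflow["nodes"], workflow["links"]
    # Map each node to the set of nodes it depends on.
    deps = {node_id: set() for node_id in nodes}
    for link in links:
        deps[link["target"]].add(link["source"])
    results = {}
    for node_id in TopologicalSorter(deps).static_order():
        kwargs = dict(nodes[node_id]["inputs"])  # static inputs of the node
        for link in links:
            if link["target"] == node_id:
                # Forward upstream outputs to this node's inputs.
                for out_name, in_name in link["map"].items():
                    kwargs[in_name] = results[link["source"]][out_name]
        results[node_id] = tasks[nodes[node_id]["task"]](**kwargs)
    return results

results = run_workflow(workflow, TASKS)
print(results["stats"]["mean"])  # 2.0
```

Because the pipeline is described as data rather than hard-coded calls, a non-expert can rearrange or swap nodes without touching the task code itself.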
To support workflow execution and the data-processing automation that comes with it, the ESRF Workflow System, or EWOKS, was developed by the Data Automation Unit (DAU) of the Software group. The added value of EWOKS lies in the fact that it does not commit to a single workflow execution system but is instead able to integrate any workflow framework. This means that existing workflows, such as those used in structural biology or tomography, are easily integrated into EWOKS. Moreover, this makes EWOKS much more resilient and flexible since it is, by design, isolated from the underlying software technologies used to execute and manage workflows. EWOKS can be installed locally by users and can also be deployed as a centralised service on beamlines, so it can be used either for local processing or for automated processing on the beamline. For further ease of use, EWOKS also comes with a web editor to create and edit workflows (Figure 135).
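The separation between a workflow description and the engine that executes it can be sketched as follows. This is an illustrative assumption about the general design pattern, not EWOKS code: the `workflow` layout, the two engine functions and the `execute` dispatcher are hypothetical, and only the Python standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workflow description: a chain of steps applied to each input item.
workflow = {
    "steps": [lambda x: x + 1, lambda x: x * 2],
    "inputs": [1, 2, 3],
}

def apply_steps(steps, value):
    """Apply the chain of steps to a single input value."""
    for step in steps:
        value = step(value)
    return value

def run_sequential(workflow):
    """Engine 1: process the inputs one after the other."""
    return [apply_steps(workflow["steps"], v) for v in workflow["inputs"]]

def run_threaded(workflow):
    """Engine 2: process the inputs in a thread pool (input order is preserved)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(
            pool.map(lambda v: apply_steps(workflow["steps"], v), workflow["inputs"])
        )

# The workflow description never refers to an engine; the engine is chosen at run time.
ENGINES = {"sequential": run_sequential, "threads": run_threaded}

def execute(workflow, engine="sequential"):
    return ENGINES[engine](workflow)

local_result = execute(workflow, engine="sequential")
pooled_result = execute(workflow, engine="threads")
print(local_result, pooled_result)  # [4, 6, 8] [4, 6, 8]
```

Since the description stays unchanged while the engine varies, the same workflow can in principle run on a laptop or behind a centralised service, which is the kind of isolation from execution technology described above.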
To better integrate with existing scientific tools, EWOKS is written in Python, a language that is now a standard for scientific computing. EWOKS workflows can therefore use any Python code as tasks, including code calling external programs (PyTorch, Fortran, etc.) and Jupyter notebooks, a popular tool for scientific data processing. This makes EWOKS a growing ecosystem from which scientists can pick more than 400 tasks to build data-processing pipelines [1]. This ecosystem is catered and extended
Fig. 134: Example of a tomography workflow.