
Automatic Data Processing

Autoprocessing and phasing pipelines overview

We offer several pipelines for automatic data processing and phasing on the Structural Biology Beamlines:

Automatic data processing pipelines

Grenoble Automatic Data procEssing (GrenADES)
- GrenADES fastproc
  - A fast autoprocessing run (based on XDS) starts as soon as data collection begins.
  - Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_fastproc/results/
- GrenADES parallelproc
  - A full autoprocessing run starts once data collection has stopped. Processing is performed in all reasonable Bravais lattices, and POINTLESS is run in parallel to determine the space group. An initial integration run is performed, and the refined parameters from this run are used for a second integration run. The resolution range is adjusted to obtain an I/sigma of 2 in the outer resolution shell. Intensities are then converted to structure-factor amplitudes (F), and the data are exported in CCP4 MTZ format as well as in merged and unmerged Scalepack formats.
  - Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_parallelproc/SPG_UnitCell/results
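The outer-shell rule can be illustrated with a minimal sketch. It assumes per-shell mean I/sigma values are already available (the `shells` list is made-up data, and `resolution_cutoff` is a hypothetical helper) and keeps the highest-resolution shell reached before I/sigma first drops below 2. It is not GrenADES' own code, which may take other statistics into account.

```python
from typing import List, Optional, Tuple

# Each shell: (d_min of the shell in Angstrom, mean I/sigma in that shell),
# ordered from low resolution (large d) to high resolution (small d).
Shell = Tuple[float, float]


def resolution_cutoff(shells: List[Shell], min_i_over_sigma: float = 2.0) -> Optional[float]:
    """Return the d_min of the last shell (scanning outward) that still meets
    the I/sigma threshold; stop at the first shell that fails.
    Returns None if even the lowest-resolution shell is below the threshold."""
    cutoff = None
    for d_min, i_over_sigma in shells:
        if i_over_sigma < min_i_over_sigma:
            break
        cutoff = d_min
    return cutoff


if __name__ == "__main__":
    # Hypothetical per-shell statistics, not real GrenADES output.
    shells = [(4.0, 25.1), (3.0, 14.2), (2.5, 6.3), (2.2, 2.8), (2.0, 1.6), (1.9, 0.9)]
    print(resolution_cutoff(shells))  # 2.2
```

Stopping at the first failing shell (rather than searching for any later shell above 2) mirrors the idea of a single contiguous resolution range.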
EDNA framework Fast Processing System (EDNA Autoprocessing)
- A fast processing run starts once data collection has stopped. It chains XDS, XSCALE, POINTLESS, AIMLESS, TRUNCATE and UNIQUEIFY within the EDNA framework.
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/EDNA_proc/results/
autoPROC
- This XDS-based pipeline runs once data collection has stopped, for all academic users and for industrial users who belong to the Global Phasing consortium.
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/autoPROC/results/
XDSAPP
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/XDSAPP/results/
XIA2_DIALS
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/XIA2_DIALS/results/
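All pipelines above share the same directory layout, so their results can be collected with a short script. The sketch below assumes the layout described on this page (`autoprocessing_*` run directories containing one folder per pipeline, with GrenADES parallelproc adding one `SPG_UnitCell` level); the helper `find_results` is hypothetical and does not interpret the run numbers.

```python
from pathlib import Path
from typing import Dict, List

# Pipelines whose results sit directly in <pipeline>/results/ (names from this page).
FLAT_PIPELINES = ["grenades_fastproc", "EDNA_proc", "autoPROC", "XDSAPP", "XIA2_DIALS"]


def find_results(image_dir: Path) -> Dict[str, List[Path]]:
    """Map each pipeline name to the results/ directories found under image_dir
    (i.e. PROCESSED_DATA/<your_image_directory_name>)."""
    found: Dict[str, List[Path]] = {name: [] for name in FLAT_PIPELINES}
    found["grenades_parallelproc"] = []
    for run_dir in sorted(image_dir.glob("autoprocessing_*")):
        for name in FLAT_PIPELINES:
            results = run_dir / name / "results"
            if results.is_dir():
                found[name].append(results)
        # parallelproc writes one <SPG_UnitCell>/results directory per lattice tried
        found["grenades_parallelproc"].extend(
            p for p in sorted(run_dir.glob("grenades_parallelproc/*/results")) if p.is_dir()
        )
    return found


if __name__ == "__main__":
    import sys

    # Usage: python find_results.py PROCESSED_DATA/<your_image_directory_name>
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for pipeline, dirs in find_results(target).items():
        for d in dirs:
            print(f"{pipeline}: {d}")
```

A pipeline with an empty list simply did not produce a results directory (for example autoPROC for users outside the Global Phasing consortium).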

Automatic phasing pipelines

SAD phasing
- All datasets are checked for anomalous signal. If a signal is present, SAD phasing (using SHELX) is attempted in multiple space groups after GrenADES autoprocessing. Because nothing is known about the sample in advance, a range of solvent contents is tried.
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_fastproc/sad/P212121/0.48/ or PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_parallelproc/sad/P212121/0.48/
  In the example directory P212121/0.48/, SAD phasing was attempted in space group P212121 with a solvent content of 48%. Within such a directory you will find files like:
  - Sad_normal.pdb: automatically built structure
  - Sad_normal.phs: phases from SHELXE; these can be read into COOT, but Sad_normal.pdb must be read in first
  - Sad_normal_i.pdb: automatically built structure, inverse hand
  - Sad_normal_i.phs: inverse-hand phases from SHELXE; these can be read into COOT, but Sad_normal_i.pdb must be read in first
  - Model.png: PyMOL image of the autobuilt structure
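Since each SAD trial is stored as `sad/<space group>/<solvent fraction>/`, the trial parameters can be read back from the path. The sketch below is a hypothetical helper; it also assumes the inverse-hand files carry SHELXE's `_i` suffix (`Sad_normal_i.pdb`, `Sad_normal_i.phs`) and returns the model/phase pair in the order they should be opened in COOT.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Tuple


class SadTrial(NamedTuple):
    pipeline: str            # grenades_fastproc or grenades_parallelproc
    space_group: str         # e.g. "P212121"
    solvent_fraction: float  # e.g. 0.48 means 48% solvent


def parse_sad_dir(path: str) -> SadTrial:
    """Decode .../<pipeline>/sad/<space group>/<solvent fraction>/ into its parts."""
    parts = PurePosixPath(path).parts
    i = parts.index("sad")  # raises ValueError if this is not a SAD directory
    return SadTrial(pipeline=parts[i - 1], space_group=parts[i + 1],
                    solvent_fraction=float(parts[i + 2]))


def coot_files(inverse_hand: bool) -> Tuple[str, str]:
    """Return (model, phases) for one hand; open the model first, then the phases.
    Assumes SHELXE's '_i' naming for the inverse hand."""
    suffix = "_i" if inverse_hand else ""
    return (f"Sad_normal{suffix}.pdb", f"Sad_normal{suffix}.phs")
```

For example, `parse_sad_dir(".../grenades_fastproc/sad/P212121/0.48/")` identifies a fastproc trial in P212121 at 48% solvent.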
MR phasing from CELL
- The refined unit cell dimensions from GrenADES fastproc (gfp) and GrenADES parallelproc (gpp) are used to query the Oxford Nearest-Cell server. Three PDB entries are downloaded from the Nearest-Cell results, and the first protein chain of each is used as the search model. If the results contain multiple families, the entries are selected from the top three families. For any successful run (TFZ > 8), rigid-body and positional refinement followed by water picking is performed (using PHASER and phenix.refine).
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_fastproc/1T1D.pdb_mrpipe_dir or PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_parallelproc/1T1D.pdb_mrpipe_dir, where 1T1D stands for a PDB entry selected from the Nearest-Cell results.
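The success criterion and the directory naming can be combined in a small filter. This is a sketch only: `successful_runs` is a hypothetical helper, and the TFZ values it receives are assumed to have been read from the PHASER output beforehand (how to extract them is not shown here).

```python
from pathlib import Path
from typing import Dict, List

TFZ_THRESHOLD = 8.0  # a run counts as successful when TFZ > 8 (per this page)


def successful_runs(tfz_by_pdb: Dict[str, float], pipeline_dir: Path) -> List[Path]:
    """Return the <PDB>.pdb_mrpipe_dir directories of search models whose TFZ
    exceeds the threshold, best TFZ first.

    tfz_by_pdb maps a PDB ID (e.g. '1T1D') to its PHASER TFZ score;
    pipeline_dir is the grenades_fastproc or grenades_parallelproc directory."""
    passed = [(pdb, tfz) for pdb, tfz in tfz_by_pdb.items() if tfz > TFZ_THRESHOLD]
    passed.sort(key=lambda item: item[1], reverse=True)
    return [pipeline_dir / f"{pdb}.pdb_mrpipe_dir" for pdb, _ in passed]
```

Only the directories returned here would be expected to contain refined models with picked waters.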
MR from UNIPROT ID (contact nanao at esrf.fr for more information)
1. Go to Proteins and Crystals in EXI
2. Select "My Proteins"
3. Go to Crystal forms (on the right)
4. Find the correct crystal form, then click Add Structure
5. In the popup, select UNIPROT, then fill in the UNIPROT ID and the number of copies in the ASU ("multiplicity")
6. When you specify samples *with crystal forms* in your shipment:
  1. The PDB database is checked for the UNIPROT ID. The top hits (if any) are downloaded and used for MR.
  2. The EBI AlphaFold database is checked for the UNIPROT ID. If there is an entry, it is downloaded and any low-confidence regions (pLDDT < 70) are removed. No further checks (e.g. for chain contiguity) are performed. This model is then used for MR.
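The trimming step can be sketched in a few lines, assuming the model is an AlphaFold DB PDB-format file, where the per-residue pLDDT is stored in the B-factor column (columns 61-66). `trim_low_confidence` is a hypothetical helper, not the ESRF pipeline's code; like the pipeline, it does not check that the remaining segments are contiguous.

```python
from typing import Iterable, List

PLDDT_CUTOFF = 70.0


def trim_low_confidence(pdb_lines: Iterable[str], cutoff: float = PLDDT_CUTOFF) -> List[str]:
    """Drop coordinate records whose pLDDT (read from the B-factor column)
    is below the cutoff; all other records are kept unchanged."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            plddt = float(line[60:66])  # PDB columns 61-66: B-factor = pLDDT here
            if plddt < cutoff:
                continue  # low-confidence atom removed; no contiguity check
        kept.append(line)
    return kept
```

Because pLDDT is assigned per residue, filtering atom by atom removes whole residues.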
MR from input PDB (from ISPyB)
- Define the PDB search model(s) as described in "Defining Molecular Replacement search models and ligands".
- For any successful run (TFZ > 8), rigid-body and positional refinement followed by water picking is performed (using PHASER and phenix.refine). The input model is provided by the user in ISPyB. Unlike "MR phasing from CELL", which uses only the first chain of the starting model, this process uses the whole input PDB.
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_parallelproc/user_pdb.pdb_mrpipe_dir, where user_pdb stands for the PDB file provided in ISPyB for that protein.
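The difference between the two search-model choices can be made concrete with a short sketch: "MR phasing from CELL" keeps only the first chain, while "MR from input PDB" passes the file through unchanged. The helper `first_chain` is hypothetical and approximates "protein chain" by ATOM records (HETATM records are dropped), which is a simplification.

```python
from typing import List, Optional


def first_chain(pdb_lines: List[str]) -> List[str]:
    """Keep the ATOM records of the first chain ID encountered (PDB column 22).
    Simplification: every ATOM record is treated as protein."""
    chain: Optional[str] = None
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            if chain is None:
                chain = line[21]  # column 22 holds the chain identifier
            if line[21] == chain:
                kept.append(line)
    return kept


def whole_model(pdb_lines: List[str]) -> List[str]:
    """MR from an ISPyB model: the complete input file is used as-is."""
    return list(pdb_lines)
```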
MR from input SMILES (from ISPyB) & ligand fitting
- If, in addition to a PDB model, ligands are specified in ISPyB (when describing the samples in your shipment), molecular replacement and ligand fitting (using PHASER and phenix.refine) are performed automatically (test version).
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/grenades_parallelproc/user_pdb.pdb_mrpipe_dir, where user_pdb stands for the PDB file provided in ISPyB for that protein. Output files include:
  - final.pdb: MR solution with ligand placed
  - final.mtz: MTZ file of MR solution with ligand placed
  - new_ligand.cif: CIF file for ligand
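Because ligand fitting is still a test version, it can be useful to check that a run actually produced all three outputs before opening them. A minimal sketch, with the hypothetical helper `missing_outputs` and the file names taken from the list above:

```python
from pathlib import Path
from typing import List

# Output files listed on this page for MR + ligand fitting.
EXPECTED = ("final.pdb", "final.mtz", "new_ligand.cif")


def missing_outputs(mrpipe_dir: Path) -> List[str]:
    """Return the expected output files that are absent from an mrpipe directory."""
    return [name for name in EXPECTED if not (mrpipe_dir / name).is_file()]
```

An empty list means the model, map coefficients and ligand dictionary are all present.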
MR using DIMPLE
- If the EDNA autoprocessing pipeline succeeds and the user has provided an input PDB for that protein in ISPyB (in the same unit cell and space group as the actual sample), the DIMPLE molecular replacement pipeline (using REFMAC, PHASER and findblobs) runs automatically. It returns a refined model and a difference density map pointing to unmodelled electron density blobs (i.e. potential ligand sites).
- Results can be found in PROCESSED_DATA/your_image_directory_name/autoprocessing_imagePrefix_run#_#/EDNA_proc/results/dimple/
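DIMPLE is only triggered when the model matches the sample's unit cell and space group, so it can help to compare the model's CRYST1 record with the processed cell before expecting a result. The sketch below parses CRYST1 using the fixed PDB columns and compares cells; the tolerances (2% on lengths, 1 degree on angles) and all helper names are hypothetical, not ESRF's actual criteria. `dimple_command` only builds the argument list for DIMPLE's basic usage (`dimple input.mtz input.pdb output_dir`) for rerunning it by hand; the file names are placeholders.

```python
from typing import List, Tuple

Cell = Tuple[float, ...]  # (a, b, c, alpha, beta, gamma)

# Fixed CRYST1 columns (0-based slices): a, b, c, alpha, beta, gamma.
_CELL_COLS = ((6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54))


def read_cryst1(pdb_lines: List[str]) -> Tuple[Cell, str]:
    """Return the unit cell and space group symbol from the CRYST1 record."""
    for line in pdb_lines:
        if line.startswith("CRYST1"):
            cell = tuple(float(line[s:e]) for s, e in _CELL_COLS)
            return cell, line[55:66].strip()
    raise ValueError("no CRYST1 record found")


def cells_match(model: Cell, data: Cell, length_tol: float = 0.02, angle_tol: float = 1.0) -> bool:
    """Relative tolerance on a, b, c; absolute tolerance (degrees) on the angles."""
    lengths_ok = all(abs(m - d) <= length_tol * d for m, d in zip(model[:3], data[:3]))
    angles_ok = all(abs(m - d) <= angle_tol for m, d in zip(model[3:], data[3:]))
    return lengths_ok and angles_ok


def same_space_group(a: str, b: str) -> bool:
    """Compare symbols ignoring spaces ('P 21 21 21' == 'P212121'); settings are not checked."""
    return a.replace(" ", "") == b.replace(" ", "")


def dimple_command(mtz: str, pdb: str, out_dir: str) -> List[str]:
    """Argument list for running DIMPLE manually (e.g. with subprocess.run)."""
    return ["dimple", mtz, pdb, out_dir]
```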


European Synchrotron Radiation Facility - 71, avenue des Martyrs, CS 40220, 38043 Grenoble Cedex 9, France.