Skip to content

Processing MPCCD images by DIALS

biochem_fan edited this page May 22, 2024 · 17 revisions

Processing MPCCD Images by DIALS

MPCCD support in DIALS is under development. This document is for developers and experts!

Here, we will process a lysozyme dataset we processed with CrystFEL in earlier sections (CXIDB #33, runs 266711-266721).

First, read an excellent documentation "Processing XFEL data with cctbx.xfel and DIALS" in Computational Crystallography Newsletter 2016, July.

We assume that you have built the latest version of DIALS and CCTBX from the respository.

Parameter optimizations

Import one of the HDF5 files from Cheetah and run spot finding on the first several images.

dials.import run266711-0.h5
dials.find_spots datablock.json min_spot_size=1 d_max=20 scan_range=1,10

Inspect spot finding results in dials.image_viewer. Detected spot centroids are marked by pink dots.

dials.image_viewer datablock.json strong.pickle

If the result is unsatisfactory, tweak min_spot_size (try 1, 2, 3) and global_threshold. The gain is 10, because the Cheetah pipeline normalizes 1 photon to be 10 counts in the output. You do not have to specify this in a command line argument, because it is automatically set by the dxtbx format class. Also check the resolution of the beam stop shadow.

The dxtbx class assumes that the images came from MPCCD in the experimental hutch 4 (BL3 EH4) and the detector distance was 50.0 mm. If not, you can override the initial geometry by MPCCD_DISTANCE and MPCCD_GEOMETRY environmental variables.

Importing CrystFEL geometry (optional)

If you have an optimized geometry from CrystFEL, you can convert it to a DIALS geometry JSON file.

dials.python `libtbx.show_dist_paths xfel`/sacla/mpccd_geom2json.py sacla-15jan-505-opt.geom distance=50.5

This generates geometry.json, which can be specified in the input.reference_geometry line of the input PHIL file (see below).

1st integration

Prepare index.phil as follows:

# uncomment below if you have converted the CrystFEL geometry file
# input.reference_geometry = geometry.json

# you can use up to 48 cores in the serial queue but I recommend using one third of it per job
mp.nproc = 16

# get more info
verbosity = 3

indexing {
    known_symmetry {
        space_group = P43212
        unit_cell = 78.9 78.9 38.1 90 90 90
    }
    method = fft1d

    # sometimes real_space_grid_search works better
    # method = real_space_grid_search

    # needs to be sufficiently high
    refinement_protocol.d_min_start = 2.0

    # dramatically increases accepted spots in real_space_grid_search
    # index_assignment.method = local
    # optimise_initial_basis_vectors = True
}

spotfinder {
    filter.min_spot_size = 1

    # mask the beam stop
    filter.d_max = 20
}

refinement {
    parameterisation {
        # not sure which is better. The default is False
        spherical_relp_model = True

        # As of 2024, these are no longer necessary; they are default.
        # beam.fix = all
        # detector.fix = all
        # auto_reduction {
        #     action = fix
        #    min_nref_per_parameter = 1
        # }

        crystal {
            unit_cell {
                restraints {
                    tie_to_target {
                        values = 78.9,78.9,38.1,90,90,90
                        sigmas = 1,1,1,0,0,0
                    }
                }
            }
        }
    }
}

# As of 2024, below are no longer necessary; they are default
# except outlier.algorithm (plane is the default).
# integration {
#    integrator = stills
#    profile.fitting = False
#    background {
#        algorithm = simple
#        simple {
#            model.algorithm = linear2d
#            outlier.algorithm = simple
#        }
#    }
#}
#
#profile {
#    gaussian_rs {
#        min_spots.overall = 0
#    }   
#}

Run dials.stills_process on first several runs. We use the following job submission script run-dials.sh.

#!/bin/bash
#PBS -l nodes=1:ppn=16
#PBS -l mem=50g
#PBS -q serial

if [ -n "$PBS_O_WORKDIR" -a "$PBS_ENVIRONMENT" != "PBS_INTERACTIVE" ]; then
        cd $PBS_O_WORKDIR
fi

source ~nureki/packages/dials/build/setpaths.sh

if [ $# -eq 1 ]; then
        TARGET=$1
fi

if [ -z "$TARGET" ]; then
        echo "Option: TARGET.h5 "
        exit 1
fi

echo `hostname` $TARGET 

mkdir $TARGET
cd $TARGET
MPCCD_DISTANCE=50.5 dials.stills_process ../index.phil ../../$TARGET/run$TARGET.h5

(We know the detector distance is 50.5 mm from our trials with CrystFEL)

And submit the job:

for i in {11..13}; do for j in {0..2}; do qsub -v TARGET=2667$i-$j run-dials.sh ; done; done

Note that this creates folders for each HDF5 file. Otherwise, we would have tens of thousands of files in a folder and suffer from huge file system overload!

Joint refinement (1st round)

As

find -name "int-*.pickle" | wc -l

shows, we could index only 1492 images out of 9470 images. This is because of the inaccurate initial metrology. We use cspad.cbf_metrology to optimise the geometry.

First, make metrology1 directory and move integration folders to there. Then create metrology.phil:

refinement {
    parameterisation {
        # not sure which is better. The default is False
        spherical_relp_model = True

        auto_reduction {
            action = fix
            min_nref_per_parameter = 1
        }
        crystal {
            unit_cell {
                restraints {
                    tie_to_target {
                        values = 78.9,78.9,38.1,90,90,90
                        sigmas = 1,1,1,0,0,0
                    }
                }
            }
        }
    }
}

And run the refinement.

cspad.cbf_metrology 2* metrology.phil tag=refine1 | tee refine1.log

It first refines the whole detector, then each sensor panel individually. After that, it tries to refine further levels and crashes. This is because CSPAD has one more level than MPCCD. We do not care.

The result was:

Detector 1 RMSDs by panel:
-----------------------------------------------------
| Panel | Nref  | RMSD_X  | RMSD_Y  | RMSD_DeltaPsi |
| id    |       | (px)    | (px)    | (deg)         |
-----------------------------------------------------
| 0     | 4119  | 1.0008  | 0.71287 | 0.054675      |
| 1     | 11411 | 0.54262 | 0.78264 | 0.078083      |
| 2     | 12583 | 0.55396 | 0.67085 | 0.080037      |
| 3     | 3991  | 0.97751 | 0.54345 | 0.052606      |
| 4     | 3040  | 1.0272  | 0.70569 | 0.051457      |
| 5     | 10849 | 0.57721 | 0.70723 | 0.0799        |
| 6     | 12502 | 0.51857 | 0.74374 | 0.079987      |
| 7     | 5419  | 0.9971  | 0.68209 | 0.056625      |
-----------------------------------------------------

2nd integration

We link metrology1/refine1_filtered_experiments_level1.json to the top directory and add the following line to index.phil to use the refined geometry instead of the hard-coded initial geometry.

input.reference_geometry = ../refine1_filtered_experiments_level1.json

Submit run-dials.sh again. We got 5636 indexed images out of 9470 images. This is a wonderful improvement!

Joint refinement (2nd round)

We repeat the refinement in the metrology2 directory.

cspad.cbf_metrology 2* metrology.phil tag=refine2 | tee refine.log

With more images, it will take longer (30 ~ 40 min). If you are impatient, you might want to add n_subset=3000 to work on randomly chosen 3000 images.

The RMSDs are now:

Detector 1 RMSDs by panel:
-----------------------------------------------------
| Panel | Nref  | RMSD_X  | RMSD_Y  | RMSD_DeltaPsi |
| id    |       | (px)    | (px)    | (deg)         |
-----------------------------------------------------
| 0     | 28098 | 0.77496 | 0.51935 | 0.04635       |
| 1     | 46759 | 0.38311 | 0.5291  | 0.080804      |
| 2     | 49719 | 0.43061 | 0.51712 | 0.081987      |
| 3     | 26497 | 0.80517 | 0.48677 | 0.046835      |
| 4     | 27313 | 0.82286 | 0.5261  | 0.045595      |
| 5     | 49468 | 0.43528 | 0.55105 | 0.081936      |
| 6     | 48705 | 0.39605 | 0.55419 | 0.08147       |
| 7     | 29181 | 0.79938 | 0.54972 | 0.047313      |
----------------------------------------------------

Note that the ratio of accepted spots in the outer panels (i.e. #0, 3, 4, 7) to those in the inner panels (i.e. #1, 2, 5, 6) are higher than before. This indicates that more high resolution spots were accepted because of better metrology.

Batch processing

Let's update index.phil to use metrology2/refine2_filtered_experiments_level1.json and run dials.stills_process on all images.

for i in {11..21}; do for j in {0..2}; do qsub -v TARGET=2667$i-$j run-dials.sh ; done; done

It indexed 12102 images out of 21039 images as confirmed by:

$ find 2* -name "*datablock*"|wc -l
21039
$ find 2* -name "int-*"|wc -l
12102

Merging with cxi.merge

TODO:

Merging with PRIME

Currently (as of Oct 2016) PRIME cannot handle multipanel detectors very well. Things that are affected are:

  • Positional restraints (Rxy term) used for unit-cell refinement
  • Anisotropic mosaicity and beam divergence

These problems have been reported to the developers.