Skip to content

Processing MPCCD images by DIALS

biochem_fan edited this page Oct 25, 2016 · 17 revisions

Processing MPCCD Images by DIALS

MPCCD supports in DIALS is under development. This documents is for developers and experts!

Here, we will process a lysozyme dataset we processed with CrystFEL in earlier sections (CXIDB #33, runs 266711-266721).

First, read an excellent documentation "Processing XFEL data with cctbx.xfel and DIALS" in Computational Crystallography Newsletter 2016, July.

Necessary patches

We assume that you have built the latest version of DIALS and CCTBX from the respository. In addition, the following patches must be applied. I am waiting for these patches to be reviewed and merged.

Parameter optimizations

Import one of the HDF5 files from Cheetah and run spot finding on the first several images.

dials.import run266711-0.h5
dials.find_spots datablock.json min_spot_size=1 d_max=20 scan_range=1,10

Inspect spot finding results in dials.image_viewer. Detected spot centroids are marked by pink dots.

dials.image_viewer datablock.json strong.pickle

If the result is unsatisfactory, tweak min_spot_size (try 1, 2, 3) and global_threshold. The gain is 10, because the Cheetah pipeline normalizes 1 photon to be 10 counts in the output. You do not have to specify this in a command line argument, because it is automatically set by the dxtbx format class. Also check the resolution of the beam stop shadow.

The dxtbx class assumes that the images came from MPCCD in the experimental hutch 4 (BL3 EH4) and the detector distance was 50.0 mm. If not, you can override the initial geometry by MPCCD_DISTANCE and MPCCD_GEOMETRY environmental variables. TODO: Write this

1st integration

Prepare index.phil as follows:

# input.reference_geometry = ../refined_geometry.json

# you can use 14 cores in the serial queue
mp.nproc = 14

# get more info
verbosity = 3

indexing {
    known_symmetry {
        space_group = P43212
        unit_cell = 78.9 78.9 38.1 90 90 90
    }
    method = real_space_grid_search

    # sometimes fft1d works better
    # method = fft1d

    # needs to be sufficiently high
    refinement_protocol.d_min_start = 2.0

    # dramatically increases accepted spots in real_space_grid_search
    index_assignment.method = local
    optimise_initial_basis_vectors = True
}

spotfinder {
    filter.min_spot_size = 1

    # mask the beam stop
    filter.d_max = 20
}

refinement {
    parameterisation {
        # not sure which is better. The default is False
        spherical_relp_model = True

        beam.fix = all
        detector.fix_list = Dist,Tau1
        auto_reduction {
            action = fix
            min_nref_per_parameter = 1
        }
        crystal {
            unit_cell {
                restraints {
                    tie_to_target {
                        values = 78.9,78.9,38.1,90,90,90
                        sigmas = 1,1,1,0,0,0
                        apply_to_all = True
                    }
                }
            }
        }
    }
}

integration {
    integrator = stills
    profile.fitting = False
    background {
        algorithm = simple
        simple {
            model.algorithm = linear2d
            outlier.algorithm = tukey
        }
    }
}

profile {
    gaussian_rs {
        min_spots.overall = 0
    }   
}

Run dials.stills_process on first several runs. We use the following job submission script run-dials.sh.

#!/bin/bash
#PBS -l nodes=1:ppn=14
#PBS -q serial

if [ -n "$PBS_O_WORKDIR" -a "$PBS_ENVIRONMENT" != "PBS_INTERACTIVE" ]; then
        cd $PBS_O_WORKDIR
fi

source ~nureki/packages/dials/build/setpaths.sh

if [ $# -eq 1 ]; then
        TARGET=$1
fi

if [ -z "$TARGET" ]; then
        echo "Option: TARGET.h5 "
        exit 1
fi

echo `hostname` $TARGET 

mkdir $TARGET
cd $TARGET
MPCCD_DISTANCE=50.5 dials.stills_process ../index.phil ../../$TARGET/run$TARGET.h5

(We know the detector distance is 50.5 mm from our trials with CrystFEL)

And submit the job:

for i in {11..13}; do for j in {0..2}; do qsub -v TARGET=2667$i-$j run-dials.sh ; done; done

Note that this creates folders for each HDF5 file. Otherwise, we would have tens of thousands of files in a folder and suffer from huge file system overload!

Joint refinement (1st round)

As

find -name "int-*.pickle" | wc -l

shows, we could index only 1492 images out of 9470 images. This is because of the inaccurate initial metrology. We use cspad.cbf_metrology to optimise the geometry.

First, make metrology1 folder and create metrology.phil:

refinement {
    parameterisation {
        # not sure which is better. The default is False
        spherical_relp_model = True

        auto_reduction {
            action = fix
            min_nref_per_parameter = 1
        }
        crystal {
            unit_cell {
                restraints {
                    tie_to_target {
                        values = 78.9,78.9,38.1,90,90,90
                        sigmas = 1,1,1,0,0,0
                        apply_to_all = True
                    }
                }
            }
        }
    }
}

And run the refinement.

cspad.cbf_metrology ../2* ../metrology.phil tag=refine1 | tee refine1.log

It first refines the whole detector, then each sensor panel individually. After that, it tries to refine further levels and crashes. This is because CSPAD has one more level than MPCCD. We do not care.

The result was:

Detector 1 RMSDs by panel:
-----------------------------------------------------
| Panel | Nref  | RMSD_X  | RMSD_Y  | RMSD_DeltaPsi |
| id    |       | (px)    | (px)    | (deg)         |
-----------------------------------------------------
| 0     | 4119  | 1.0008  | 0.71287 | 0.054675      |
| 1     | 11411 | 0.54262 | 0.78264 | 0.078083      |
| 2     | 12583 | 0.55396 | 0.67085 | 0.080037      |
| 3     | 3991  | 0.97751 | 0.54345 | 0.052606      |
| 4     | 3040  | 1.0272  | 0.70569 | 0.051457      |
| 5     | 10849 | 0.57721 | 0.70723 | 0.0799        |
| 6     | 12502 | 0.51857 | 0.74374 | 0.079987      |
| 7     | 5419  | 0.9971  | 0.68209 | 0.056625      |
-----------------------------------------------------

2nd integration

We link metrology1/refine1_filtered_experiments_level1.json to the top directory and add the following line to index.phil to use the refined geometry instead of the hard-coded initial geometry.

input.reference_geometry = ../refine1_filtered_experiments_level1.json

Delete integration results from the initial trial or move it to metrology1 before running dials.stills_process again. We got 5636 indexed images out of 9470 images. This is a wonderful improvement!

Joint refinement (2nd round)

Now that we have many indexed images, we can take "half set" approach to examine the precision of the refinement.

cspad.cbf_metrology ../2* ../metrology.phil tag=refine2 split_dataset=True n_subset=5600 | tee refine.log

TODO: Write more...