-
Notifications
You must be signed in to change notification settings - Fork 2
Processing MPCCD images by DIALS
MPCCD support in DIALS is under development. This document is for developers and experts!
Here, we will process a lysozyme dataset we processed with CrystFEL in earlier sections (CXIDB #33, runs 266711-266721).
First, read an excellent documentation "Processing XFEL data with cctbx.xfel and DIALS" in Computational Crystallography Newsletter 2016, July.
We assume that you have built the latest version of DIALS and CCTBX from the respository.
Import one of the HDF5 files from Cheetah and run spot finding on the first several images.
dials.import run266711-0.h5
dials.find_spots datablock.json min_spot_size=1 d_max=20 scan_range=1,10
Inspect spot finding results in dials.image_viewer
. Detected spot centroids are marked by pink dots.
dials.image_viewer datablock.json strong.pickle
If the result is unsatisfactory, tweak min_spot_size
(try 1, 2, 3) and global_threshold
. The gain
is 10, because the Cheetah pipeline normalizes 1 photon to be 10 counts in the output. You do not have to specify this in a command line argument, because it is automatically set by the dxtbx
format class. Also check the resolution of the beam stop shadow.
The dxtbx
class assumes that the images came from MPCCD in the experimental hutch 4 (BL3 EH4) and the detector distance was 50.0 mm. If not, you can override the initial geometry by MPCCD_DISTANCE
and MPCCD_GEOMETRY
environmental variables. TODO: Write this
Prepare index.phil
as follows:
# input.reference_geometry = ../refined_geometry.json
# you can use 14 cores in the serial queue
mp.nproc = 14
# get more info
verbosity = 3
indexing {
known_symmetry {
space_group = P43212
unit_cell = 78.9 78.9 38.1 90 90 90
}
method = real_space_grid_search
# sometimes fft1d works better
# method = fft1d
# needs to be sufficiently high
refinement_protocol.d_min_start = 2.0
# dramatically increases accepted spots in real_space_grid_search
index_assignment.method = local
optimise_initial_basis_vectors = True
}
spotfinder {
filter.min_spot_size = 1
# mask the beam stop
filter.d_max = 20
}
refinement {
parameterisation {
# not sure which is better. The default is False
spherical_relp_model = True
beam.fix = all
detector.fix_list = Dist,Tau1
auto_reduction {
action = fix
min_nref_per_parameter = 1
}
crystal {
unit_cell {
restraints {
tie_to_target {
values = 78.9,78.9,38.1,90,90,90
sigmas = 1,1,1,0,0,0
apply_to_all = True
}
}
}
}
}
}
integration {
integrator = stills
profile.fitting = False
background {
algorithm = simple
simple {
model.algorithm = linear2d
outlier.algorithm = tukey
}
}
}
profile {
gaussian_rs {
min_spots.overall = 0
}
}
Run dials.stills_process
on first several runs. We use the following job submission script run-dials.sh
.
#!/bin/bash
#PBS -l nodes=1:ppn=14
#PBS -q serial
if [ -n "$PBS_O_WORKDIR" -a "$PBS_ENVIRONMENT" != "PBS_INTERACTIVE" ]; then
cd $PBS_O_WORKDIR
fi
source ~nureki/packages/dials/build/setpaths.sh
if [ $# -eq 1 ]; then
TARGET=$1
fi
if [ -z "$TARGET" ]; then
echo "Option: TARGET.h5 "
exit 1
fi
echo `hostname` $TARGET
mkdir $TARGET
cd $TARGET
MPCCD_DISTANCE=50.5 dials.stills_process ../index.phil ../../$TARGET/run$TARGET.h5
(We know the detector distance is 50.5 mm from our trials with CrystFEL)
And submit the job:
for i in {11..13}; do for j in {0..2}; do qsub -v TARGET=2667$i-$j run-dials.sh ; done; done
Note that this creates folders for each HDF5 file. Otherwise, we would have tens of thousands of files in a folder and suffer from huge file system overload!
As
find -name "int-*.pickle" | wc -l
shows, we could index only 1492 images out of 9470 images. This is because of the inaccurate initial metrology. We use cspad.cbf_metrology
to optimise the geometry.
First, make metrology1
directory and move integration folders to there. Then create metrology.phil
:
refinement {
parameterisation {
# not sure which is better. The default is False
spherical_relp_model = True
auto_reduction {
action = fix
min_nref_per_parameter = 1
}
crystal {
unit_cell {
restraints {
tie_to_target {
values = 78.9,78.9,38.1,90,90,90
sigmas = 1,1,1,0,0,0
apply_to_all = True
}
}
}
}
}
}
And run the refinement.
cspad.cbf_metrology 2* metrology.phil tag=refine1 | tee refine1.log
It first refines the whole detector, then each sensor panel individually. After that, it tries to refine further levels and crashes. This is because CSPAD has one more level than MPCCD. We do not care.
The result was:
Detector 1 RMSDs by panel:
-----------------------------------------------------
| Panel | Nref | RMSD_X | RMSD_Y | RMSD_DeltaPsi |
| id | | (px) | (px) | (deg) |
-----------------------------------------------------
| 0 | 4119 | 1.0008 | 0.71287 | 0.054675 |
| 1 | 11411 | 0.54262 | 0.78264 | 0.078083 |
| 2 | 12583 | 0.55396 | 0.67085 | 0.080037 |
| 3 | 3991 | 0.97751 | 0.54345 | 0.052606 |
| 4 | 3040 | 1.0272 | 0.70569 | 0.051457 |
| 5 | 10849 | 0.57721 | 0.70723 | 0.0799 |
| 6 | 12502 | 0.51857 | 0.74374 | 0.079987 |
| 7 | 5419 | 0.9971 | 0.68209 | 0.056625 |
-----------------------------------------------------
We link metrology1/refine1_filtered_experiments_level1.json
to the top directory and add the following line to index.phil
to use the refined geometry instead of the hard-coded initial geometry.
input.reference_geometry = ../refine1_filtered_experiments_level1.json
Submit run-dials.sh
again. We got 5636 indexed images out of 9470 images. This is a wonderful improvement!
We repeat the refinement in the metrology2
directory.
cspad.cbf_metrology 2* metrology.phil tag=refine2 | tee refine.log
With more images, it will take longer (30 ~ 40 min). If you are impatient, you might want to add n_subset=3000
to work on randomly chosen 3000 images.
The RMSDs are now:
Detector 1 RMSDs by panel:
-----------------------------------------------------
| Panel | Nref | RMSD_X | RMSD_Y | RMSD_DeltaPsi |
| id | | (px) | (px) | (deg) |
-----------------------------------------------------
| 0 | 28098 | 0.77496 | 0.51935 | 0.04635 |
| 1 | 46759 | 0.38311 | 0.5291 | 0.080804 |
| 2 | 49719 | 0.43061 | 0.51712 | 0.081987 |
| 3 | 26497 | 0.80517 | 0.48677 | 0.046835 |
| 4 | 27313 | 0.82286 | 0.5261 | 0.045595 |
| 5 | 49468 | 0.43528 | 0.55105 | 0.081936 |
| 6 | 48705 | 0.39605 | 0.55419 | 0.08147 |
| 7 | 29181 | 0.79938 | 0.54972 | 0.047313 |
----------------------------------------------------
Note that the ratio of accepted spots in the outer panels (i.e. #0, 3, 4, 7) to those in the inner panels (i.e. #1, 2, 5, 6) are higher than before. This indicates that more high resolution spots were accepted because of better metrology.
Let's update index.phil
to use metrology2/refine2_filtered_experiments_level1.json
and run dials.stills_process
on all images.
for i in {11..21}; do for j in {0..2}; do qsub -v TARGET=2667$i-$j run-dials.sh ; done; done
It indexed 12102 images out of 21039 images as confirmed by:
$ find 2* -name "*datablock*"|wc -l
21039
$ find 2* -name "int-*"|wc -l
12102
TODO:
Currently PRIME cannot handle multipanel detectors very well. Things that are affected are:
- Positional restraints (Rxy term) used for unit-cell refinement
- anisotropic mosaicity and beam divergence
TODO: