-
Notifications
You must be signed in to change notification settings - Fork 2
Processing MPCCD images by DIALS
MPCCD support in DIALS is under development. This document is for developers and experts!
Here, we will process a lysozyme dataset we processed with CrystFEL in earlier sections (CXIDB #33, runs 266711-266721).
First, read an excellent documentation "Processing XFEL data with cctbx.xfel and DIALS" in Computational Crystallography Newsletter 2016, July.
We assume that you have built the latest version of DIALS and CCTBX from the respository.
Import one of the HDF5 files from Cheetah and run spot finding on the first several images.
dials.import run266711-0.h5
dials.find_spots datablock.json min_spot_size=1 d_max=20 scan_range=1,10
Inspect spot finding results in dials.image_viewer
. Detected spot centroids are marked by pink dots.
dials.image_viewer datablock.json strong.pickle
If the result is unsatisfactory, tweak min_spot_size
(try 1, 2, 3) and global_threshold
. The gain
is 10, because the Cheetah pipeline normalizes 1 photon to be 10 counts in the output. You do not have to specify this in a command line argument, because it is automatically set by the dxtbx
format class. Also check the resolution of the beam stop shadow.
The dxtbx
class assumes that the images came from MPCCD in the experimental hutch 4 (BL3 EH4) and the detector distance was 50.0 mm. If not, you can override the initial geometry by MPCCD_DISTANCE
and MPCCD_GEOMETRY
environmental variables.
If you have an optimized geometry from CrystFEL, you can convert it to a DIALS geometry JSON file.
dials.python `libtbx.show_dist_paths xfel`/sacla/mpccd_geom2json.py sacla-15jan-505-opt.geom distance=50.5
This generates geometry.json
, which can be specified in the input.reference_geometry
line of the input PHIL file (see below).
Prepare index.phil
as follows:
# uncomment below if you have converted the CrystFEL geometry file
# input.reference_geometry = geometry.json
# you can use up to 48 cores in the serial queue but I recommend using one third of it per job
mp.nproc = 16
# get more info
verbosity = 3
indexing {
known_symmetry {
space_group = P43212
unit_cell = 78.9 78.9 38.1 90 90 90
}
method = fft1d
# sometimes real_space_grid_search works better
# method = real_space_grid_search
# needs to be sufficiently high
refinement_protocol.d_min_start = 2.0
# dramatically increases accepted spots in real_space_grid_search
# index_assignment.method = local
# optimise_initial_basis_vectors = True
}
spotfinder {
filter.min_spot_size = 1
# mask the beam stop
filter.d_max = 20
}
refinement {
parameterisation {
# not sure which is better. The default is False
spherical_relp_model = True
# As of 2024, these are no longer necessary; they are default.
# beam.fix = all
# detector.fix = all
# auto_reduction {
# action = fix
# min_nref_per_parameter = 1
# }
crystal {
unit_cell {
restraints {
tie_to_target {
values = 78.9,78.9,38.1,90,90,90
sigmas = 1,1,1,0,0,0
}
}
}
}
}
}
# As of 2024, below are no longer necessary; they are default
# except outlier.algorithm (plane is the default).
# integration {
# integrator = stills
# profile.fitting = False
# background {
# algorithm = simple
# simple {
# model.algorithm = linear2d
# outlier.algorithm = simple
# }
# }
#}
#
#profile {
# gaussian_rs {
# min_spots.overall = 0
# }
#}
Run dials.stills_process
on first several runs. We use the following job submission script run-dials.sh
.
#!/bin/bash
#PBS -l nodes=1:ppn=16
#PBS -l mem=50g
#PBS -q serial
if [ -n "$PBS_O_WORKDIR" -a "$PBS_ENVIRONMENT" != "PBS_INTERACTIVE" ]; then
cd $PBS_O_WORKDIR
fi
source ~nureki/packages/dials/build/setpaths.sh
if [ $# -eq 1 ]; then
TARGET=$1
fi
if [ -z "$TARGET" ]; then
echo "Option: TARGET.h5 "
exit 1
fi
echo `hostname` $TARGET
mkdir $TARGET
cd $TARGET
MPCCD_DISTANCE=50.5 dials.stills_process ../index.phil ../../$TARGET/run$TARGET.h5
(We know the detector distance is 50.5 mm from our trials with CrystFEL)
And submit the job:
for i in {11..13}; do for j in {0..2}; do qsub -v TARGET=2667$i-$j run-dials.sh ; done; done
Note that this creates folders for each HDF5 file. Otherwise, we would have tens of thousands of files in a folder and suffer from huge file system overload!
As
find -name "int-*.pickle" | wc -l
shows, we could index only 1492 images out of 9470 images. This is because of the inaccurate initial metrology. We use cspad.cbf_metrology
to optimise the geometry.
First, make metrology1
directory and move integration folders to there. Then create metrology.phil
:
refinement {
parameterisation {
# not sure which is better. The default is False
spherical_relp_model = True
auto_reduction {
action = fix
min_nref_per_parameter = 1
}
crystal {
unit_cell {
restraints {
tie_to_target {
values = 78.9,78.9,38.1,90,90,90
sigmas = 1,1,1,0,0,0
}
}
}
}
}
}
And run the refinement.
cspad.cbf_metrology 2* metrology.phil tag=refine1 | tee refine1.log
It first refines the whole detector, then each sensor panel individually. After that, it tries to refine further levels and crashes. This is because CSPAD has one more level than MPCCD. We do not care.
The result was:
Detector 1 RMSDs by panel:
-----------------------------------------------------
| Panel | Nref | RMSD_X | RMSD_Y | RMSD_DeltaPsi |
| id | | (px) | (px) | (deg) |
-----------------------------------------------------
| 0 | 4119 | 1.0008 | 0.71287 | 0.054675 |
| 1 | 11411 | 0.54262 | 0.78264 | 0.078083 |
| 2 | 12583 | 0.55396 | 0.67085 | 0.080037 |
| 3 | 3991 | 0.97751 | 0.54345 | 0.052606 |
| 4 | 3040 | 1.0272 | 0.70569 | 0.051457 |
| 5 | 10849 | 0.57721 | 0.70723 | 0.0799 |
| 6 | 12502 | 0.51857 | 0.74374 | 0.079987 |
| 7 | 5419 | 0.9971 | 0.68209 | 0.056625 |
-----------------------------------------------------
We link metrology1/refine1_filtered_experiments_level1.json
to the top directory and add the following line to index.phil
to use the refined geometry instead of the hard-coded initial geometry.
input.reference_geometry = ../refine1_filtered_experiments_level1.json
Submit run-dials.sh
again. We got 5636 indexed images out of 9470 images. This is a wonderful improvement!
We repeat the refinement in the metrology2
directory.
cspad.cbf_metrology 2* metrology.phil tag=refine2 | tee refine.log
With more images, it will take longer (30 ~ 40 min). If you are impatient, you might want to add n_subset=3000
to work on randomly chosen 3000 images.
The RMSDs are now:
Detector 1 RMSDs by panel:
-----------------------------------------------------
| Panel | Nref | RMSD_X | RMSD_Y | RMSD_DeltaPsi |
| id | | (px) | (px) | (deg) |
-----------------------------------------------------
| 0 | 28098 | 0.77496 | 0.51935 | 0.04635 |
| 1 | 46759 | 0.38311 | 0.5291 | 0.080804 |
| 2 | 49719 | 0.43061 | 0.51712 | 0.081987 |
| 3 | 26497 | 0.80517 | 0.48677 | 0.046835 |
| 4 | 27313 | 0.82286 | 0.5261 | 0.045595 |
| 5 | 49468 | 0.43528 | 0.55105 | 0.081936 |
| 6 | 48705 | 0.39605 | 0.55419 | 0.08147 |
| 7 | 29181 | 0.79938 | 0.54972 | 0.047313 |
----------------------------------------------------
Note that the ratio of accepted spots in the outer panels (i.e. #0, 3, 4, 7) to those in the inner panels (i.e. #1, 2, 5, 6) are higher than before. This indicates that more high resolution spots were accepted because of better metrology.
Let's update index.phil
to use metrology2/refine2_filtered_experiments_level1.json
and run dials.stills_process
on all images.
for i in {11..21}; do for j in {0..2}; do qsub -v TARGET=2667$i-$j run-dials.sh ; done; done
It indexed 12102 images out of 21039 images as confirmed by:
$ find 2* -name "*datablock*"|wc -l
21039
$ find 2* -name "int-*"|wc -l
12102
TODO:
Currently (as of Oct 2016) PRIME cannot handle multipanel detectors very well. Things that are affected are:
- Positional restraints (Rxy term) used for unit-cell refinement
- Anisotropic mosaicity and beam divergence
These problems have been reported to the developers.