
Add SLURM cluster support (cellpose 3.x backport) #1444

Open

karimi-ali wants to merge 2 commits into MouseLand:main from karimi-ali:slurm-distributed-v3

Conversation

@karimi-ali

Add SLURM cluster support to distributed_segmentation (3.x backport)

Companion to the 4.x PR — see that one for the full description and
test context. This PR is the same slurmCluster patch applied to
the cellpose 3.x maintenance line.

Why a separate PR?

A user with a cellpose 3.x cyto-style model (CP_20250324_Nuc6) hit
the same SLURM-cluster pain point as the 4.x users (issue #1111).
The patch is intentionally cellpose-version-agnostic — it only
touches cellpose/contrib/distributed_segmentation.py and depends
on dask_jobqueue.SLURMCluster, not on cellpose internals — so the
same file works on both 3.x and 4.x. Making this a separate
maintenance-branch PR keeps the 3.x line usable without forcing a
4.x upgrade for users with legacy models.

Files

  • cellpose/contrib/distributed_segmentation.py — same patch as the
    4.x PR.

What's in the patch (full description in the 4.x PR body):

  • slurmCluster class + cluster_type dispatch (see the sketch after
    this list)
  • gpus_per_job multi-GPU mode (with dask-cuda-worker shim)
  • resume_dir for walltime-recoverable runs
  • Memory-format split (dask "MB" string vs SLURM --mem integer)
  • SLURM-aware "release GPUs for stitching" branch
  • change_worker_attributes reliability fix
    (scale(0) + sync(_correct_state), _job_kwargs direct update,
    job-script preview log)
  • merge_all_boxes vectorized via argsort + reduceat
    (was an O(N) scan per label, quadratic overall → O(N log N · ndim);
    fixed a stitching wedge on volumes with ~10^7 unique labels).
    Correctness verified on synthetic inputs (N=5e3 / N=5e4,
    0 mismatches against the legacy implementation) and end-to-end on
    an 8-block 512³ subvolume (51,356 final merged cells; same caveat
    about the upstream to_zarr step being slow as in the 4.x PR).
  • overlap = int(diameter * 2) and face.ndim instead of hardcoded
    3D structuring element
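
For concreteness, here is a minimal sketch of the cluster_type
dispatch and the memory-format split. Only the dependency on
dask_jobqueue.SLURMCluster is confirmed by this PR; the function
shape, parameter names, and the job_extra_directives usage below are
illustrative assumptions, not the patch's exact API.

```python
# Hedged sketch, not the patch itself: illustrates the cluster_type
# dispatch and the dask-vs-SLURM memory split described above.
# Parameter names and the wrapper shape are assumptions.
from dask_jobqueue import SLURMCluster

def make_cluster(cluster_type, n_workers=4, memory_gb=16, cores=4, **kwargs):
    if cluster_type == "slurm":
        # dask wants a memory string with units; SLURM's --mem wants a
        # plain integer (MB), so the same quantity is rendered twice.
        dask_memory = f"{memory_gb * 1000}MB"   # e.g. "16000MB" for dask
        slurm_mem = int(memory_gb * 1000)       # e.g. 16000 for --mem
        cluster = SLURMCluster(
            cores=cores,
            memory=dask_memory,
            job_extra_directives=[f"--mem={slurm_mem}"],
            **kwargs,
        )
        cluster.scale(n_workers)
        return cluster
    raise ValueError(f"unknown cluster_type: {cluster_type}")
```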

No other files changed. Docs in docs/distributed.rst updated only
in the 4.x PR (3.x docs may differ slightly; happy to mirror if the
maintainer wants).

Tested

End-to-end on MPCDF Raven cluster with cellpose 3.1.1.2 + the
CP_20250324_Nuc6 custom nuclei model on a 4518×5008×4560 uint16
X-ray volume.

karimi-ali and others added 2 commits May 1, 2026 06:41
Add SLURM cluster support to distributed_segmentation (3.x backport)

Mirror of the cellpose 4.x patch — same `slurmCluster` class, same
`cluster_type` dispatch, same `resume_dir` mechanism, same memory-format
split for dask vs SLURM, same bug fixes.

The `slurmCluster` patch is intentionally cellpose-version-agnostic:
it depends only on `dask_jobqueue.SLURMCluster`, not on cellpose
internals. So the same `cellpose/contrib/distributed_segmentation.py`
file works under both 3.x and 4.x once dropped over a 3.x install.

Tested end-to-end on the MPCDF Raven cluster against a custom cellpose
3.x model (`CP_20250324_Nuc6`) on a 4518x5008x4560 uint16 X-ray nuclei
volume.

See companion PR against `main` (cellpose 4.x) for the full description.
(3.x backport — identical to the 4.x patch on slurm-distributed.)

change_worker_attributes
========================
The previous implementation patched ``self.new_spec['options'][k] = v``
and called adapt(). On the GLC-07391_2 production run we observed via
scontrol that newly spawned stitching jobs still carried the original
GPU directives (cpu=18, mem=125000M, gres=gpu:a100:1) — the kwargs
never made it onto the queued jobs.

Two changes to make this reliable:

* ``self.scale(0)`` is followed by ``self.sync(self._correct_state)``
  so the cluster blocks until the existing GPU workers have actually
  left. Without this, adapt() can find a worker still in the spec and
  skip the respawn, leaving the run stuck against the original SLURM
  directives.
* The kwargs are written into ``self._job_kwargs`` (the canonical
  store dask-jobqueue uses to render the job script) rather than just
  ``self.new_spec['options']``. We assert the two are still the same
  dict; if a future dask-jobqueue version breaks that invariant, the
  failure is loud rather than silent.

The function now also prints the freshly-rendered SBATCH header for
the next worker so the directives are visible in the driver log.
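
A minimal sketch of the resulting flow, assuming a
``dask_jobqueue.SLURMCluster`` instance; ``scale``, ``sync``,
``_correct_state``, ``_job_kwargs``, ``new_spec``, and ``job_script``
are real dask-jobqueue/distributed attributes, while the function
shape is illustrative:

```python
# Hedged sketch of the change_worker_attributes fix described above.
import logging

logger = logging.getLogger(__name__)

def change_worker_attributes(cluster, **new_job_kwargs):
    """Retire current workers, then respawn with new SLURM directives."""
    # Block until the existing GPU workers have actually left;
    # otherwise adapt() may find a stale worker in the spec and skip
    # the respawn.
    cluster.scale(0)
    cluster.sync(cluster._correct_state)

    # dask-jobqueue renders job scripts from _job_kwargs;
    # new_spec['options'] is expected to alias the same dict, so
    # assert the invariant to fail loudly if a future version breaks it.
    assert cluster._job_kwargs is cluster.new_spec["options"]
    cluster._job_kwargs.update(new_job_kwargs)

    # Surface the freshly rendered SBATCH header in the driver log.
    logger.info("next worker job script:\n%s", cluster.job_script())
```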

merge_all_boxes
===============
Was O(N) per unique id (per-id ``argwhere(==iii)``). For volumes with
~10^7 unique labels after stitching the quadratic blow-up wedged the
final box-merge for hours. Replaced with a single ``argsort`` plus
``np.minimum/maximum.reduceat`` over (N, ndim) start/stop arrays —
O(N log N * ndim).
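
A self-contained sketch of that vectorization; the
``(labels, starts, stops)`` array layout is an assumption about the
contrib code's internal data, not its exact signature:

```python
import numpy as np

def merge_all_boxes(labels, starts, stops):
    """Merge bounding boxes that share a label.

    labels: (N,) int array of box labels
    starts, stops: (N, ndim) box corners
    Returns unique labels and merged (start, stop) corners in
    O(N log N) instead of one O(N) scan per unique label.
    """
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # Index where each run of equal labels begins.
    group_starts = np.flatnonzero(
        np.r_[True, sorted_labels[1:] != sorted_labels[:-1]]
    )
    merged_start = np.minimum.reduceat(starts[order], group_starts, axis=0)
    merged_stop = np.maximum.reduceat(stops[order], group_starts, axis=0)
    return sorted_labels[group_starts], merged_start, merged_stop
```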

Verified bit-for-bit against the legacy implementation on synthetic
inputs of (N=5e3, M=8e2) and (N=5e4, M=5e3); 0 mismatches in both
regimes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
