
Commit 549126d

Merge pull request #1108 from GFleishman/main
Documented using masks and preprocessing_steps in distributed rst
2 parents: e0be83b + 4d30972

1 file changed: docs/distributed.rst (+117, -5 lines)
@@ -11,14 +11,30 @@ of the whole dataset.
 
 Built to run on workstations or clusters. Blocks can be run in parallel, in series, or both.
 Compute resources (GPUs, CPUs, and RAM) can be arbitrarily partitioned for parallel computing.
-Currently workstations and LSF clusters are supported. SLURM clusters are
+Currently, workstations and LSF clusters are supported. SLURM clusters are
 an easy addition - if you need this to run on a SLURM cluster `please post a feature request issue
 to the github repository <https://github.com/MouseLand/cellpose/issues>`_ and tag @GFleishman.
 
-The input data format must be a zarr array. Some functions are provided in the module to help
-convert your data to a zarr array, but not all formats or situations are covered. These are
-good opportunities to submit pull requests. Currently, the module must be run via the Python API,
-but making it available in the GUI is another good PR or feature request.
+The input data format must be a `zarr array <https://zarr.readthedocs.io/en/stable/>`_.
+Some functions are provided in the module to help convert your data to a zarr array, but
+not all formats or situations are covered. These are good opportunities to submit pull requests.
+Currently, the module must be run via the Python API, but making it available in the GUI
+is another good PR or feature request.
+
+Many images contain large volumes of background, i.e. parts of the image that do not contain
+sample. It would be a waste of resources to segment these areas. Distributed Cellpose can
+take a foreground mask and will only process blocks that contain foreground. The mask does not
+have to be at the same sampling rate (resolution) as the input data; it is only assumed that the
+input data and mask cover the same field of view. So you can pair a huge image with a small mask,
+as long as the physical length of each axis (number of voxels times the voxel size in physical
+units) is the same for both.
+
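As an illustrative toy (none of these names are cellpose API), a coarse foreground mask can simply be a boolean array at much lower resolution than the image, e.g. built by thresholding a downsampled copy:

```python
import numpy as np

# Stand-in for a heavily downsampled copy of a huge volume; it only needs to
# cover the same physical field of view as the full-resolution image.
downsampled = np.zeros((8, 32, 32), dtype='float32')
downsampled[2:6, 8:24, 8:24] = 1.0          # pretend this region contains sample

# Threshold to get a boolean foreground mask; blocks of the full-resolution
# image that fall entirely in background would then be skipped.
foreground_mask = downsampled > 0.5
```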
+Preprocessing (for example Gaussian smoothing) can sometimes improve Cellpose performance, but
+preprocessing big data can be inconvenient. Distributed Cellpose can take a list of
+preprocessing steps - a list of functions and their arguments - and it will run those
+functions on each block before running Cellpose. This distributes any preprocessing you
+like along with Cellpose itself, and it can be used in creative ways. For example, to perform
+multi-channel segmentation you currently must use preprocessing_steps to provide the
+second channel. See the examples below to learn how to do multi-channel segmentation.
 
 All user-facing functions in the module have verbose docstrings that explain inputs and outputs.
 You can access these docstrings like this:
@@ -121,6 +137,7 @@ Wrap a folder of tiff images/tiles into a single zarr array without duplicating
 
     # Note tiff filenames must indicate the position of each file in the overall tile grid
     from cellpose.contrib.distributed_segmentation import wrap_folder_of_tiffs
+
     reconstructed_virtual_zarr_array = wrap_folder_of_tiffs(
         filname_pattern='/path/to/folder/of/*.tiff',
         block_index_pattern=r'_(Z)(\d+)(Y)(\d+)(X)(\d+)',
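To make the block_index_pattern above concrete, here is a standalone check of what it extracts from a tile filename (the filename itself is invented for illustration):

```python
import re

# Same pattern as in the example above: axis-letter groups alternate with index groups.
pattern = r'_(Z)(\d+)(Y)(\d+)(X)(\d+)'

match = re.search(pattern, 'tile_Z0Y1X2.tiff')
axes = match.groups()[::2]      # ('Z', 'Y', 'X')
indices = match.groups()[1::2]  # ('0', '1', '2')
grid_position = {a: int(i) for a, i in zip(axes, indices)}
# grid_position == {'Z': 0, 'Y': 1, 'X': 2}
```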
@@ -163,3 +180,98 @@ Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. Janelia cluster):
         cluster_kwargs=cluster_kwargs,
     )
 
+
+Use preprocessing_steps and a mask:
+
+.. code-block:: python
+
+    from scipy.ndimage import gaussian_filter
+    from cellpose.contrib.distributed_segmentation import distributed_eval
+
+    # parameterize cellpose however you like
+    model_kwargs = {'gpu':True, 'model_type':'cyto3'}  # can also use 'pretrained_model'
+    eval_kwargs = {'diameter':30,
+                   'z_axis':0,
+                   'channels':[0,0],
+                   'do_3D':True,
+    }
+
+    # define compute resources for a local workstation
+    cluster_kwargs = {
+        'n_workers':1,  # if you only have 1 gpu, then 1 worker is the right choice
+        'ncpus':8,
+        'memory_limit':'64GB',
+        'threads_per_worker':1,
+    }
+
+    # create preprocessing_steps
+    # Note: for any preprocessing step, the first parameter must be image and the
+    # last must be crop; you can have any number of other parameters in between
+    def pp_step_one(image, sigma, crop):
+        return gaussian_filter(image, sigma)
+
+    # You can sneak other big datasets into the distribution through preprocessing
+    # steps; the crop parameter contains the slices you need to get the correct block
+    def pp_step_two(image, crop):
+        return image - background_channel_zarr[crop]  # the other dataset must also be a zarr array
+
+    # finally, put all preprocessing steps together
+    preprocessing_steps = [(pp_step_one, {'sigma':2.0}), (pp_step_two, {}),]
+
+    # run segmentation
+    # outputs:
+    #   segments: zarr array containing labels
+    #   boxes: list of bounding boxes around all labels (very useful for navigating big data)
+    segments, boxes = distributed_eval(
+        input_zarr=large_zarr_array,
+        blocksize=(256, 256, 256),
+        write_path='/where/zarr/array/containing/results/will/be/written.zarr',
+        preprocessing_steps=preprocessing_steps,
+        mask=mask,  # binary foreground mask covering the same field of view
+        model_kwargs=model_kwargs,
+        eval_kwargs=eval_kwargs,
+        cluster_kwargs=cluster_kwargs,
+    )
+
+
+Multi-channel segmentation using preprocessing_steps:
+
+.. code-block:: python
+
+    import numpy as np
+    from cellpose.contrib.distributed_segmentation import distributed_eval
+
+    # parameterize cellpose however you like
+    model_kwargs = {'gpu':True, 'model_type':'cyto3'}  # can also use 'pretrained_model'
+    eval_kwargs = {'diameter':30,
+                   'z_axis':0,
+                   'channels':[2,1],  # two channels, stacked by the preprocessing step below
+                   'do_3D':True,
+    }
+
+    # define compute resources for a local workstation
+    cluster_kwargs = {
+        'n_workers':1,  # if you only have 1 gpu, then 1 worker is the right choice
+        'ncpus':8,
+        'memory_limit':'64GB',
+        'threads_per_worker':1,
+    }
+
+    # preprocessing step to stack the second channel onto the first
+    def stack_channels(image, crop):
+        return np.stack((image, second_channel_zarr[crop]), axis=1)  # the second channel must also be a zarr array
+    preprocessing_steps = [(stack_channels, {}),]
+
+    # run segmentation
+    # outputs:
+    #   segments: zarr array containing labels
+    #   boxes: list of bounding boxes around all labels (very useful for navigating big data)
+    segments, boxes = distributed_eval(
+        input_zarr=large_zarr_array,
+        blocksize=(256, 256, 256),
+        write_path='/where/zarr/array/containing/results/will/be/written.zarr',
+        preprocessing_steps=preprocessing_steps,  # sneaky multi-channel segmentation
+        model_kwargs=model_kwargs,
+        eval_kwargs=eval_kwargs,
+        cluster_kwargs=cluster_kwargs,
+    )
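The calling convention used by the preprocessing steps above (image first, crop last, any other arguments in between, supplied via a (function, kwargs) pair) can be mimicked with a small driver. This is a sketch of the idea only, not the actual cellpose internals:

```python
import numpy as np

def apply_steps(block, crop, steps):
    # Apply each (function, kwargs) pair in order: image first, crop last.
    for func, kwargs in steps:
        block = func(block, **kwargs, crop=crop)
    return block

def add_offset(image, offset, crop):
    # Hypothetical step following the required signature.
    return image + offset

block = np.zeros((4, 4))
crop = (slice(0, 4), slice(0, 4))   # slices locating this block in the full array
out = apply_steps(block, crop, [(add_offset, {'offset': 1.0})])
# out is a 4x4 array of ones
```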
