| 1 | +# Training the network |
| 2 | + |
| 3 | +Cellfinder includes a pretrained network for cell candidate classification. This will likely need to be retrained for different applications. Rather than generate training data blindly, the aim is to reduce the amount of hands-on time by only generating training data where cellfinder classified a cell candidate incorrectly. |
| 4 | + |
| 5 | +## Generate training data |
| 6 | + |
| 7 | +To generate training data, you will need: |
| 8 | + |
| 9 | +* The cellfinder output file, `cell_classification.xml` \(but `cells.xml` can also work\). |
| 10 | +* The raw data used initially for cellfinder |
| 11 | + |
| 12 | +To generate training data for a single brain, use `cellfinder_curate`: |
| 13 | + |
| 14 | +```bash |
| 15 | +cellfinder_curate signal_images background_images cell_classification.xml |
| 16 | +``` |
| 17 | + |
| 18 | +### Arguments |
| 19 | + |
| 20 | +* Signal images |
| 21 | +* Background images |
| 22 | +* `cell_classification.xml` file |
| 23 | + |
| 24 | +{% hint style="info" %} |
| 25 | +You must also specify the pixel sizes; see [Specifying pixel size](../usage/specifying-pixel-size.md). |
| 26 | +{% endhint %} |
| 27 | + |
| 28 | +**Optional** |
| 29 | + |
| 30 | +* `-o` or `--output` Output directory for curation results. If this is not given, then the directory containing `cell_classification.xml` will be used. |
| 31 | +* `--symbol` Marker symbol \(Default: `ring`\) |
| 32 | +* `--marker-size` Marker size \(Default: `15`\) |
| 33 | +* `--opacity` Marker opacity \(Default: `0.6`\) |
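As a rough illustration of how the optional flags above combine with the positional arguments, the sketch below builds a `cellfinder_curate` command and prints it rather than running it. All paths are hypothetical placeholders, the marker values are arbitrary, and the helper `build_curate_cmd` is made up for this example. The pixel-size arguments are deliberately omitted and must be added as described on the linked page.

```bash
#!/usr/bin/env bash
# Sketch only: build_curate_cmd is a hypothetical helper, not part of cellfinder.
# Pixel-size arguments are left out; add them before running for real.

build_curate_cmd() {
    local signal_dir="$1" background_dir="$2" xml_file="$3" output_dir="$4"
    # Only flags documented above are used; the values are illustrative.
    printf '%s\n' "cellfinder_curate $signal_dir $background_dir $xml_file -o $output_dir --symbol ring --marker-size 20 --opacity 0.8"
}

# Dry run: print the command so it can be inspected before use.
build_curate_cmd /path/to/signal /path/to/background \
    /path/to/cell_classification.xml /path/to/curation_output
```

Printing the command first makes it easy to check the argument order (signal images, background images, then the xml file) before opening the viewer.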
| 34 | + |
| 35 | +A [napari](https://napari.org/) window will then open, showing two tabs on the left-hand side: |
| 36 | + |
| 37 | +* `Image` Selecting this allows you to change the contrast limits, to better visualise cells |
| 38 | +* `Cell candidates` This shows the cell candidates that can be curated. Cell |
| 39 | + |
| 40 | + candidates previously classified as cells are shown in yellow, and artifacts |
| 41 | + |
| 42 | + in blue. |
| 43 | + |
| 44 | +By selecting the `Cell candidates` tab and then the selection tool \(arrow at the top\), you can select cell candidates \(either individually, or several at once by dragging the cursor\). Four keyboard commands are then available: |
| 45 | + |
| 46 | +* `C` Confirm the classification result, and add this to the training set |
| 47 | +* `T` Toggle the classification result \(i.e. change the classification\), |
| 48 | + |
| 49 | + and add this to the training set. |
| 50 | + |
| 51 | +* `Alt+Q` Save the results to an xml file |
| 52 | +* `Alt+E` Finish curating the training dataset. This will carry out three operations: |
| 53 | + * Extract cubes around these points, into two directories \(`cells` and `non_cells`\). |
| 54 | + * Generate a yaml file pointing to these files for use with `cellfinder_train` \(see below\) |
| 55 | + * Close the viewer |
| 56 | + |
| 57 | +Once a `yaml` file has been generated, you can proceed to training. However, it is likely useful to generate `yaml` files from additional datasets. |
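If you curate several brains, one way to keep the runs consistent is to generate the `cellfinder_curate` command for each dataset from a common pattern. The sketch below assumes a hypothetical layout in which every brain directory contains `signal/`, `background/` and `cell_classification.xml`; neither this layout nor the helper `curate_cmds` comes from cellfinder itself, and the pixel-size arguments are again omitted.

```bash
#!/usr/bin/env bash
# Sketch only: the directory layout and the curate_cmds helper are assumptions.
# Each printed command opens an interactive viewer, so run them one at a time.

curate_cmds() {
    local brain_dir
    for brain_dir in "$@"; do
        # Write the curation output to a subdirectory of each brain.
        printf '%s\n' "cellfinder_curate $brain_dir/signal $brain_dir/background $brain_dir/cell_classification.xml -o $brain_dir/curation"
    done
}

# Dry run: print one command per brain directory.
curate_cmds /path/to/brain_1 /path/to/brain_2
```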
| 58 | + |
| 59 | +## Start training |
| 60 | + |
| 61 | +You can then use these yaml files for training. |
| 62 | + |
| 63 | +_N.B. If you have any yaml files from previous versions of cellfinder, they will continue to work, but are not documented here. Just use them as you would the files from `cellfinder_curate`._ |
| 64 | + |
| 65 | +```bash |
| 66 | +cellfinder_train -y yaml_1.yml yaml_2.yml -o /path/to/output/directory/ |
| 67 | +``` |
| 68 | + |
| 69 | +### Arguments |
| 70 | + |
| 71 | +* `-y` or `--yaml` The path to the yaml files defining training data |
| 72 | +* `-o` or `--output` Output directory for the trained model \(or model weights\) |
| 75 | + |
| 76 | +**Optional** |
| 77 | + |
| 78 | +* `--continue-training` Continue training from an existing trained model. If no model or model weights are specified, this will continue from the included model. |
| 79 | +* `--trained-model` Path to a trained model to continue training |
| 80 | +* `--model-weights` Path to existing model weights to continue training |
| 81 | +* `--network-depth` Resnet depth \(based on [He et al. \(2015\)](https://arxiv.org/abs/1512.03385)\). Choose from |
| 82 | + |
| 83 | + \(18, 34, 50, 101 or 152\). In theory, a deeper network should classify better, |
| 84 | + |
| 85 | + at the expense of a larger model, and longer training time. Default: 50 |
| 86 | + |
| 87 | +* `--batch-size` Batch size for training \(how many cell candidates to process at once\). Default: 16 |
| 88 | +* `--epochs` How many times to use each sample for training. Default: 1000 |
| 89 | +* `--test-fraction` What fraction of data to keep for validation. Default: 0.1 |
| 90 | +* `--learning-rate` Learning rate for training the model |
| 91 | +* `--no-augment` Do not use data augmentation |
| 92 | +* `--save-weights` Only store the model weights, and not the full model. Useful to save storage space. |
| 93 | +* `--no-save-checkpoints` Do not save the model after each training epoch. Useful to save storage space, if you are happy to wait for the chosen number of epochs to complete. Each model file can be large, and if you don't have much training data, they can be generated quickly. |
| 94 | +* `--tensorboard` Log to `output_directory/tensorboard`. Use `tensorboard --logdir output_directory/tensorboard` to view. |
| 95 | +* `--save-progress` Save training progress to a .csv file \(`output_directory/training.csv`\). |
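To show how several of the optional flags above might be combined, the sketch below assembles a `cellfinder_train` command that continues from the included model, trains for fewer epochs than the default, skips per-epoch checkpoints and logs progress to a .csv file. The paths, the chosen values and the helper `build_train_cmd` are illustrative assumptions, not recommended settings.

```bash
#!/usr/bin/env bash
# Sketch only: build_train_cmd is a hypothetical helper, not part of cellfinder.

build_train_cmd() {
    local output_dir="$1"
    shift
    # Remaining arguments are the yaml files generated during curation.
    printf '%s\n' "cellfinder_train -y $* -o $output_dir --continue-training --epochs 100 --batch-size 16 --test-fraction 0.1 --no-save-checkpoints --save-progress"
}

# Dry run: print the command so it can be inspected before use.
build_train_cmd /path/to/output/directory yaml_1.yml yaml_2.yml
```

Skipping checkpoints saves storage space, but it means no model is written until all requested epochs have finished, so a smaller `--epochs` value is used here.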
| 96 | + |
| 97 | +### Further help |
| 98 | + |
| 99 | +All `cellfinder_train` options can be found by running: |
| 100 | + |
| 101 | +```bash |
| 102 | +cellfinder_train -h |
| 103 | +``` |
| 104 | + |