### Configuration file

The parameters for generating, training, and searching on the super-network are defined in a configuration file with two exclusive subsets of parameters, one for training and one for search:
```json
"bootstrapNAS": {
    "training": {
        ...
    },
    "search": {
        ...
    }
}
```

In the `training` section, you specify the training algorithm (e.g., `progressive_shrinking`), the schedule, and the elasticity parameters:

```json
"training": {
    "algorithm": "progressive_shrinking",
    "progressivity_of_elasticity": ["depth", "width"],
    "batchnorm_adaptation": {
        "num_bn_adaptation_samples": 1500
    },
    "schedule": {
        "list_stage_descriptions": [
            {"train_dims": ["depth"], "epochs": 25, "depth_indicator": 1, "init_lr": 2.5e-6, "epochs_lr": 25},
            {"train_dims": ["depth"], "epochs": 40, "depth_indicator": 2, "init_lr": 2.5e-6, "epochs_lr": 40},
            {"train_dims": ["depth", "width"], "epochs": 50, "depth_indicator": 2, "reorg_weights": true, "width_indicator": 2, "bn_adapt": true, "init_lr": 2.5e-6, "epochs_lr": 50},
            {"train_dims": ["depth", "width"], "epochs": 50, "depth_indicator": 2, "reorg_weights": true, "width_indicator": 3, "bn_adapt": true, "init_lr": 2.5e-6, "epochs_lr": 50}
        ]
    },
    "elasticity": {
        "available_elasticity_dims": ["width", "depth"],
        "width": {
            "max_num_widths": 3,
            "min_width": 32,
            "width_step": 32,
            "width_multipliers": [1, 0.80, 0.60]
        },
        ...
    }
}
```
In the `search` section, you specify the search algorithm (e.g., `NSGA-II`) and its parameters. For example:
```json
"search": {
    "algorithm": "NSGA2",
    "num_evals": 3000,
    "population": 50,
    "ref_acc": 93.65
}
```

By default, BootstrapNAS uses `NSGA-II` (Deb et al., 2002), a genetic algorithm that constructs a Pareto front of efficient sub-networks.

List of parameters that can be used in the configuration file:

**Training:**

`algorithm`: Defines the training strategy for tuning the super-network. By default, `progressive_shrinking`.

`progressivity_of_elasticity`: Defines the order in which a new elasticity dimension is added from stage to stage, e.g., `["width", "depth", "kernel"]`.

`batchnorm_adaptation`: Specifies the number of samples from the training dataset to use for model inference during the BatchNorm statistics adaptation procedure for the compressed model.

`schedule`: Includes a list of stage descriptors (`list_stage_descriptions`), each of which specifies:
- `train_dims`: the elasticity dimensions enabled for the stage;
- `epochs`: the number of epochs for the stage;
- `depth_indicator`: in the case of elastic depth, restricts the maximum number of blocks in each independent group that can be skipped;
- `width_indicator`: restricts the maximum number of width values in each elastic layer;
- `reorg_weights`: whether weights should be reorganized at the beginning of the stage;
- `bn_adapt`: whether batch norm adaptation should be triggered at the beginning of the stage;
- `init_lr`: the initial learning rate for the stage;
- `epochs_lr`: the number of epochs to use for adjusting the learning rate.

`elasticity`: Currently, BootstrapNAS supports three elastic dimensions: `kernel`, `width`, and `depth`. For elastic depth, the `mode` can be set to `auto` or `manual`. If `manual` is selected, the user can specify a list of possible `skipped_blocks` that, as the name suggests, might be skipped. In `auto` mode, the user can instead specify `min_block_size`, i.e., the minimal number of operations in a skipped block, and `max_block_size`, i.e., the maximal number of operations in the block. The user can also `allow_nested_blocks` or `allow_linear_combination` of blocks. For elastic width, the user can specify `min_width`, i.e., the minimal number of output channels that can be activated for each layer with elastic width (default: 32); `max_num_widths`, which restricts the total number of different elastic width values for each layer; `width_step`, which defines a step size for generating the elastic width search space; or `width_multipliers`, which defines the elastic width search space via a list of multipliers. The user can also choose the filter importance metric: L1, L2 (selected by default), or geometric mean. For elastic kernel, the user can specify `max_num_kernels`, which restricts the total number of different elastic kernel values for each layer.
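
For instance, a manual elastic depth configuration might be sketched as below. The block boundary names are hypothetical placeholders; the actual operation names depend on the traced graph of your model:

```json
"elasticity": {
    "available_elasticity_dims": ["width", "depth", "kernel"],
    "depth": {
        "mode": "manual",
        "skipped_blocks": [
            ["block1/start_op_name", "block1/end_op_name"],
            ["block2/start_op_name", "block2/end_op_name"]
        ]
    },
    "kernel": {
        "max_num_kernels": 3
    }
}
```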

`train_steps`: Defines the number of samples used for each training epoch.

**Search:**

`algorithm`: Defines the search algorithm. The default algorithm is `NSGA-II`.

`num_evals`: Defines the number of evaluations that will be used by the search algorithm.

`population`: Defines the population size when using an evolutionary search algorithm.

`acc_delta`: Defines the absolute difference in accuracy that is tolerated when looking for a sub-network.

`ref_acc`: Defines the reference accuracy from the pre-trained model used to generate the super-network.
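
Taken together, `ref_acc` and `acc_delta` bound the accuracy of acceptable sub-networks during search. A sketch of a search section using both (the values are illustrative, not recommendations):

```json
"search": {
    "algorithm": "NSGA2",
    "num_evals": 3000,
    "population": 50,
    "ref_acc": 93.65,
    "acc_delta": 1.0
}
```

With this configuration, the search considers sub-networks whose accuracy falls within 1 point of the 93.65 reference accuracy.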

A full list of the possible configuration parameters can be found [here](https://github.com/jpablomch/nncf_bootstrapnas/blob/develop/nncf/config/experimental_schema.py).