|
@@ -1,6 +1,6 @@
 # ADI MAX78000/MAX78002 Model Training and Synthesis
 
-July 22, 2024
+August 27, 2024
 
 **Note: This branch requires PyTorch 2. Please see the archive-1.8 branch for PyTorch 1.8 support. [KNOWN_ISSUES](KNOWN_ISSUES.txt) contains a list of known issues.**
 
@@ -1620,13 +1620,15 @@ When using the `-8` command line switch, all module outputs are quantized to 8-b
 
 The last layer can optionally use 32-bit output for increased precision. This is simulated by adding the parameter `wide=True` to the module function call.
 
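+For example, a final fully connected layer might enable wide output as follows (a sketch only; the layer dimensions are illustrative):
+
+```python
+# 32-bit ("wide") output for the last layer, e.g., inside a model's __init__():
+self.fc = ai8x.Linear(512, num_classes, wide=True, bias=True)
+```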
|
-##### Weights: Quantization-Aware Training (QAT)
+##### Weights and Activations: Quantization-Aware Training (QAT)
 
 Quantization-aware training (QAT) is enabled by default. QAT is controlled by a policy file, specified by `--qat-policy`.
 
-* After `start_epoch` epochs, training will learn an additional parameter that corresponds to a shift of the final sum of products.
+* After `start_epoch` epochs, an intermediate epoch without backpropagation is run to collect activation statistics. Each layer’s activation range is then determined from the collected activations, based on the range/resolution trade-off. QAT then begins, and an additional parameter (`output_shift`) is learned to shift activations, compensating for the scaling-down of weights and biases.
 * `weight_bits` describes the number of bits available for weights.
 * `overrides` allows specifying the `weight_bits` on a per-layer basis.
+* `outlier_removal_z_score` defines the z-score threshold for outlier removal during activation range calculation (default: 8.0).
+* `shift_quantile` defines the quantile of the parameter distribution used for the `output_shift` parameter (default: 1.0).
 
 By default, weights are quantized to 8 bits after 30 epochs, as specified in `policies/qat_policy.yaml`. A more refined example that specifies weight sizes for individual layers can be seen in `policies/qat_policy_cifar100.yaml`.
 
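+A minimal sketch of a policy file combining the keys described above (the values are illustrative, and the layer name under `overrides` is a placeholder):
+
+```yaml
+start_epoch: 30
+weight_bits: 8
+outlier_removal_z_score: 8.0
+shift_quantile: 1.0
+overrides:
+  fc:  # placeholder layer name
+    weight_bits: 4
+```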
|
@@ -1745,7 +1747,7 @@ For both approaches, the `quantize.py` software quantizes an existing PyTorch ch
 
 #### Quantization-Aware Training (QAT)
 
-Quantization-aware training is the better performing approach. It is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights: Quantization-Aware Training (QAT)](#weights-quantization-aware-training-qat). No additional arguments (other than input, output, and device) are needed for `quantize.py`.
+Quantization-aware training is the better-performing approach. It is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat)). No additional arguments (other than input, output, and device) are needed for `quantize.py`.
 
 The input checkpoint to `quantize.py` is either `qat_best.pth.tar`, the best QAT epoch’s checkpoint, or `qat_checkpoint.pth.tar`, the final QAT epoch’s checkpoint.
 
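+For example, a typical invocation might look like this (a sketch; the file paths are illustrative):
+
+```shell
+(ai8x-synthesis) $ python quantize.py trained/qat_best.pth.tar trained/qat_best-q.pth.tar --device MAX78000
+```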
|
@@ -2004,7 +2006,7 @@ The behavior of a training session might change when Quantization Aware Training
 While there can be multiple reasons for this, check two important settings that can influence the training behavior:
 
 * The initial learning rate may be set too high. Reduce LR by a factor of 10 or 100 by specifying a smaller initial `--lr` on the command line, and possibly by reducing the epoch `milestones` for further reduction of the learning rate in the scheduler file specified by `--compress`. Note that the selected optimizer and the batch size both affect the learning rate.
-* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT scheduler file specified by `--qat-policy`, and increase the total number of training epochs by increasing the value specified by the `--epochs` command line argument and by editing the `ending_epoch` in the scheduler file specified by `--compress`. *See also the rule of thumb discussed in the section [Weights: Quantization-Aware Training (QAT)](#weights:-auantization-aware-training \(qat\)).*
+* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT scheduler file specified by `--qat-policy`, and increase the total number of training epochs by increasing the value specified by the `--epochs` command line argument and by editing the `ending_epoch` in the scheduler file specified by `--compress` (see the sketch after this list). *See also the rule of thumb discussed in the section [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat).*
 
 
 
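+A sketch of the corresponding command-line adjustments (the values are illustrative, and the remaining model/dataset arguments are elided):
+
+```shell
+(ai8x-training) $ python train.py --lr 0.001 --epochs 300 \
+    --compress policies/schedule.yaml --qat-policy policies/qat_policy.yaml ...
+```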
|
@@ -2209,6 +2211,7 @@ The following table describes the most important command line arguments for `ai8
 | `--no-unload` | Do not create the `cnn_unload()` function | |
 | `--no-kat` | Do not generate the `check_output()` function (disable known-answer test) | |
 | `--no-deduplicate-weights` | Do not deduplicate weights and bias values | |
+| `--no-scale-output` | Do not use scales from the checkpoint to recover the output range when generating the `cnn_unload()` function | |
 
 ### YAML Network Description
 
|
@@ -2330,6 +2333,12 @@ The following keywords are required for each `unload` list item:
 `width`: Data width (optional, defaults to 8) — either 8 or 32
 `write_gap`: Gap between data words (optional, defaults to 0)
 
+When `--no-scale-output` is not specified, scales from the checkpoint file are used to recover the output range. For 8-bit output with a non-zero scale, the output is scaled and kept in 16 bits; with a zero scale, the output remains 8 bits. 32-bit output is always kept in 32 bits.
+
+Example:
+
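+The following sketch uses placeholder values; the keys other than `width` and `write_gap` are assumed from the full `unload` keyword list:
+
+```yaml
+unload:
+  - processors: 0x000000000000000f  # placeholder: processors the data is copied from
+    channels: 4                     # placeholder: data channel count
+    dim: [8, 8]                     # placeholder: data dimensions
+    offset: 0x2000                  # placeholder: data source offset
+    width: 8
+    write_gap: 0
+```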
|
 ##### `layers` (Mandatory)
 
 `layers` is a list that defines the per-layer description, as shown below:
|
@@ -2654,7 +2663,7 @@ Example:
 By default, the final layer is used as the output layer. Output layers are checked using the known-answer test, and they are copied from hardware memory when `cnn_unload()` is called. The tool also checks that output layer data isn’t overwritten by any later layers.
 
 When specifying `output: true`, any layer (or a combination of layers) can be used as an output layer.
-*Note:* When `unload:` is used, output layers are not used for generating `cnn_unload()`.
+*Note:* When `--no-unload` is used, output layers are not used for generating `cnn_unload()`.
 
 Example:
 `output: true`
|
|