
Commit 6183d00

oguzhanbsolak, MaximGorkem, and rotx-eva authored

QATv2 Changes, Unload Support for Multi Output Layers (#354)

* HPTQ Changes, Unload Support for Multi Output Layers
* Ckpts for regression test, final_scale support for layer sharing
* Update README.pdf
* New ckpt for effnet2
* Change "threshold" name to "activation_threshold"
* Default true for scale output
* Key fix for effnetv2

Co-authored-by: Gorkem Ulkar <[email protected]>
Co-authored-by: Robert Muchsel <[email protected]>

1 parent ab4f4c0 · commit 6183d00

22 files changed: +244 −74 lines

README.md (+15 −6)
```diff
@@ -1,6 +1,6 @@
 # ADI MAX78000/MAX78002 Model Training and Synthesis

-July 22, 2024
+August 27, 2024

 **Note: This branch requires PyTorch 2. Please see the archive-1.8 branch for PyTorch 1.8 support. [KNOWN_ISSUES](KNOWN_ISSUES.txt) contains a list of known issues.**
@@ -1620,13 +1620,15 @@ When using the `-8` command line switch, all module outputs are quantized to 8-b
 The last layer can optionally use 32-bit output for increased precision. This is simulated by adding the parameter `wide=True` to the module function call.

-##### Weights: Quantization-Aware Training (QAT)
+##### Weights and Activations: Quantization-Aware Training (QAT)

 Quantization-aware training (QAT) is enabled by default. QAT is controlled by a policy file, specified by `--qat-policy`.

-* After `start_epoch` epochs, training will learn an additional parameter that corresponds to a shift of the final sum of products.
+* After `start_epoch` epochs, an intermediate epoch without backpropagation is run to collect activation statistics. Each layer's activation range is then determined from the collected activations, trading off range against resolution. QAT then begins, and an additional parameter (`output_shift`) is learned that shifts activations to compensate for the scaling-down of weights and biases.
 * `weight_bits` describes the number of bits available for weights.
 * `overrides` allows specifying the `weight_bits` on a per-layer basis.
+* `outlier_removal_z_score` defines the z-score threshold for outlier removal during activation range calculation (default: 8.0).
+* `shift_quantile` defines the quantile of the parameter distribution to be used for the `output_shift` parameter (default: 1.0).

 By default, weights are quantized to 8-bits after 30 epochs as specified in `policies/qat_policy.yaml`. A more refined example that specifies weight sizes for individual layers can be seen in `policies/qat_policy_cifar100.yaml`.
```
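The new activation-statistics pass lends itself to a small illustration. The following is a hedged Python sketch, not the training code from this repository; `activation_range` and its internals are assumptions that only demonstrate how a z-score threshold such as `outlier_removal_z_score` can prune outliers before a layer's activation range is chosen:

```python
import numpy as np

def activation_range(acts: np.ndarray, z_score: float = 8.0) -> float:
    """Hypothetical helper: drop activations more than `z_score` standard
    deviations from the mean, then take the largest remaining magnitude
    as the layer's activation range."""
    mu, sigma = acts.mean(), acts.std()
    kept = acts[np.abs(acts - mu) <= z_score * sigma]
    return float(np.abs(kept).max())

# A single extreme outlier no longer inflates the range:
acts = np.concatenate([np.random.randn(10_000), [1e6]])
print(activation_range(acts))  # roughly 4, instead of 1e6
```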

```diff
@@ -1745,7 +1747,7 @@ For both approaches, the `quantize.py` software quantizes an existing PyTorch ch
 #### Quantization-Aware Training (QAT)

-Quantization-aware training is the better performing approach. It is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights: Quantization-Aware Training (QAT)](#weights-quantization-aware-training-qat). No additional arguments (other than input, output, and device) are needed for `quantize.py`.
+Quantization-aware training is the better performing approach. It is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat)). No additional arguments (other than input, output, and device) are needed for `quantize.py`.

 The input checkpoint to `quantize.py` is either `qat_best.pth.tar`, the best QAT epoch’s checkpoint, or `qat_checkpoint.pth.tar`, the final QAT epoch’s checkpoint.
@@ -2004,7 +2006,7 @@ The behavior of a training session might change when Quantization Aware Training
 While there can be multiple reasons for this, check two important settings that can influence the training behavior:

 * The initial learning rate may be set too high. Reduce LR by a factor of 10 or 100 by specifying a smaller initial `--lr` on the command line, and possibly by reducing the epoch `milestones` for further reduction of the learning rate in the scheduler file specified by `--compress`. Note that the selected optimizer and the batch size both affect the learning rate.
-* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT scheduler file specified by `--qat-policy`, and increase the total number of training epochs by increasing the value specified by the `--epochs` command line argument and by editing the `ending_epoch` in the scheduler file specified by `--compress`. *See also the rule of thumb discussed in the section [Weights: Quantization-Aware Training (QAT)](#weights:-auantization-aware-training \(qat\)).*
+* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT scheduler file specified by `--qat-policy`, and increase the total number of training epochs by increasing the value specified by the `--epochs` command line argument and by editing the `ending_epoch` in the scheduler file specified by `--compress`. *See also the rule of thumb discussed in the section [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat).*
@@ -2209,6 +2211,7 @@ The following table describes the most important command line arguments for `ai8
 | `--no-unload` | Do not create the `cnn_unload()` function | |
 | `--no-kat` | Do not generate the `check_output()` function (disable known-answer test) | |
 | `--no-deduplicate-weights` | Do not deduplicate weights and bias values | |
+| `--no-scale-output` | Do not use scales from the checkpoint to recover the output range when generating the `cnn_unload()` function | |

 ### YAML Network Description
```
```diff
@@ -2330,6 +2333,12 @@ The following keywords are required for each `unload` list item:
 `width`: Data width (optional, defaults to 8) — either 8 or 32
 `write_gap`: Gap between data words (optional, defaults to 0)

+When `--no-scale-output` is not specified, scales from the checkpoint file are used to recover the output range. If an 8-bit output has a non-zero scale, the output is scaled and stored as 16-bit values; if its scale is zero, the output remains 8-bit. 32-bit outputs are always kept at 32 bits.
+
+Example:
+
+![Unload Array](docs/unload_example.png)

 ##### `layers` (Mandatory)

 `layers` is a list that defines the per-layer description, as shown below:
```
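To make the scaling rule above concrete, here is an illustrative Python sketch (the generated `cnn_unload()` itself is C code; `unload_value` is a hypothetical stand-in for the rule, not a repository function):

```python
import numpy as np

def unload_value(raw: int, scale: int, width: int) -> np.integer:
    """Hypothetical mirror of the unload scaling rule: 8-bit outputs with a
    non-zero scale are shifted left and kept in 16 bits, zero-scale 8-bit
    outputs stay 8-bit, and 32-bit outputs always stay 32-bit."""
    if width == 32:
        return np.int32(raw)
    if scale != 0:
        return np.int16(raw << scale)  # scaled, stored as 16 bits
    return np.int8(raw)

print(unload_value(100, scale=3, width=8))  # 800, recovered into 16 bits
print(unload_value(100, scale=0, width=8))  # 100, stays 8-bit
```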
```diff
@@ -2654,7 +2663,7 @@ Example:
 By default, the final layer is used as the output layer. Output layers are checked using the known-answer test, and they are copied from hardware memory when `cnn_unload()` is called. The tool also checks that output layer data isn’t overwritten by any later layers.

 When specifying `output: true`, any layer (or a combination of layers) can be used as an output layer.
-*Note:* When `unload:` is used, output layers are not used for generating `cnn_unload()`.
+*Note:* When `--no-unload` is used, output layers are not used for generating `cnn_unload()`.

 Example:
 `output: true`
```

README.pdf (−2.12 MB)

Binary file not shown.

docs/unload_example.png (64.8 KB)

gen-demos-max78000.sh (+4 −1)

```diff
@@ -12,9 +12,12 @@ python ai8xize.py --test-dir $TARGET --prefix cifar-100-simplewide2x-mixed --che
 python ai8xize.py --test-dir $TARGET --prefix cifar-100-residual --checkpoint-file trained/ai85-cifar100-residual-qat8-q.pth.tar --config-file networks/cifar100-ressimplenet.yaml --softmax $COMMON_ARGS --boost 2.5 "$@"
 python ai8xize.py --test-dir $TARGET --prefix kws20_v3 --checkpoint-file trained/ai85-kws20_v3-qat8-q.pth.tar --config-file networks/kws20-v3-hwc.yaml --softmax $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix kws20_nas --checkpoint-file trained/ai85-kws20_nas-qat8-q.pth.tar --config-file networks/kws20-nas-hwc.yaml --softmax $COMMON_ARGS "$@"
-python ai8xize.py --test-dir $TARGET --prefix faceid --checkpoint-file trained/ai85-faceid-qat8-q.pth.tar --config-file networks/faceid.yaml --fifo $COMMON_ARGS "$@"
+python izer/add_fake_passthrough.py --input-checkpoint-path trained/ai85-faceid_112-qat-q.pth.tar --output-checkpoint-path trained/ai85-fakepass-faceid_112-qat-q.pth.tar --layer-name fakepass --layer-depth 128 --layer-name-after-pt linear --low-memory-footprint "$@"
+python ai8xize.py --test-dir $TARGET --prefix faceid_112 --checkpoint-file trained/ai85-fakepass-faceid_112-qat-q.pth.tar --config-file networks/ai85-faceid_112.yaml --fifo $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix cats-dogs --checkpoint-file trained/ai85-catsdogs-qat8-q.pth.tar --config-file networks/cats-dogs-hwc.yaml --fifo --softmax $COMMON_ARGS "$@"
+python izer/add_fake_passthrough.py --input-checkpoint-path trained/ai85-camvid-unet-large-q.pth.tar --output-checkpoint-path trained/ai85-camvid-unet-large-fakept-q.pth.tar --layer-name pt --layer-depth 56 --layer-name-after-pt upconv3 "$@"
 python ai8xize.py --test-dir $TARGET --prefix camvid_unet --checkpoint-file trained/ai85-camvid-unet-large-fakept-q.pth.tar --config-file networks/camvid-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 --new-kernel-loader "$@"
+python izer/add_fake_passthrough.py --input-checkpoint-path trained/ai85-aisegment-unet-large-q.pth.tar --output-checkpoint-path trained/ai85-aisegment-unet-large-fakept-q.pth.tar --layer-name pt --layer-depth 56 --layer-name-after-pt upconv3 "$@"
 python ai8xize.py --test-dir $TARGET --prefix aisegment_unet --checkpoint-file trained/ai85-aisegment-unet-large-fakept-q.pth.tar --config-file networks/aisegment-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 --new-kernel-loader "$@"
 python ai8xize.py --test-dir $TARGET --prefix svhn_tinierssd --checkpoint-file trained/ai85-svhn-tinierssd-qat8-q.pth.tar --config-file networks/svhn-tinierssd.yaml --overlap-data $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix facedet_tinierssd --checkpoint-file trained/ai85-facedet-tinierssd-qat8-q.pth.tar --config-file networks/ai85-facedet-tinierssd.yaml --sample-input tests/sample_vggface2_facedetection.npy --fifo $COMMON_ARGS "$@"
```

gen-demos-max78002.sh (+5 −3)

```diff
@@ -12,14 +12,16 @@ python ai8xize.py --test-dir $TARGET --prefix cifar-100-simplewide2x-mixed --che
 python ai8xize.py --test-dir $TARGET --prefix cifar-100-residual --checkpoint-file trained/ai85-cifar100-residual-qat8-q.pth.tar --config-file networks/cifar100-ressimplenet.yaml --softmax $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix kws20_v3_1 --checkpoint-file trained/ai87-kws20_v3-qat8-q.pth.tar --config-file networks/ai87-kws20-v3-hwc.yaml --softmax $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix kws20_v2_1 --checkpoint-file trained/ai87-kws20_v2-qat8-q.pth.tar --config-file networks/ai87-kws20-v2-hwc.yaml --softmax $COMMON_ARGS "$@"
-python ai8xize.py --test-dir $TARGET --prefix faceid --checkpoint-file trained/ai85-faceid-qat8-q.pth.tar --config-file networks/faceid.yaml --fifo $COMMON_ARGS "$@"
+python ai8xize.py --test-dir $TARGET --prefix mobilefacenet-112 --checkpoint-file trained/ai87-mobilefacenet-112-qat-q.pth.tar --config-file networks/ai87-mobilefacenet-112.yaml --fifo $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix cats-dogs --checkpoint-file trained/ai85-catsdogs-qat8-q.pth.tar --config-file networks/cats-dogs-hwc-no-fifo.yaml --softmax $COMMON_ARGS "$@"
+python izer/add_fake_passthrough.py --input-checkpoint-path trained/ai85-camvid-unet-large-q.pth.tar --output-checkpoint-path trained/ai85-camvid-unet-large-fakept-q.pth.tar --layer-name pt --layer-depth 56 --layer-name-after-pt upconv3 "$@"
 python ai8xize.py --test-dir $TARGET --prefix camvid_unet --checkpoint-file trained/ai85-camvid-unet-large-fakept-q.pth.tar --config-file networks/camvid-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 "$@"
+python izer/add_fake_passthrough.py --input-checkpoint-path trained/ai85-aisegment-unet-large-q.pth.tar --output-checkpoint-path trained/ai85-aisegment-unet-large-fakept-q.pth.tar --layer-name pt --layer-depth 56 --layer-name-after-pt upconv3 "$@"
 python ai8xize.py --test-dir $TARGET --prefix aisegment_unet --checkpoint-file trained/ai85-aisegment-unet-large-fakept-q.pth.tar --config-file networks/aisegment-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 "$@"
 python ai8xize.py --test-dir $TARGET --prefix svhn_tinierssd --checkpoint-file trained/ai85-svhn-tinierssd-qat8-q.pth.tar --config-file networks/svhn-tinierssd.yaml --overlap-data $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix cifar-100-effnet2 --checkpoint-file trained/ai87-cifar100-effnet2-qat8-q.pth.tar --config-file networks/ai87-cifar100-effnet2.yaml --softmax $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix cifar-100-mobilenet-v2-0.75 --checkpoint-file trained/ai87-cifar100-mobilenet-v2-0.75-qat8-q.pth.tar --config-file networks/ai87-cifar100-mobilenet-v2-0.75.yaml --softmax $COMMON_ARGS "$@"
-python ai8xize.py --test-dir $TARGET --prefix imagenet --checkpoint-file trained/ai87-imagenet-effnet2-q.pth.tar --config-file networks/ai87-imagenet-effnet2.yaml $COMMON_ARGS "$@"
+python ai8xize.py --test-dir $TARGET --prefix effnetv2_imagenet --softmax --checkpoint-file trained/ai87-imagenet-effnet2-q.pth.tar --config-file networks/ai87-imagenet-effnet2.yaml $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix facedet_tinierssd --checkpoint-file trained/ai87-facedet-tinierssd-qat8-q.pth.tar --config-file networks/ai87-facedet-tinierssd.yaml --sample-input tests/sample_vggface2_facedetection.npy $COMMON_ARGS "$@"
-python ai8xize.py --test-dir $TARGET --prefix pascalvoc_fpndetector --checkpoint-file trained/ai87-pascalvoc-fpndetector-qat8-q.pth.tar --config-file networks/ai87-pascalvoc-fpndetector.yaml --fifo --sample-input tests/sample_pascalvoc_256_320.npy --overwrite --no-unload $COMMON_ARGS "$@"
+python ai8xize.py --test-dir $TARGET --prefix pascalvoc_fpndetector --checkpoint-file trained/ai87-pascalvoc-fpndetector-qat8-q.pth.tar --config-file networks/ai87-pascalvoc-fpndetector.yaml --fifo --sample-input tests/sample_pascalvoc_256_320.npy --no-unload $COMMON_ARGS "$@"
 python ai8xize.py --test-dir $TARGET --prefix kinetics --checkpoint-file trained/ai85-kinetics-qat8-q.pth.tar --config-file networks/ai85-kinetics-actiontcn.yaml --overlap-data --softmax --zero-sram $COMMON_ARGS "$@"
```

izer/backend/max7800x.py (+19 −4)

```diff
@@ -1,5 +1,5 @@
 ###################################################################################################
-# Copyright (C) 2019-2023 Maxim Integrated Products, Inc. All Rights Reserved.
+# Copyright (C) 2019-2024 Maxim Integrated Products, Inc. All Rights Reserved.
 #
 # Maxim Integrated Products, Inc. Default Copyright Notice:
 # https://www.maximintegrated.com/en/aboutus/legal/copyrights.html
@@ -69,6 +69,7 @@ def create_net(self) -> str:  # pylint: disable=too-many-locals,too-many-branche
     fast_fifo_quad = state.fast_fifo_quad
     fifo = state.fifo
     final_layer = state.final_layer
+    final_scale = state.final_scale
     first_layer_used = state.first_layer_used
     flatten = state.flatten
     forever = state.forever
@@ -136,6 +137,7 @@ def create_net(self) -> str:  # pylint: disable=too-many-locals,too-many-branche
     riscv = state.riscv
     riscv_cache = state.riscv_cache
     riscv_flash = state.riscv_flash
+    scale_output = state.scale_output
     simple1b = state.simple1b
     simulated_sequence = state.simulated_sequence
     snoop = state.snoop
@@ -1152,7 +1154,8 @@ def create_net(self) -> str:  # pylint: disable=too-many-locals,too-many-branche
             conv_str = ', no convolution, '
         apb.output(conv_str +
                    f'{output_chan[ll]}x{output_dim_str[ll]} output\n', embedded_code)
-
+    apb.output('\n', embedded_code)
+    apb.output(f'// Final Scales: {final_scale}\n', embedded_code)
     apb.output('\n', embedded_code)

     apb.header()
@@ -3553,8 +3556,20 @@ def run_eltwise(
     elif block_mode:
         assets.copy('assets', 'blocklevel-ai' + str(device), base_directory, test_name)
     elif embedded_code:
-        output_count = output_chan[terminating_layer] \
-            * output_dim[terminating_layer][0] * output_dim[terminating_layer][1]
+        output_count = 0
+        for i in range(terminating_layer + 1):
+            if output_layer[i]:
+                if output_width[i] != 32:
+                    if scale_output:
+                        output_count += (output_chan[i] * output_dim[i][0] * output_dim[i][1]
+                                         + (32 // (2 * output_width[i]) - 1)) \
+                            // (32 // (2 * output_width[i]))
+                    else:
+                        output_count += (output_chan[i] * output_dim[i][0] * output_dim[i][1]
+                                         + (32 // output_width[i] - 1)) \
+                            // (32 // output_width[i])
+                else:
+                    output_count += output_chan[i] * output_dim[i][0] * output_dim[i][1]
         insert = summary_stats + \
             '\n/* Number of outputs for this network */\n' \
             f'#define CNN_NUM_OUTPUTS {output_count}'
```
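The rewritten `output_count` computation sums, over every output layer instead of only the final one, the number of 32-bit words the layer's output occupies after packing. The ceiling-division arithmetic is easy to sanity-check in isolation; the sketch below is illustrative only (`words_for_output` does not exist in this repository):

```python
def words_for_output(n_values: int, output_width: int, scale_output: bool) -> int:
    """32-bit words needed for one layer's output. With output scaling,
    sub-32-bit values are widened to twice their width (8-bit -> 16-bit),
    so only half as many values fit into each 32-bit word."""
    if output_width == 32:
        return n_values  # one word per value
    per_word = 32 // (2 * output_width) if scale_output else 32 // output_width
    return (n_values + per_word - 1) // per_word  # ceiling division

print(words_for_output(10, 8, scale_output=False))  # 3 words, 4 values per word
print(words_for_output(10, 8, scale_output=True))   # 5 words, 2 values per word
```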

izer/checkpoint.py (+9 −2)

```diff
@@ -1,5 +1,5 @@
 ###################################################################################################
-# Copyright (C) 2019-2023 Maxim Integrated Products, Inc. All Rights Reserved.
+# Copyright (C) 2019-2024 Maxim Integrated Products, Inc. All Rights Reserved.
 #
 # Maxim Integrated Products, Inc. Default Copyright Notice:
 # https://www.maximintegrated.com/en/aboutus/legal/copyrights.html
@@ -56,6 +56,7 @@ def load(
     bias_min = []
     bias_max = []
     bias_size = []
+    final_scale = {}

     checkpoint = torch.load(checkpoint_file, map_location='cpu')
     print(f'Reading {checkpoint_file} to configure network weights...')
@@ -251,6 +252,12 @@ def load(
             # Add implicit shift based on quantization
             output_shift[seq] += 8 - abs(quantization[seq])

+            final_scale_name = '.'.join([layer, 'final_scale'])
+            if final_scale_name in checkpoint_state:
+                w = checkpoint_state[final_scale_name].numpy().astype(np.int64)
+                final_scale[seq] = w.item()
+            else:
+                final_scale[seq] = 0
             layers += 1
             seq += 1

@@ -286,4 +293,4 @@ def load(
         sys.exit(1)

     return layers, weights, bias, output_shift, \
-        input_channels, output_channels
+        input_channels, output_channels, final_scale
```
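For callers, `load()` now returns one extra value: a dictionary mapping each layer's sequence number to its `final_scale`, defaulting to 0 when the checkpoint carries none. A minimal sketch of the same lookup, using an invented checkpoint entry for illustration:

```python
import torch

# Invented checkpoint contents; real checkpoints store many more tensors:
checkpoint_state = {'conv1.final_scale': torch.tensor(3)}

layer, seq, final_scale = 'conv1', 0, {}
key = '.'.join([layer, 'final_scale'])
# Default to 0 (no output scaling) when the layer has no final_scale entry:
final_scale[seq] = int(checkpoint_state[key]) if key in checkpoint_state else 0
print(final_scale)  # {0: 3}
```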

izer/commandline.py (+3)

```diff
@@ -464,6 +464,8 @@ def get_parser() -> argparse.Namespace:
                        help='GitHub repository name for update checking')
     group.add_argument('--yamllint', metavar='S', default='yamllint',
                        help='name of linter for YAML files (default: yamllint)')
+    group.add_argument('--no-scale-output', action='store_true', default=False,
+                       help='do not scale output with final layer scale factor (default: scale output)')

     args = parser.parse_args()

@@ -691,6 +693,7 @@ def set_state(args: argparse.Namespace) -> None:
     state.rtl_preload_weights = args.rtl_preload_weights
     state.runtest_filename = args.runtest_filename
     state.sample_filename = args.sample_filename
+    state.scale_output = not args.no_scale_output
     state.simple1b = args.simple1b
     state.sleep = args.deepsleep
     state.slow_load = args.slow_load
```
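The flag follows the code base's existing `--no-X` convention: the parser stores the negated option, and `set_state()` flips it so that `state.scale_output` defaults to `True`. A self-contained sketch of that pattern (a generic parser, not this repository's):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--no-scale-output', action='store_true', default=False,
                    help='disable output scaling (enabled by default)')

args = parser.parse_args([])                     # flag absent
print(not args.no_scale_output)                  # True: scaling on by default

args = parser.parse_args(['--no-scale-output'])  # flag given
print(not args.no_scale_output)                  # False: scaling disabled
```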
