PyTorch frontend for fpgaConvNet, providing emulated accuracy results for features such as quantization and sparsity.
- `models/`: general interfaces for model creation, inference, and ONNX export.
- `quantization/`: emulation of fixed-point and block floating-point (BFP) representations.
- `sparsity/`: post-activation sparsity and a tunable threshold ReLU.
- `optimiser_interface/`: Python interface to launch the fpgaConvNet optimiser and collect its prediction results.
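
As an illustration of the tunable threshold ReLU listed under `sparsity/`, the sketch below zeroes every activation that does not exceed a threshold, so raising the threshold raises post-activation sparsity, and a threshold of 0 recovers the standard ReLU. It is a minimal sketch assuming one scalar threshold per layer; `ThresholdReLU` and `sparsity` are hypothetical names, not the repository's API.

```python
import torch
import torch.nn as nn


class ThresholdReLU(nn.Module):
    """Hypothetical threshold ReLU: y = x if x > threshold, else 0."""

    def __init__(self, threshold: float = 0.0):
        super().__init__()
        # A buffer follows .to(device) but is not updated by the optimizer.
        self.register_buffer("threshold", torch.tensor(float(threshold)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.where(x > self.threshold, x, torch.zeros_like(x))


def sparsity(t: torch.Tensor) -> float:
    """Fraction of elements that are exactly zero."""
    return (t == 0).float().mean().item()


x = torch.tensor([-1.0, 0.05, 0.2, 0.5, 1.0])
for t in (0.0, 0.3):
    # A higher threshold zeroes more activations.
    print(t, sparsity(ThresholdReLU(t)(x)))
```

PyTorch's built-in `nn.Threshold(threshold, 0.0)` computes the same function; a custom module is only useful if the threshold needs to be stored, tuned, or swapped per layer.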

Example scripts:

```shell
python quantization_example.py
python activation_sparsity_example.py
python threshold_relu_example.py
python encoding_example.py
```

Supported models by dataset:

- imagenet: resnet18, resnet50, mobilenet_v2, repvgg_a0
- coco: yolov8n
- camvid: unet
- cityscapes: unet
- llgmri: unet
- ucf101: x3d_s, x3d_m
- brats2020: unet3d

Quantization results on imagenet:

| Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
|---|---|---|---|---|---|---|
| resnet18 | torchvision | 69.76 | 69.76 | 1.03 | 68.48 | 69.26 |
| resnet50 | torchvision | 76.13 | 76.10 | 0.36 | 74.38 | 75.75 |
| mobilenet_v2 | torchvision | 71.87 | 71.76 | 0.10 | 53.68 | 69.51 |
| repvgg_a0 | timm | 72.41 | 72.40 | 0.21 | 0.21 | 66.08 |
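
The FixedN columns can be pictured as fake quantization: each value is scaled by 2^f, rounded to an integer code, saturated to the signed N-bit range, and scaled back, so the network still runs in float32 but only sees representable values. This sketch assumes a signed two's-complement format with a per-tensor number of fractional bits; the integer/fraction split used for the reported results is not stated here, so `frac_bits`, `fixed_point_quantize` and `quantize_parameters_` are hypothetical.

```python
import torch
import torch.nn as nn


def fixed_point_quantize(x: torch.Tensor, word_bits: int, frac_bits: int) -> torch.Tensor:
    """Round x onto a signed fixed-point grid and return it as float.

    word_bits is the total width (sign included); frac_bits of them are fractional.
    """
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (word_bits - 1))
    q_max = 2 ** (word_bits - 1) - 1
    # Round to the nearest integer code, then saturate values outside the range.
    codes = torch.clamp(torch.round(x * scale), q_min, q_max)
    return codes / scale


@torch.no_grad()
def quantize_parameters_(model: nn.Module, word_bits: int, frac_bits: int) -> None:
    """Overwrite every parameter of `model` in place with its fixed-point value."""
    for p in model.parameters():
        p.copy_(fixed_point_quantize(p, word_bits, frac_bits))


x = torch.tensor([0.1, -0.3, 1.7, 300.0])
print(fixed_point_quantize(x, word_bits=16, frac_bits=8))  # step 1/256, saturates at 32767/256
print(fixed_point_quantize(x, word_bits=8, frac_bits=4))   # step 1/16, saturates at 127/16
```

With the same formula, halving the word width both coarsens the step and shrinks the representable range, which is the trade-off the Fixed16 and Fixed8 columns compare.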

Quantization results on coco:

| Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
|---|---|---|---|---|---|---|
| yolov8n | ultralytics | 37.1 | 37.1 | 0.0 | 29.6 | 35.1 |
| yolov8s | ultralytics | 39.2 | 39.1 | 0.0 | 38.7 | 36.8 |

Quantization results on camvid:

| Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
|---|---|---|---|---|---|---|
| unet | nncf | 71.95 | 71.95 | 61.02 | 71.60 | 71.85 |
| unet-bilinear | nncf | 71.67 | 71.67 | 60.62 | 71.40 | 71.75 |

Quantization results on cityscapes:

| Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
|---|---|---|---|---|---|---|
| unet | mmsegmentation | 69.10 | 69.10 | 1.98 | 61.74 | 68.43 |

Quantization results on llgmri:

| Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
|---|---|---|---|---|---|---|
| unet | brain-segmentation-pytorch | 90.89 | 90.88 | 80.98 | 90.95 | 90.85 |
| unet-bilinear | brain-segmentation-pytorch | 91.05 | 91.05 | 77.51 | 91.04 | 91.03 |

Quantization results on ucf101:

| Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
|---|---|---|---|---|---|---|
| x3d_s | mmaction2 | 93.68 | 93.57 | 1.13 | 90.21 | 93.57 |
| x3d_m | mmaction2 | 96.40 | 96.40 | 0.81 | 95.24 | 96.29 |

Quantization results on brats2020:

| Model | Source | Float32 | Fixed16 | Fixed8 | BFP8 (Layer) | BFP8 (Channel) |
|---|---|---|---|---|---|---|
| unet3d | BraTS20_3dUnet_3dAutoEncoder | 85.34 | 85.23 | 1.15 | 85.14 | 85.34 |
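
The difference between the BFP8 (Layer) and BFP8 (Channel) columns is the size of the block that shares one exponent. The sketch below assumes "Layer" means one power-of-two exponent for a whole tensor and "Channel" means one exponent per slice along a chosen channel dimension (for example dim 0 of a weight, the output channels), with a signed 8-bit mantissa per value. The function name `bfp_quantize` and its parameters are hypothetical, not the repository's API.

```python
import torch


def bfp_quantize(x: torch.Tensor, mantissa_bits: int = 8, channel_dim=None) -> torch.Tensor:
    """Emulate block floating point: every block shares one power-of-two exponent.

    channel_dim=None -> a single block for the whole tensor ("layer" granularity).
    channel_dim=k    -> one block per slice along dim k ("channel" granularity).
    """
    if channel_dim is None:
        max_abs = x.abs().max()
    else:
        reduce_dims = tuple(d for d in range(x.dim()) if d != channel_dim)
        max_abs = x.abs().amax(dim=reduce_dims, keepdim=True) if reduce_dims else x.abs()
    # Guard against all-zero blocks before taking the logarithm.
    max_abs = max_abs.clamp_min(torch.finfo(x.dtype).tiny)
    # Shared exponent: the smallest e with max_abs <= 2**e.
    shared_exp = torch.ceil(torch.log2(max_abs))
    # The signed mantissa covers [-2**(m-1), 2**(m-1) - 1] steps of size `step`.
    step = torch.pow(2.0, shared_exp - (mantissa_bits - 1))
    q_max = 2 ** (mantissa_bits - 1) - 1
    mantissa = torch.clamp(torch.round(x / step), -q_max - 1, q_max)
    return mantissa * step


w = torch.tensor([[100.0, -50.0, 25.0],
                  [0.01, -0.02, 0.03]])  # each row is one output channel
print(bfp_quantize(w))                 # per-layer: the small channel rounds to zero
print(bfp_quantize(w, channel_dim=0))  # per-channel: the small channel keeps its values
```

In this toy tensor the large channel forces a step of 1 for the whole layer, wiping out the small channel, while per-channel exponents give the small channel a step of 2^-12.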

Activation and weight sparsity on resnet18 (post-training, without fine-tuning):

- Q: Fixed16 quantization
- AS: activation sparsity
- WS(t): weight sparsity, applying a single global pruning threshold t across the network

| Model | Experiment | Accuracy (%) | Sparsity (%) |
|---|---|---|---|
| resnet18 | Q+AS | 69.74 | 50.75 |
| resnet18 | Q+AS+WS(0.005) | 69.42 | 56.33 |
| resnet18 | Q+AS+WS(0.010) | 67.36 | 61.47 |
| resnet18 | Q+AS+WS(0.015) | 58.38 | 65.91 |
| resnet18 | Q+AS+WS(0.020) | 27.91 | 69.63 |
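
A minimal sketch of the two measurements behind this table, assuming WS(t) zeroes every convolution and linear weight whose magnitude is below t, and that activation sparsity is the fraction of zeros at ReLU outputs. `prune_global_threshold` and `activation_sparsity` are hypothetical helpers; how the repository combines the two into the single Sparsity column is not shown here.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def prune_global_threshold(model: nn.Module, threshold: float) -> float:
    """Zero every conv/linear weight with |w| < threshold; return weight sparsity."""
    zeros, total = 0, 0
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight
            # The same threshold is applied to every layer (a "global" threshold).
            w.mul_((w.abs() >= threshold).to(w.dtype))
            zeros += int((w == 0).sum())
            total += w.numel()
    return zeros / total


def activation_sparsity(model: nn.Module, x: torch.Tensor) -> float:
    """Fraction of zero outputs across all nn.ReLU modules for one forward pass."""
    counts = {"zeros": 0, "total": 0}

    def hook(_module, _inputs, output):
        counts["zeros"] += int((output == 0).sum())
        counts["total"] += output.numel()

    handles = [m.register_forward_hook(hook) for m in model.modules() if isinstance(m, nn.ReLU)]
    try:
        with torch.no_grad():
            model(x)
    finally:
        # Always detach the hooks so later forward passes are unaffected.
        for h in handles:
            h.remove()
    return counts["zeros"] / counts["total"]


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
print("weight sparsity:", prune_global_threshold(model, 0.05))
print("activation sparsity:", activation_sparsity(model, torch.randn(8, 16)))
```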

Encoding results, all with BFP8 (Channel) quantization:

- RLE-N: run-length encoding using N bits for the run length (maximum run length 2^N); for example, RLE-8 uses 8 bits
- Huffman: Huffman coding
- Avg Compression Ratio: averaged over all weights and activations

| Dataset | Model | Experiment | Avg Compression Ratio |
|---|---|---|---|
| coco | yolov8n (onnx) | RLE-8 | 1.753 |
| camvid | unet-bilinear (onnx) | RLE-8 | 1.175 |
| cityscapes | unet (onnx) | RLE-8 | GPU TIMEOUT |
| ucf101 | x3d_s (onnx) | RLE-8 | 1.737 |
| ucf101 | x3d_m (onnx) | RLE-8 | 1.721 |
| brats2020 | unet3d (onnx) | RLE-8 | 0.821 |
| coco | yolov8n (onnx) | RLE-4 | 1.317 |
| camvid | unet-bilinear (onnx) | RLE-4 | 0.717 |
| coco | yolov8n (onnx) | RLE-2 | 1.112 |
| camvid | unet-bilinear (onnx) | RLE-2 | 0.838 |
| coco | yolov8n (onnx) | Huffman | 0.824 |
| coco | yolov8s (onnx) | Huffman | 0.805 |
| camvid | unet-bilinear (onnx) | Huffman | 0.684 |
| cityscapes | unet (onnx) | Huffman | 0.692 |
| ucf101 | x3d_s (onnx) | Huffman | 0.835 |
| ucf101 | x3d_m (onnx) | Huffman | 0.833 |
| brats2020 | unet3d (onnx) | Huffman | 0.718 |
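
The RLE-N rows can be illustrated with a small encoder. This sketch assumes each run is stored as a (value, length) pair, with the 8-bit BFP8 mantissa as the value and the run length in N bits (storing length minus one, so a single pair holds up to 2^N repeats, matching the stated maximum), and it assumes the ratio is encoded size divided by raw size. Neither the pair layout nor the ratio definition is confirmed by the table; `rle_encode`, `rle_decode` and `rle_size_ratio` are hypothetical helpers.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def rle_encode(values: Sequence[int], length_bits: int) -> List[Tuple[int, int]]:
    """Encode `values` as (value, run_length) pairs, splitting runs longer than 2**length_bits."""
    max_run = 2 ** length_bits
    pairs = []
    for value, group in groupby(values):
        run = sum(1 for _ in group)
        while run > 0:
            chunk = min(run, max_run)
            pairs.append((value, chunk))
            run -= chunk
    return pairs


def rle_decode(pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for value, run in pairs:
        out.extend([value] * run)
    return out


def rle_size_ratio(values: Sequence[int], value_bits: int = 8, length_bits: int = 8) -> float:
    """Encoded bits divided by raw bits (assumed definition: below 1 means smaller)."""
    pairs = rle_encode(values, length_bits)
    encoded_bits = len(pairs) * (value_bits + length_bits)
    raw_bits = len(values) * value_bits
    return encoded_bits / raw_bits


sparse = [0] * 300 + [5, 7, 7]
print(rle_encode(sparse, length_bits=8))
print(rle_size_ratio(sparse, length_bits=8), rle_size_ratio([1, 2, 3, 4], length_bits=8))
```

Under these assumptions, data with no repeated neighbours costs (8 + N)/8 of its raw size, for example 2.0 for RLE-8 and 1.25 for RLE-2, while long runs of zeros shrink to a handful of pairs.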