**This is an experimental feature, the quantization API is subject to change.**
This tutorial demonstrates how to use `OpenVINOQuantizer` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
`OpenVINOQuantizer` unlocks the full potential of low-precision OpenVINO kernels thanks to quantizer placement designed specifically for OpenVINO.
The PyTorch 2 export quantization flow uses torch.export to capture the model into a graph and performs quantization transformations on top of the ATen graph.
This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
The OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.
The quantization flow mainly includes four steps:
- Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
- Step 2: Apply the PyTorch 2 Export Quantization flow with OpenVINOQuantizer based on the captured FX Graph.
- Step 3: Lower the quantized model into OpenVINO representation with the API `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_.
- Optional step 4: Improve quantized model metrics via the `quantize_pt2e <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_ method.
The high-level architecture of this flow could look like this:
Post Training Quantization
--------------------------
Now, we will walk you through a step-by-step tutorial for how to use it with `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_
for post training quantization.
Prerequisite: OpenVINO and NNCF installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OpenVINO and NNCF can be easily installed via the `pip distribution <https://docs.openvino.ai/2024/get-started/install-openvino.html>`_:
pip install openvino nncf
1. Capture FX Graph
^^^^^^^^^^^^^^^^^^^^^
We will start by performing the necessary imports, capturing the FX Graph from the eager module.
2. Apply Quantization
^^^^^^^^^^^^^^^^^^^^^^^
After we capture the FX Module to be quantized, we will import the OpenVINOQuantizer.
After these steps, we finished running the quantization flow, and we will get the quantized model.
3. Lower into OpenVINO representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After that, the FX Graph can utilize OpenVINO optimizations using the `torch.compile(…, backend="openvino") <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ functionality.
The optimized model uses low-level kernels designed specifically for Intel CPUs.
This should significantly speed up inference time in comparison with the eager model.
4. Optional: Improve quantized model metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NNCF implements advanced quantization algorithms like SmoothQuant and BiasCorrection, which help