Commit 82a47a5 (1 parent: 810899a)

1 file changed: prototype_source/openvino_quantizer.rst (+14 -11)
@@ -1,7 +1,7 @@
 PyTorch 2 Export Quantization for OpenVINO torch.compile backend.
 ===========================================================================
 
-**Authors**: `Daniil Lyakhov <https://github.com/daniil-lyakhov>`_, `Alexander Suslov <https://github.com/alexsu52>`_, `Aamir Nazir <https://github.com/anzr299>`_
+**Authors**: `Daniil Lyakhov <https://github.com/daniil-lyakhov>`_, `Aamir Nazir <https://github.com/anzr299>`_, `Alexander Suslov <https://github.com/alexsu52>`_, `Yamini Nimmagadda <https://github.com/ynimmaga>`_, `Alexander Kozlov <https://github.com/AlexKoff88>`_
 
 Prerequisites
 --------------
@@ -11,18 +11,21 @@ Prerequisites
 Introduction
 --------------
 
+**This is an experimental feature, the quantization API is subject to change.**
+
 This tutorial demonstrates how to use `OpenVINOQuantizer` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
+`OpenVINOQuantizer` unlocks the full potential of low-precision OpenVINO kernels due to the placement of quantizers designed specifically for OpenVINO.
 
-The pytorch 2 export quantization flow uses the torch.export to capture the model into a graph and performs quantization transformations on top of the ATen graph.
+The PyTorch 2 export quantization flow uses torch.export to capture the model into a graph and performs quantization transformations on top of the ATen graph.
 This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
 OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.
 
 The quantization flow mainly includes four steps:
 
-- Step 1: Install OpenVINO and NNCF.
-- Step 2: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
-- Step 3: Apply the PyTorch 2 Export Quantization flow with OpenVINOQuantizer based on the captured FX Graph.
-- Step 4: Lower the quantized model into OpenVINO representation with the API `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_.
+- Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
+- Step 2: Apply the PyTorch 2 Export Quantization flow with OpenVINOQuantizer based on the captured FX Graph.
+- Step 3: Lower the quantized model into OpenVINO representation with the `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ API.
+- Optional step 4: Improve quantized model metrics via the `quantize_pt2e <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_ method.
 
 The high-level architecture of this flow could look like this:
 
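Taken together, the renumbered steps above might look like the following end-to-end sketch. This is an illustration only: the import path for ``OpenVINOQuantizer`` and the use of ``prepare_pt2e``/``convert_pt2e`` are assumptions based on the NNCF and PyTorch links above, not code reproduced from the tutorial body.

.. code-block:: python

    # Sketch of the overall flow; assumes a recent PyTorch with torch.export.export_for_training.
    import torch
    import torchvision
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    # Assumed import location; check the NNCF documentation for the exact path.
    from nncf.experimental.torch.fx import OpenVINOQuantizer

    model = torchvision.models.resnet18(weights="DEFAULT").eval()
    example_inputs = (torch.randn(1, 3, 224, 224),)

    # Step 1: capture the FX Graph with torch.export.
    fx_model = torch.export.export_for_training(model, example_inputs).module()

    # Step 2: quantize with OpenVINOQuantizer.
    prepared = prepare_pt2e(fx_model, OpenVINOQuantizer())
    prepared(*example_inputs)  # calibration; use a representative dataset in practice
    quantized = convert_pt2e(prepared)

    # Step 3: lower into the OpenVINO representation.
    compiled = torch.compile(quantized, backend="openvino")
    compiled(*example_inputs)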
@@ -61,7 +64,7 @@ Post Training Quantization
 Now, we will walk you through a step-by-step tutorial for how to use it with `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_
 for post training quantization.
 
-1. OpenVINO and NNCF installation
+Prerequisite: OpenVINO and NNCF installation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 OpenVINO and NNCF could be easily installed via `pip distribution <https://docs.openvino.ai/2024/get-started/install-openvino.html>`_:
 
@@ -71,7 +74,7 @@ OpenVINO and NNCF could be easily installed via `pip distribution <https://docs.
     pip install openvino nncf
 
 
-2. Capture FX Graph
+1. Capture FX Graph
 ^^^^^^^^^^^^^^^^^^^^^
 
 We will start by performing the necessary imports, capturing the FX Graph from the eager module.
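A rough sketch of what this import-and-capture step might look like, assuming the ``torch.export.export_for_training`` capture path (the tutorial's full code listing is not reproduced here):

.. code-block:: python

    import torch
    import torchvision

    model = torchvision.models.resnet18(weights="DEFAULT").eval()
    example_inputs = (torch.randn(1, 3, 224, 224),)

    # Capture the eager module into an FX GraphModule via torch.export.
    exported_program = torch.export.export_for_training(model, example_inputs)
    fx_model = exported_program.module()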
@@ -105,7 +108,7 @@ We will start by performing the necessary imports, capturing the FX Graph from t
 
 
 
-3. Apply Quantization
+2. Apply Quantization
 ^^^^^^^^^^^^^^^^^^^^^^^
 
 After we capture the FX Module to be quantized, we will import the OpenVINOQuantizer.
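A hedged sketch of this quantization step, assuming ``OpenVINOQuantizer`` is importable from ``nncf.experimental.torch.fx`` and reusing ``fx_model`` from the capture sketch above; ``calibration_loader`` is a hypothetical DataLoader of representative inputs:

.. code-block:: python

    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    # Assumed import location; consult the NNCF API reference for the exact path.
    from nncf.experimental.torch.fx import OpenVINOQuantizer

    quantizer = OpenVINOQuantizer()

    # Insert observers according to the OpenVINO-specific annotation.
    prepared_model = prepare_pt2e(fx_model, quantizer)

    # Calibrate on representative data, then fold observers into quantize/dequantize ops.
    for images, _ in calibration_loader:
        prepared_model(images)
    quantized_model = convert_pt2e(prepared_model)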
@@ -191,7 +194,7 @@ Finally, we will convert the calibrated Model to a quantized Model. ``convert_pt
 After these steps, we finished running the quantization flow, and we will get the quantized model.
 
 
-4. Lower into OpenVINO representation
+3. Lower into OpenVINO representation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 After that the FX Graph can utilize OpenVINO optimizations using `torch.compile(…, backend="openvino") <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ functionality.
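For example, lowering the quantized model might look like this, continuing from the sketches above:

.. code-block:: python

    import torch

    # Compilation to OpenVINO happens lazily, on the first call with real inputs.
    compiled_model = torch.compile(quantized_model, backend="openvino")
    output = compiled_model(*example_inputs)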
@@ -209,7 +212,7 @@ After that the FX Graph can utilize OpenVINO optimizations using `torch.compile(
 The optimized model is using low-level kernels designed specifically for Intel CPU.
 This should significantly speed up inference time in comparison with the eager model.
 
-5. Optional: Improve quantized model metrics
+4. Optional: Improve quantized model metrics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 NNCF implements advanced quantization algorithms like SmoothQuant and BiasCorrection, which help
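The optional step might look like the following sketch, which replaces the ``prepare_pt2e``/``convert_pt2e`` pair with NNCF's ``quantize_pt2e`` helper. The ``smooth_quant`` and ``fast_bias_correction`` parameter names are assumptions based on the API reference linked earlier, and ``calibration_loader`` is again a hypothetical DataLoader:

.. code-block:: python

    import nncf
    # Assumed import locations and parameter names; see the quantize_pt2e API reference.
    from nncf.experimental.torch.fx import OpenVINOQuantizer, quantize_pt2e

    # Wrap the calibration data so NNCF can feed model inputs during calibration.
    calibration_dataset = nncf.Dataset(calibration_loader, transform_func=lambda item: item[0])

    quantized_model = quantize_pt2e(
        fx_model,
        OpenVINOQuantizer(),
        calibration_dataset,
        smooth_quant=True,           # assumed flag enabling SmoothQuant
        fast_bias_correction=False,  # assumed flag: False selects the full BiasCorrection algorithm
    )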
