**This is an experimental feature, the quantization API is subject to change.**
This tutorial demonstrates how to use `OpenVINOQuantizer` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
`OpenVINOQuantizer` unlocks the full potential of low-precision OpenVINO kernels thanks to quantizer placement designed specifically for OpenVINO.
The PyTorch 2 export quantization flow uses torch.export to capture the model into a graph and performs quantization transformations on top of the ATen graph.
This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
The OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.
The quantization flow mainly includes four steps:
- Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
- Step 2: Apply the PyTorch 2 Export Quantization flow with OpenVINOQuantizer based on the captured FX Graph.
- Step 3: Lower the quantized model into OpenVINO representation with the API `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_.
- Optional step 4: Improve quantized model metrics via the `quantize_pt2e <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_ method.
The high-level architecture of this flow could look like this:
Post Training Quantization
--------------------------
Now, we will walk you through a step-by-step tutorial for how to use it with `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_
for post training quantization.
Prerequisite: OpenVINO and NNCF installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OpenVINO and NNCF can be easily installed via the `pip distribution <https://docs.openvino.ai/2024/get-started/install-openvino.html>`_:
pip install openvino nncf
1. Capture FX Graph
^^^^^^^^^^^^^^^^^^^^^
We will start by performing the necessary imports, capturing the FX Graph from the eager module.
2. Apply Quantization
^^^^^^^^^^^^^^^^^^^^^^^
After we capture the FX Module to be quantized, we will import the OpenVINOQuantizer.
After these steps, we finished running the quantization flow, and we will get the quantized model.
3. Lower into OpenVINO representation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After that, the FX Graph can utilize OpenVINO optimizations using the `torch.compile(…, backend="openvino") <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ functionality.
The optimized model uses low-level kernels designed specifically for Intel CPUs.
This should significantly speed up inference time in comparison with the eager model.
4. Optional: Improve quantized model metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NNCF implements advanced quantization algorithms like SmoothQuant and BiasCorrection, which help