
Commit 77b67ef

[doc] update description for dynamic quantization (#33466)
### Details:
- Updated description for dynamic quantization
1 parent 84edea2 commit 77b67ef

File tree

1 file changed (+8, -1 lines changed)


docs/articles_en/openvino-workflow-generative/inference-with-optimum-intel.rst

Lines changed: 8 additions & 1 deletion
@@ -239,7 +239,14 @@ includes **Dynamic quantization** of activations of 4/8-bit quantized MatMuls an
 insignificant deviation in generation accuracy. Quantization is performed in a group-wise
 manner, with configurable group size. It means that values in a group share quantization
 parameters. Larger group sizes lead to faster inference but lower accuracy. Recommended
-group size values are ``0``, ``32``, ``64``, or ``128``.
+group size values are ``0``, ``32``, ``64``, ``128`` or ``-1``(per-token).
+
+.. note::
+
+   The dynamic quantization group size is treated as a guideline rather than a strict requirement.
+   The actual policy may vary depending on functional capabilities or performance and accuracy considerations.
+   The plugin may choose to disable dynamic quantization entirely or use a smaller group size than the one
+   specified by the user.
 
 On Intel CPU and Intel GPU, dynamic quantization is enabled **by default**.
 
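For context, the group size discussed in the changed passage is exposed through OpenVINO's ``DYNAMIC_QUANTIZATION_GROUP_SIZE`` property, which Optimum Intel accepts via ``ov_config``. The sketch below is not part of this commit; the model ID and prompt are placeholders, and the value ``"32"`` is simply one of the recommended group sizes listed above.

.. code-block:: python

   from optimum.intel import OVModelForCausalLM
   from transformers import AutoTokenizer

   model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model, not from this commit

   # Pass the dynamic quantization group size as an OpenVINO runtime property.
   # "0" disables dynamic quantization and "-1" requests per-token quantization;
   # per the note added in this commit, the plugin treats the value as a hint
   # and may fall back to a smaller group size or disable the feature.
   model = OVModelForCausalLM.from_pretrained(
       model_id,
       export=True,
       ov_config={"DYNAMIC_QUANTIZATION_GROUP_SIZE": "32"},
   )

   tokenizer = AutoTokenizer.from_pretrained(model_id)
   inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
   outputs = model.generate(**inputs, max_new_tokens=30)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))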
