Bug 1930286 [wpt PR 49083] - webnn: Support block-wise quantization for DirectML backend, a=testonly
Automatic update from web-platform-tests
webnn: Support block-wise quantization for DirectML backend
Block-wise quantization divides input tensors into smaller blocks that
are quantized independently, resulting in faster optimization and
high-precision quantization [1]. It is used by popular language models,
such as the int4-quantized Phi-3 mini model [2]. A related WG issue [3]
has been opened for discussion.
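
As an illustration of the idea only (not the code in this CL), a minimal
1-D sketch in plain Python, assuming one scale and one zero point per
contiguous block. The function names, the default int4 range, and the use
of Python's round() are assumptions made for this example.

```python
def dequantize_blockwise(values, scales, zero_points, block_size):
    """Dequantize a 1-D list of ints; block b uses scales[b] and zero_points[b]."""
    if block_size <= 0 or len(values) % block_size != 0:
        raise ValueError("length must be a positive multiple of block_size")
    num_blocks = len(values) // block_size
    if len(scales) != num_blocks or len(zero_points) != num_blocks:
        raise ValueError("need exactly one scale and zero point per block")
    out = []
    for i, q in enumerate(values):
        b = i // block_size  # index of the block this element belongs to
        out.append((q - zero_points[b]) * scales[b])
    return out


def quantize_blockwise(values, scales, zero_points, block_size, qmin=-8, qmax=7):
    """Quantize a 1-D list of floats per block, clamping to [qmin, qmax].

    The default range is signed int4; rounding uses Python's round(),
    which is an assumption and may differ from a given backend.
    """
    if block_size <= 0 or len(values) % block_size != 0:
        raise ValueError("length must be a positive multiple of block_size")
    num_blocks = len(values) // block_size
    if len(scales) != num_blocks or len(zero_points) != num_blocks:
        raise ValueError("need exactly one scale and zero point per block")
    out = []
    for i, x in enumerate(values):
        b = i // block_size
        q = round(x / scales[b]) + zero_points[b]
        out.append(max(qmin, min(qmax, q)))
    return out
```

With block_size = 2, the first two elements share scales[0] and
zero_points[0], and the next two share scales[1] and zero_points[1].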
First, this CL validates the scale and zero point tensors for block-wise
quantization. It also implements block-wise quantization in the DirectML
backend by using DML_OPERATOR_QUANTIZE and DML_OPERATOR_DEQUANTIZE, which
are available in FL >= 6.3.
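
A hedged sketch of what such a shape check could look like. The rule used
here (scale and zero point share a shape, have the same rank as the input,
and each input dimension is an exact multiple of the matching scale
dimension) is an assumption for illustration and is not taken from the CL.
The helper name block_shape is hypothetical.

```python
def block_shape(input_shape, scale_shape, zero_point_shape):
    """Return the per-dimension block sizes, or raise ValueError.

    Assumed rule (illustrative only): scale and zero point have identical
    shapes, the same rank as the input, and every input dimension is
    divisible by the corresponding scale dimension.
    """
    if list(scale_shape) != list(zero_point_shape):
        raise ValueError("scale and zero point must have the same shape")
    if len(input_shape) != len(scale_shape):
        raise ValueError("scale must have the same rank as the input")
    blocks = []
    for in_dim, scale_dim in zip(input_shape, scale_shape):
        if scale_dim <= 0 or in_dim % scale_dim != 0:
            raise ValueError("input dimension is not a multiple of scale dimension")
        blocks.append(in_dim // scale_dim)  # elements covered by one scale value
    return blocks
```

For example, an input of shape [4, 32] with scale and zero point of shape
[4, 2] gives blocks of shape [1, 16] under this assumed rule.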
More validation and conformance tests are added to verify the
implementation.
[1]: https://arxiv.org/abs/2110.02861
[2]: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
[3]: webmachinelearning/webnn#779
Bug: 40206287
Change-Id: I977b0be57deebd7afcae216edc3ddc3818b8c09f
Cq-Include-Trybots: luci.chromium.try:mac14.arm64-blink-rel, mac14-blink-rel, mac15.arm64-blink-rel, mac15-blink-rel, linux-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5964816
Reviewed-by: Rafael Cintron <rafael.cintron@microsoft.com>
Reviewed-by: ningxin hu <ningxin.hu@intel.com>
Commit-Queue: ningxin hu <ningxin.hu@intel.com>
Cr-Commit-Position: refs/heads/main@{#1380767}
--
wpt-commits: 8686b7a6d288d3b2c22b5ddb5a21773619b22b85
wpt-pr: 49083
UltraBlame original commit: 6b8a19bf1f5562bfae60549575af9c2b422b4975