
[OpenVINO GPU] OpenVINO EP shouldn't override the "ACCURACY" precision to "FP32" #23895

Open
mingmingtasd opened this issue Mar 5, 2025 · 2 comments
Labels: ep:OpenVINO (issues related to OpenVINO execution provider)

Comments


mingmingtasd commented Mar 5, 2025

Describe the issue

There is a regression caused by https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/openvino/openvino_provider_factory.cc#L109, introduced by PR #23553. Users can no longer set the "ACCURACY" precision for the OpenVINO GPU device; it is silently overridden to "FP32", which is not the expected behavior. The "ACCURACY" precision is genuinely useful and is not equivalent to "FP32"; see https://docs.openvino.ai/2025/openvino-workflow/running-inference/optimize-inference/precision-control.html#execution-mode for details.

To reproduce

Set "precision" to "ACCURACY" as documented at https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#cc-api-20; the session will nevertheless behave as if "FP32" had been set:

#include <onnxruntime_cxx_api.h>

Ort::SessionOptions session_options;
std::unordered_map<std::string, std::string> options;
options["device_type"] = "GPU";
options["precision"] = "ACCURACY";  // expected: ACCURACY execution mode; actual: overridden to FP32
session_options.AppendExecutionProvider("OpenVINO", options);

Urgency

Very urgent; it impacts my performance testing.

Platform

Windows

OS Version

win11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

30c6825

ONNX Runtime API

C++

Architecture

X64

Execution Provider

OpenVINO

Execution Provider Library Version

ov 2024.6

github-actions bot added the ep:OpenVINO label on Mar 5, 2025
@mingmingtasd (Contributor, Author) commented:

By the way, would it be possible to expose ov::hint::execution_mode directly and let it override the precision?
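For reference, this is roughly what selecting the ACCURACY execution mode looks like through the native OpenVINO C++ API (a minimal sketch, assuming OpenVINO 2024.x; the model path is a placeholder):

#include <openvino/openvino.hpp>

// Minimal sketch (not ORT EP code): request the ACCURACY execution mode
// directly when compiling the model for the GPU device.
ov::Core core;
auto model = core.read_model("model.onnx");  // placeholder model path
auto compiled_model = core.compile_model(
    model, "GPU",
    ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));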

@huningxin commented:

For the WebNN use case, we need the OpenVINO EP to select the execution precision on the GPU device by respecting the model's data types: if an operator in the model operates on FP32 tensors, execute it in FP32 precision; if an operator operates on FP16 tensors, execute it in FP16 precision.

Previously, we set the OpenVINO EP precision option to ACCURACY: https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#summary-of-options

Internally, for the ACCURACY option, BasicBackend::PopulateConfigValue() sets OpenVINO's ov::hint::inference_precision to ov::element::undefined and sets the execution mode to ov::hint::ExecutionMode::ACCURACY, which works for us.

https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/openvino/backends/basic_backend.cc#L170
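For readers following along, the behavior described above corresponds roughly to a device config like the following (a simplified sketch, not the actual EP source):

ov::AnyMap device_config;
// ACCURACY: leave the inference precision undefined and let the
// execution-mode hint preserve the model's original data types.
device_config[ov::hint::inference_precision.name()] = ov::element::undefined;
device_config[ov::hint::execution_mode.name()] = ov::hint::ExecutionMode::ACCURACY;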

However, a recent commit, a6ea57b, changes this behavior and sets the precision to the highest precision available on the device, which for GPU is FP32:

https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/openvino/openvino_provider_factory.cc#L109

This causes WebNN FP16 model execution to be slower than before.
