## About

- Builds the sample compiled against an ONNX Runtime built with support for the Qualcomm AI Engine Direct SDK (Qualcomm Neural Network (QNN) SDK).
- The sample uses the QNN EP to (a minimal sketch of selecting the backend follows this list):
  - a. run the float32 model on the QNN CPU backend.
  - b. run the QDQ model on the HTP backend with qnn_context_cache_enable=1, which generates an ONNX model with the QNN context binary embedded.
  - c. run the QNN context binary model generated by ONNX Runtime (previous step) on the HTP backend, to improve model initialization time and reduce memory overhead.
  - d. run the QNN context binary model generated by the QNN tool chain on the HTP backend, to support models produced by the native QNN tool chain.
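
The backend choice above is made through QNN EP provider options when the EP is appended to the session options. The snippet below is only a minimal sketch in the style of the sample's C-API code: the backend_path option and the QnnCpu.dll / QnnHtp.dll library names follow the QNN EP documentation and may need adjusting for a particular QNN SDK install.

```
// Illustrative only: append the QNN EP and choose a backend library.
// "backend_path" points at QnnCpu.dll (float32 model) or QnnHtp.dll (QDQ model).
#include <onnxruntime_c_api.h>

void AppendQnnEp(const OrtApi* g_ort, OrtSessionOptions* session_options, bool use_htp) {
  const char* keys[] = {"backend_path"};
  const char* values[] = {use_htp ? "QnnHtp.dll" : "QnnCpu.dll"};
  OrtStatus* status = g_ort->SessionOptionsAppendExecutionProvider(
      session_options, "QNN", keys, values, 1);
  if (status != nullptr) {
    g_ort->ReleaseStatus(status);  // handle the error in real code
  }
}
```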
- The sample downloads the mobilenetv2 model from the ONNX model zoo and uses mobilenetv2_helper.py to quantize the float32 model to a QDQ model, which is required for the HTP backend.
- The sample is targeted to run on a QC ARM64 device.
- There are 2 ways to improve session creation time by using a QNN context binary:
- Option 1: Use a context binary generated by the OnnxRuntime QNN EP. The OnnxRuntime QNN EP uses the QNN API to generate the QNN context binary and also dumps some metadata (model name, version, graph meta id, etc.) to identify the model. A sketch of this workflow follows the steps below.
  - a. Set qnn_context_cache_enable to 1 and run with the QDQ model.
  - b. The first run generates the context binary model (the default file name is model_file_name.onnx_qnn_ctx.onnx if qnn_context_cache_path is not set).
  - c. Use the generated context binary model (mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx) for inference going forward. (The QDQ model is no longer needed, and qnn_context_cache_enable does not need to be set.)
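
To make Option 1 concrete, here is a rough sketch in the style of the sample's C-API code. It assumes the QDQ model file is named mobilenetv2-12_quant_shape.onnx (only the generated mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx is named above) and reuses the backend_path and qnn_context_cache_enable provider options; it is not the sample's exact code.

```
// Sketch of Option 1 (illustrative, not the sample's exact code).
#include <cstdio>
#include <cstdlib>
#include <onnxruntime_c_api.h>

static void CheckStatus(const OrtApi* g_ort, OrtStatus* status) {
  if (status != nullptr) {
    fprintf(stderr, "%s\n", g_ort->GetErrorMessage(status));
    g_ort->ReleaseStatus(status);
    abort();
  }
}

void Option1Sketch() {
  const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  OrtEnv* env = nullptr;
  // Can set to ORT_LOGGING_LEVEL_INFO or ORT_LOGGING_LEVEL_VERBOSE for more info.
  CheckStatus(g_ort, g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "qnn_ep_sample", &env));

  // Steps a/b: run the QDQ model once on the HTP backend with context caching enabled.
  OrtSessionOptions* qdq_options = nullptr;
  CheckStatus(g_ort, g_ort->CreateSessionOptions(&qdq_options));
  const char* keys[] = {"backend_path", "qnn_context_cache_enable"};
  const char* values[] = {"QnnHtp.dll", "1"};
  CheckStatus(g_ort, g_ort->SessionOptionsAppendExecutionProvider(qdq_options, "QNN", keys, values, 2));
  OrtSession* qdq_session = nullptr;
  CheckStatus(g_ort, g_ort->CreateSession(env, ORT_TSTR("mobilenetv2-12_quant_shape.onnx"),
                                          qdq_options, &qdq_session));
  // Running this session once dumps mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx.

  // Step c: later runs load the generated context binary model directly.
  OrtSessionOptions* ctx_options = nullptr;
  CheckStatus(g_ort, g_ort->CreateSessionOptions(&ctx_options));
  const char* ctx_keys[] = {"backend_path"};
  const char* ctx_values[] = {"QnnHtp.dll"};
  CheckStatus(g_ort, g_ort->SessionOptionsAppendExecutionProvider(ctx_options, "QNN", ctx_keys, ctx_values, 1));
  OrtSession* ctx_session = nullptr;
  CheckStatus(g_ort, g_ort->CreateSession(env, ORT_TSTR("mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx"),
                                          ctx_options, &ctx_session));
}
```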
- Option 2: Use a context binary generated by the native QNN tool chain:
  - The sample also demonstrates how to create an ONNX model file from the QNN generated context binary file libmobilenetv2-12.serialized.bin, to better support customer application migration from native QNN to the OnnxRuntime QNN EP. A script [gen_qnn_ctx_onnx_model.py](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/qnn/gen_qnn_ctx_onnx_model.py) is provided to generate an ONNX model from the QNN generated context binary file. It requires the QNN generated context binary file libmobilenetv2-12.serialized.bin and the pre-converted QNN mobilenetv2-12_net.json.
  - a. Convert the model to a QNN model and generate the QNN context binary file. The sample provides mobilenetv2-12_net.json and the context binary file libmobilenetv2-12.serialized.bin as examples (generated with QNN version 2.10). Please follow the QNN documentation to generate the QNN model_net.json and the context binary file.
  - b. Create an ONNX model file from the QNN generated context binary file libmobilenetv2-12.serialized.bin. For more details, refer to run_qnn_ep_sample.bat & [gen_qnn_ctx_onnx_model.py](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/qnn/gen_qnn_ctx_onnx_model.py).
  - c. Create an ONNX Runtime session with the model generated in step b.
  - d. Run the model with quantized input data. The output also needs to be dequantized, because a QNN quantized model uses quantized data types for its inputs & outputs. For more details, refer to QuantizedData & DequantizedData in [main.cpp](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/QNN_EP/mobilenetv2_classification/main.cpp). Also note that the input image is in NHWC layout for the QNN converted model. A sketch of the quantize/dequantize arithmetic follows the steps below.
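
The quantize/dequantize step in d. is plain affine quantization (q = round(x / scale) + zero_point). The helpers below are a generic illustration of that arithmetic, not a copy of the QuantizedData / DequantizedData helpers in main.cpp; the scale and zero-point values must come from the actual QNN converted model's input and output tensors, and the input image must already be in NHWC layout before quantization.

```
// Generic affine (de)quantization helpers, illustrative only.
// q = clamp(round(x / scale) + zero_point, 0, 255);  x = (q - zero_point) * scale
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

std::vector<uint8_t> QuantizeToUint8(const std::vector<float>& data, float scale, int32_t zero_point) {
  std::vector<uint8_t> out(data.size());
  for (size_t i = 0; i < data.size(); ++i) {
    int32_t q = static_cast<int32_t>(std::round(data[i] / scale)) + zero_point;
    out[i] = static_cast<uint8_t>(std::clamp(q, 0, 255));
  }
  return out;
}

std::vector<float> DequantizeFromUint8(const std::vector<uint8_t>& data, float scale, int32_t zero_point) {
  std::vector<float> out(data.size());
  for (size_t i = 0; i < data.size(); ++i) {
    out[i] = (static_cast<int32_t>(data[i]) - zero_point) * scale;
  }
  return out;
}
```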
- More info on QNN EP - https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html
## Example run result
```
...
REM run mobilenetv2-12_shape.onnx with QNN CPU backend
qnn_ep_sample.exe --cpu kitten_input.raw
...
```