
Commit 05a8137

Add some piece for offline prepare support. (#310)
* Add some piece for offline prepare support. Add some piece for non-embed mode
1 parent 2511881 commit 05a8137

File tree: 3 files changed (+29, -5 lines)


c_cxx/QNN_EP/mobilenetv2_classification/README.md

Lines changed: 19 additions & 3 deletions
@@ -2,7 +2,7 @@
 - Builds the sample compiled against the ONNX Runtime built with support for the Qualcomm AI Engine Direct SDK (Qualcomm Neural Network (QNN) SDK).
 - The sample uses the QNN EP to:
 - a. run the float32 model on the QNN CPU backend.
-- b. run the QDQ model on HTP backend with qnn_context_cache_enable=1, and generates the Onnx model which has QNN context binary embeded.
+- b. run the QDQ model on the HTP backend with qnn_context_cache_enable=1, and generate the Onnx model with the QNN context binary embedded.
 - c. run the QNN context binary model generated from ONNX Runtime (previous step) on the HTP backend, to improve model initialization time and reduce memory overhead.
 - d. run the QNN context binary model generated from the QNN tool chain on the HTP backend, to support models generated from the native QNN tool chain.
 - The sample downloads the mobilenetv2 model from the Onnx model zoo and uses mobilenetv2_helper.py to quantize the float32 model to a QDQ model, which is required for the HTP backend.
@@ -12,6 +12,15 @@
 - a. Set qnn_context_cache_enable to 1 and run with the QDQ model.
 - b. The first run will generate the context binary model (the default file name is model_file_name.onnx_qnn_ctx.onnx if qnn_context_cache_path is not set).
 - c. Use the generated context binary model (mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx) for inference going forward. (No need for the QDQ model, no need to set qnn_context_cache_enable.)
+- Notes: The QNN context binary is embedded within the ONNX model by default. Alternatively, set the QNN EP session option qnn_context_embed_mode to 0 in order to generate the QNN context binary as a separate file and embed the file's relative path in the ONNX model. This is necessary if the QNN context binary size exceeds protobuf's 2GB limit.
+- Offline prepare is also supported: generate the QNN context binary on an x64 machine and run it on a QC ARM64 device.
+```
+# Build qnn_ep_sample for x64.
+MSBuild.exe .\qnn_ep_sample.sln /property:Configuration=Release /p:Platform="x64"
+# Run qnn_ep_sample.exe on x64. It only creates the Onnx Runtime session with the QDQ model to generate the QNN context binary.
+# No need to run the model.
+qnn_ep_sample.exe --htp mobilenetv2-12_quant_shape.onnx kitten_input.raw --gen_ctx
+```

 - Option 2: Use context binary generated by native QNN tool chain:
 - The sample also demonstrates the feature to create an Onnx model file from the QNN generated context binary file libmobilenetv2-12.serialized.bin to better support customer application migration from native QNN to OnnxRuntime QNN EP. A script [gen_qnn_ctx_onnx_model.py](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/qnn/gen_qnn_ctx_onnx_model.py) is provided to generate an Onnx model from the QNN generated context binary file. It requires the QNN generated context binary file libmobilenetv2-12.serialized.bin and the pre-converted QNN mobilenetv2-12_net.json.
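For reference, a minimal sketch (not the sample's exact code) of what the Option 1 / offline-prepare flow above amounts to: create a session with the QNN EP options named in this README and let session creation itself emit the context model. Error handling is omitted; the backend_path value and QnnHtp.dll library name follow the QNN EP documentation, and the exact option keys may differ across ONNX Runtime releases.

```
#include <onnxruntime_c_api.h>

// Sketch: generate <model>.onnx_qnn_ctx.onnx from a QDQ model without running inference.
void generate_qnn_context_model(OrtEnv* env, const ORTCHAR_T* qdq_model_path) {
  const OrtApi* ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtSessionOptions* session_options = nullptr;
  ort->CreateSessionOptions(&session_options);

  // Option keys as named in this README; qnn_context_embed_mode=0 writes the context
  // binary to a separate file and stores its relative path in the generated ONNX model.
  const char* keys[] = {"backend_path", "qnn_context_cache_enable", "qnn_context_embed_mode"};
  const char* values[] = {"QnnHtp.dll", "1", "0"};
  ort->SessionOptionsAppendExecutionProvider(session_options, "QNN", keys, values, 3);

  // Creating the session compiles the QDQ model for the HTP backend and writes the
  // context model to disk; no Run() call is needed.
  OrtSession* session = nullptr;
  ort->CreateSession(env, qdq_model_path, session_options, &session);

  ort->ReleaseSession(session);
  ort->ReleaseSessionOptions(session_options);
}
```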
@@ -24,6 +33,7 @@
 - python gen_qnn_ctx_onnx_model.py -b libmobilenetv2-12.serialized.bin -q mobilenetv2-12_net.json
 - c. Create an ONNX Runtime session with the model generated from step b.
 - d. Run the model with quantized input data. The output also needs to be dequantized, because QNN quantized models use quantized data types for model inputs & outputs. For more details refer to QuantizedData & DequantizedData in [main.cpp](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/QNN_EP/mobilenetv2_classification/main.cpp). Also, the input image is in NHWC layout for the QNN converted model.
+- Notes: Call gen_qnn_ctx_onnx_model.py with --disable_embed_mode to generate the ONNX model with the relative path to the QNN context binary file. This is necessary if the QNN context binary size exceeds protobuf's 2GB limit.

 - More info on QNN EP - https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html
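The quantized input/output handling in step d boils down to affine (de)quantization. Below is a minimal sketch assuming uint8 per-tensor quantization with a given scale and zero point; the sample's actual helpers are QuantizedData and DequantizedData in main.cpp, and the real scale/zero-point values come from the QNN converter output (e.g. mobilenetv2-12_net.json).

```
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize float values to uint8: q = clamp(round(x / scale) + zero_point, 0, 255).
std::vector<uint8_t> Quantize(const std::vector<float>& x, float scale, int zero_point) {
  std::vector<uint8_t> q(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    int v = static_cast<int>(std::lround(x[i] / scale)) + zero_point;
    q[i] = static_cast<uint8_t>(std::clamp(v, 0, 255));  // keep within uint8 range
  }
  return q;
}

// Dequantize uint8 values back to float: x = scale * (q - zero_point).
std::vector<float> Dequantize(const std::vector<uint8_t>& q, float scale, int zero_point) {
  std::vector<float> x(q.size());
  for (std::size_t i = 0; i < q.size(); ++i) {
    x[i] = scale * (static_cast<int>(q[i]) - zero_point);
  }
  return x;
}
```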

@@ -56,12 +66,18 @@ qnn_ep_sample.exe --cpu mobilenetv2-12_shape.onnx kitten_input.raw
 Result:
 position=281, classification=n02123045 tabby, tabby cat, probability=13.663178

-REM run mobilenetv2-12_quant_shape.onnx with QNN HTP backend, generate mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx
-qnn_ep_sample.exe --htp mobilenetv2-12_quant_shape.onnx kitten_input.raw --gen_ctx
+REM run mobilenetv2-12_quant_shape.onnx with QNN HTP backend
+qnn_ep_sample.exe --htp mobilenetv2-12_quant_shape.onnx kitten_input.raw

 Result:
 position=281, classification=n02123045 tabby, tabby cat, probability=13.637316

+REM load mobilenetv2-12_quant_shape.onnx with QNN HTP backend, generate mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx which has the QNN context binary embedded
+REM This does not have to be run on a real device with HTP; it can also be done on an x64 platform, since offline generation is supported
+qnn_ep_sample.exe --htp mobilenetv2-12_quant_shape.onnx kitten_input.raw --gen_ctx
+
+Onnx model with QNN context binary is generated.
+
 REM run mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx with QNN HTP backend
 qnn_ep_sample.exe --htp mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx kitten_input.raw

c_cxx/QNN_EP/mobilenetv2_classification/main.cpp

Lines changed: 4 additions & 0 deletions
@@ -96,6 +96,10 @@ void run_ort_qnn_ep(const std::string& backend, const std::string& model_path, c
                     options_values.data(), options_keys.size()));
   OrtSession* session;
   CheckStatus(g_ort, g_ort->CreateSession(env, model_path_wstr.c_str(), session_options, &session));
+  if (generate_ctx) {
+    printf("\nOnnx model with QNN context binary is generated.\n");
+    return;
+  }

   OrtAllocator* allocator;
   CheckStatus(g_ort, g_ort->GetAllocatorWithDefaultOptions(&allocator));
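As a reading aid (not part of the commit), here is the added early return again with comments spelling out the behavior described in the README: with qnn_context_cache_enable set, the CreateSession call alone compiles the QDQ model and writes the context model to disk, so --gen_ctx mode has nothing left to run.

```
  CheckStatus(g_ort, g_ort->CreateSession(env, model_path_wstr.c_str(), session_options, &session));
  if (generate_ctx) {
    // --gen_ctx mode: session creation has already compiled the QDQ model for the
    // HTP backend and written <model>.onnx_qnn_ctx.onnx to disk.
    printf("\nOnnx model with QNN context binary is generated.\n");
    return;  // skip allocator setup, input preparation, and inference
  }
```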

c_cxx/QNN_EP/mobilenetv2_classification/run_qnn_ep_sample.bat

Lines changed: 6 additions & 2 deletions
@@ -57,7 +57,7 @@ IF NOT EXIST %QNN_CTX_ONNX_GEN_SCRIPT% (
 powershell -Command "Invoke-WebRequest %QNN_CTX_ONNX_GEN_SCRIPT_URL% -Outfile %QNN_CTX_ONNX_GEN_SCRIPT%" )

 REM based on the input & output information got from QNN converted mobilenetv2-12_net.json file
-REM Generate mobilenetv2-12_net_qnn_ctx.onnx with content of libmobilenetv2-12.serialized.bin embeded
+REM Generate mobilenetv2-12_net_qnn_ctx.onnx with content of libmobilenetv2-12.serialized.bin embedded
 python gen_qnn_ctx_onnx_model.py -b libmobilenetv2-12.serialized.bin -q mobilenetv2-12_net.json

 where /q cmake.exe
@@ -100,7 +100,11 @@ copy /y ..\..\synset.txt .
 REM run mobilenetv2-12_shape.onnx with QNN CPU backend
 qnn_ep_sample.exe --cpu mobilenetv2-12_shape.onnx kitten_input.raw

-REM run mobilenetv2-12_quant_shape.onnx with QNN HTP backend, generate mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx
+REM run mobilenetv2-12_quant_shape.onnx with QNN HTP backend
+qnn_ep_sample.exe --htp mobilenetv2-12_quant_shape.onnx kitten_input.raw
+
+REM load mobilenetv2-12_quant_shape.onnx with QNN HTP backend, generate mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx which has the QNN context binary embedded
+REM This does not have to be run on a real device with HTP; it can also be done on an x64 platform, since offline generation is supported
 qnn_ep_sample.exe --htp mobilenetv2-12_quant_shape.onnx kitten_input.raw --gen_ctx

 REM run mobilenetv2-12_quant_shape.onnx_qnn_ctx.onnx with QNN HTP backend
