You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
### Description
Qnn weight sharing improvement so that only the last session in the weight sharing group (the session that has both share_ep_contexts and stop_share_ep_contexts enabled) generates the .bin file. The .bin file name is decided from the 1st session. And all generated *_ctx.onnx models point to this single .bin to avoid post-processing work.
Previously each session generates a _ctx.onnx model with a .bin file. So it requires post-processing work to go through generated *_ctx.onnx models to get the last generated *_ctx.bin file and update all *_ctx.onnx to point to the same .bin file and remove the .bin files not used.
Copy file name to clipboardExpand all lines: onnxruntime/test/ep_weight_sharing_ctx_gen/command_args_parser.cc
+3-4
Original file line number
Diff line number
Diff line change
@@ -48,7 +48,6 @@ namespace qnnctxgen {
48
48
"\t [QNN only] [htp_arch]: The minimum HTP architecture. The driver will use ops compatible with this architecture. eg: '0', '68', '69', '73', '75'. Defaults to '0' (none). \n"
49
49
"\t [QNN only] [enable_htp_fp16_precision]: Enable the HTP_FP16 precision so that the float32 model will be inferenced with fp16 precision. \n"
50
50
"\t Otherwise, it will be fp32 precision. Works for float32 model for HTP backend. Defaults to '1' (with FP16 precision.). \n"
51
-
"\t [QNN only] [enable_htp_weight_sharing]: Allows common weights across graphs to be shared and stored in a single context binary. Defaults to '1' (enabled).\n"
52
51
"\t [QNN only] [offload_graph_io_quantization]: Offload graph input quantization and graph output dequantization to another EP (typically CPU EP). \n"
53
52
"\t Defaults to '1' (QNN EP handles the graph I/O quantization and dequantization). \n"
54
53
"\t [QNN only] [enable_htp_spill_fill_buffer]: Enable HTP spill file buffer, used while generating QNN context binary."
0 commit comments