Commit a1acef7

cccclai authored and facebook-github-bot committed
Instruct users running llama for QNN to use the active repo (#10231)
Summary: Many users are trying to export llama with this flow https://github.com/pytorch/executorch/tree/main/examples/models/llama and end up with non-performant models or other issues, like #10226. Instruct users to use the Qualcomm (qcom) version instead.

Reviewed By: kirklandsign

Differential Revision: D73125467
1 parent 2ff9abd commit a1acef7

File tree

1 file changed: +4 −0 lines changed


examples/models/llama/export_llama_lib.py

@@ -816,6 +816,10 @@ def _to_edge_and_lower_llama(  # noqa: C901
         modelname = f"coreml_{modelname}"
 
     if args.qnn:
+        logging.warning(
+            f"The model definition in current repro is not performant, please refer to the instruction"
+            " in https://github.com/pytorch/executorch/tree/main/examples/qualcomm/oss_scripts/llama/README.md for better performance."
+        )
         from executorch.extension.llm.custom_ops import model_sharding
 
         partitioners.append(
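For context, here is a minimal, self-contained sketch of the gated-warning pattern this commit introduces: emit a warning before taking the QNN lowering path, then continue with lowering. The function name lower_sketch and the use_qnn parameter are hypothetical stand-ins for the real _to_edge_and_lower_llama and args.qnn; only the logging.warning call and its message text come from the diff above.

import logging

def lower_sketch(use_qnn: bool) -> None:
    # Hypothetical stand-in for the QNN branch of _to_edge_and_lower_llama.
    if use_qnn:
        # New in this commit: warn that this repo's llama definition is not
        # tuned for QNN and point users at the Qualcomm-maintained flow.
        logging.warning(
            "The model definition in current repro is not performant, please refer to the instruction"
            " in https://github.com/pytorch/executorch/tree/main/examples/qualcomm/oss_scripts/llama/README.md for better performance."
        )
        # ... QNN partitioner setup would follow here (elided in this sketch) ...

if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    lower_sketch(use_qnn=True)  # logs the warning once, then returns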
