Experimental: Executorch export to CoreML and MPS #742
Conversation
Tried on stories15M and it produces nonsense, but this is the start of integration.
Force-pushed from 7257854 to f73d717.
Example export command:

I'm getting 100-110 tokens/sec with MPS export. For comparison, I get 60-68 tokens/sec with MPS eager and over 400 tokens/sec with XNNPACK export.
Here is the FP32 exported MPS graph:
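(The graph dump itself is not reproduced in this excerpt. As a rough sketch of how such a dump can be produced, assuming the `edge_manager` built in `export_model` and an ExecuTorch release whose `EdgeProgramManager` exposes `exported_program()`:)

```python
# Sketch only, not the PR's actual output. Assumes `edge_manager` is the
# EdgeProgramManager created in export_model(); API details can differ
# across executorch releases.
exported = edge_manager.exported_program()
exported.graph.print_tabular()     # node-by-node listing of the FX graph
print(exported.graph_module.code)  # generated forward() code for the graph module
```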
```diff
@@ -97,7 +98,30 @@ def export_model(model, device, output_path, args=None) -> str:  # noqa: C901
         dynamic_shapes=dynamic_shapes,
         edge_compile_config=edge_config,
     )
-    edge_manager = edge_manager.to_backend(XnnpackDynamicallyQuantizedPartitioner())
+    if backend == "xnnpack":
```
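The rest of the new branch is not shown in the hunk above. A minimal sketch of the per-backend dispatch it introduces might look like the following; the MPS and Core ML branches are deliberately left as placeholders, since the exact partitioner classes and constructor arguments depend on the installed ExecuTorch version:

```python
# Sketch only -- not the PR's exact code. The XNNPACK partitioner import below
# matches the path used around this PR's timeframe; adjust for your executorch
# version. The MPS/Core ML branches are intentionally placeholders.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackDynamicallyQuantizedPartitioner,
)

def lower_for_backend(edge_manager, backend: str):
    if backend == "xnnpack":
        return edge_manager.to_backend(XnnpackDynamicallyQuantizedPartitioner())
    if backend == "mps":
        raise NotImplementedError("lower via the executorch MPS partitioner here")
    if backend == "coreml":
        raise NotImplementedError("lower via the executorch Core ML partitioner here")
    raise ValueError(f"Unsupported backend: {backend!r}")
```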
Nit: is it possible to do something like https://github.com/pytorch/executorch/blob/main/examples/models/llama2/export_llama_lib.py#L393-L407 so that it's easier to read?
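For reference, a hedged sketch of the table-driven style the linked export_llama_lib code leans toward; the registry and its commented-out MPS/Core ML entries are illustrative, not code from this PR:

```python
# Illustrative only: map backend names to partitioner factories so adding a
# backend is a registry entry rather than another if/elif branch.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackDynamicallyQuantizedPartitioner,
)

PARTITIONER_REGISTRY = {
    "xnnpack": XnnpackDynamicallyQuantizedPartitioner,
    # "mps": MPSPartitioner,        # hypothetical entries; class names and
    # "coreml": CoreMLPartitioner,  # ctor args vary by executorch version
}

def lower(edge_manager, backend: str):
    try:
        partitioner_cls = PARTITIONER_REGISTRY[backend]
    except KeyError:
        raise ValueError(f"No partitioner registered for backend {backend!r}")
    return edge_manager.to_backend(partitioner_cls())
```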
Is there a command to generate … with …?
Graph breaks with the custom SDPA op removed are fixed if I manually patch the index_put portion of pytorch/executorch#3399 into executorch. However, I'm still getting a garbage MPS FP16 result and 55 tokens/sec with FP32 + MPS.
Force-pushed from f73d717 to c14845d.
torchchat/build/model.py and executorch/examples/model/llama_transformer.py have diverged. We need to reconcile why they're different (and maybe debug the CoreML backend with the current torchchat copy?).
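As a quick way to see exactly where the two copies diverge, something like the following works; it assumes the torchchat and executorch checkouts sit side by side and uses the paths as referenced above:

```python
# Sketch: print a unified diff of the two transformer implementations.
# Assumes both repos are checked out next to each other.
import difflib
import pathlib

a = pathlib.Path("torchchat/build/model.py").read_text().splitlines()
b = pathlib.Path("executorch/examples/model/llama_transformer.py").read_text().splitlines()
print("\n".join(difflib.unified_diff(
    a, b,
    "torchchat/build/model.py",
    "executorch/examples/model/llama_transformer.py",
    lineterm="",
)))
```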
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/742
Note: Links to docs will display an error until the docs builds have been completed.
❌ 25 New Failures, 1 Pending, 6 Unrelated Failures as of commit 2385491 with merge base 6455aa2.
NEW FAILURES - The following jobs have failed: …
FLAKY - The following jobs failed but were likely due to flakiness present on trunk: …
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Closing old PRs to focus attention on new PRs.
Known issues:
1) Core ML produces nonsense -- stutters (?? tokens otherwise)
2) get MPS FP32 to perform