Skip to content

RFC: Specialize for non-mixed-dtype in elementwise_util #9388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: gh/swolchok/382/head
Choose a base branch
from

Conversation

swolchok
Copy link
Contributor

@swolchok swolchok commented Mar 19, 2025

Mixed dtype should be uncommon. Here is how we can specialize for the common case. Prepares us to tackle #9241 .

Test Plan: automated tests on this PR verify we didn't break the now-deprecated runtime_out_dtypes mode; tests on the next PR will verify that everything works after migration. Also included migration for exactly one operator, op_mul, to verify that the new code compiles.

To check performance, I edited examples/models/toy_model/model.py so that MulModule used inputs of size 3000, 2000 instead of 3, 2. I exported it with python3 -m examples.portable.scripts.export --model_name mul and saved the resulting mul.pte. Then I built in release mode with optimized kernels on, but with mul.out removed from kernels/optimized/optimized.yaml, so that we would use the optimized_portable_kernels build of kernels/portable/op_mul.cpp. Finally, I ran 3 trials on my M2 Macbook Pro using cmake-out/executor_runner --model_path mul3kby2k.pte --num_executions 1000 --cpu_threads 2. Resulting times for 1000 iterations in ms:

Previous diff: 8295, 8187, 8139
This diff: 2953, 2806, 2861

(For comparison, the actual optimized mul kernel took around 1000 ms to run 1000 iterations, and #9432 later in the stack arrived at similar numbers.)

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Mar 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9388

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 85451ea with merge base 1572381 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Mar 19, 2025
Mixed dtype should be uncommon. Here is how we can specialize for the
common case.

Test Plan: automated tests on this PR verify we didn't break the
now-deprecated runtime_out_dtypes mode; tests on the next PR will
verify that everything works after migration. Also included migration
for exactly one operator, op_mul, to verify that the new code
compiles.

ghstack-source-id: 079c8b97c745f0e4004303ead6ca21de596020cc
ghstack-comment-id: 2735017566
Pull Request resolved: #9388
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 19, 2025
@swolchok swolchok requested a review from kimishpatel March 19, 2025 00:32
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Mar 19, 2025
Mixed dtype should be uncommon. Here is how we can specialize for the
common case.

Test Plan: automated tests on this PR verify we didn't break the
now-deprecated runtime_out_dtypes mode; tests on the next PR will
verify that everything works after migration. Also included migration
for exactly one operator, op_mul, to verify that the new code
compiles.

ghstack-source-id: ae7c28973153341837c479d8b5ae7f11998c6c76
ghstack-comment-id: 2735017566
Pull Request resolved: #9388
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@swolchok swolchok changed the base branch from gh/swolchok/382/head to gh/swolchok/395/head March 25, 2025 19:48
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@swolchok
Copy link
Contributor Author

This is not ready for full review, but I'm leaving it as non-draft because I'd still like directional comments (hence RFC). I have some more PRs I will be adding to the stack once everything builds again.

[ghstack-poisoned]
[ghstack-poisoned]
/// Return the one output type we are willing to emit specialized code
/// to handle, given a compute type of CTYPE_COMMON and supported
/// output types of out_dtypes.
template <typename CTYPE_COMMON>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be CTYPE_COMPUTE, right?

((inputs.first->scalar_type() == compute_type) && ...);

constexpr ScalarType out_specialized_scalar_type =
specialized_output_scalar_type<CTYPE_COMPUTE>(out_dtypes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, CTYPE_COMPUTE

[ghstack-poisoned]
[ghstack-poisoned]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants