Skip to content

Reapply #9841: Migrate elementwise_util callers to the variants with out_dtypes in template arguments #10491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
May 9, 2025

Conversation

swolchok
Copy link
Contributor

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops

swolchok added 11 commits April 24, 2025 13:15
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@swolchok
Copy link
Contributor Author

swolchok commented Apr 25, 2025

Copy link

pytorch-bot bot commented Apr 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10491

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Apr 25, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```


ghstack-source-id: 9c6751f
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 25, 2025
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Apr 25, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```


ghstack-source-id: 58e2b05
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
@swolchok
Copy link
Contributor Author

internal diff number for size check on this stack is D73691545

swolchok added 3 commits May 7, 2025 19:18
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
swolchok added a commit that referenced this pull request May 8, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```

ghstack-source-id: 1eb6943
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
Base automatically changed from gh/swolchok/430/head to main May 9, 2025 17:35
@swolchok
Copy link
Contributor Author

swolchok commented May 9, 2025

CI is green, just rebasing due to spurious conflicts from stacked land

[ghstack-poisoned]
swolchok added a commit that referenced this pull request May 9, 2025
…out_dtypes in template arguments

This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we are at two PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```

ghstack-source-id: 897d4c9
ghstack-comment-id: 2831329157
Pull-Request-resolved: #10491
@swolchok swolchok merged commit f688329 into main May 9, 2025
76 of 84 checks passed
@swolchok swolchok deleted the gh/swolchok/431/head branch May 9, 2025 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants