-
Notifications
You must be signed in to change notification settings - Fork 543
Reapply #9841: Migrate elementwise_util callers to the variants with out_dtypes in template arguments #10491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Stack from ghstack (oldest at bottom): |
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10491
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
…out_dtypes in template arguments This is necessary to take advantage of #9388, which creates dtype-specialized implementations for the non-mixed dtype case. Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size: ``` Before: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:51 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1310720 81920 0 4295458816 4296851456 1001cc000 cmake-out/test/size_test_all_ops 4358144 98304 0 4296212480 4300668928 100570000 cmake-out/test/size_test_all_optimized_ops After: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:57 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1376256 81920 0 4295491584 4296949760 1001e4000 cmake-out/test/size_test_all_ops 4423680 98304 0 4296327168 4300849152 10059c000 cmake-out/test/size_test_all_optimized_ops ``` However, on an absolute basis, size is still below where we are at two PRs ago, which was: ``` ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:24 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops 4489216 98304 0 4296359936 4300947456 1005b4000 cmake-out/test/size_test_all_optimized_ops ``` ghstack-source-id: 9c6751f ghstack-comment-id: 2831329157 Pull-Request-resolved: #10491
…out_dtypes in template arguments This is necessary to take advantage of #9388, which creates dtype-specialized implementations for the non-mixed dtype case. Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size: ``` Before: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:51 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1310720 81920 0 4295458816 4296851456 1001cc000 cmake-out/test/size_test_all_ops 4358144 98304 0 4296212480 4300668928 100570000 cmake-out/test/size_test_all_optimized_ops After: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:57 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1376256 81920 0 4295491584 4296949760 1001e4000 cmake-out/test/size_test_all_ops 4423680 98304 0 4296327168 4300849152 10059c000 cmake-out/test/size_test_all_optimized_ops ``` However, on an absolute basis, size is still below where we are at two PRs ago, which was: ``` ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:24 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops 4489216 98304 0 4296359936 4300947456 1005b4000 cmake-out/test/size_test_all_optimized_ops ``` ghstack-source-id: 58e2b05 ghstack-comment-id: 2831329157 Pull-Request-resolved: #10491
internal diff number for size check on this stack is D73691545 |
…out_dtypes in template arguments This is necessary to take advantage of #9388, which creates dtype-specialized implementations for the non-mixed dtype case. Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size: ``` Before: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:51 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1310720 81920 0 4295458816 4296851456 1001cc000 cmake-out/test/size_test_all_ops 4358144 98304 0 4296212480 4300668928 100570000 cmake-out/test/size_test_all_optimized_ops After: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:57 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1376256 81920 0 4295491584 4296949760 1001e4000 cmake-out/test/size_test_all_ops 4423680 98304 0 4296327168 4300849152 10059c000 cmake-out/test/size_test_all_optimized_ops ``` However, on an absolute basis, size is still below where we are at two PRs ago, which was: ``` ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:24 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops 4489216 98304 0 4296359936 4300947456 1005b4000 cmake-out/test/size_test_all_optimized_ops ``` ghstack-source-id: 1eb6943 ghstack-comment-id: 2831329157 Pull-Request-resolved: #10491
CI is green, just rebasing due to spurious conflicts from stacked land |
…out_dtypes in template arguments This is necessary to take advantage of #9388, which creates dtype-specialized implementations for the non-mixed dtype case. Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size: ``` Before: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:51 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1310720 81920 0 4295458816 4296851456 1001cc000 cmake-out/test/size_test_all_ops 4358144 98304 0 4296212480 4300668928 100570000 cmake-out/test/size_test_all_optimized_ops After: ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:57 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1376256 81920 0 4295491584 4296949760 1001e4000 cmake-out/test/size_test_all_ops 4423680 98304 0 4296327168 4300849152 10059c000 cmake-out/test/size_test_all_optimized_ops ``` However, on an absolute basis, size is still below where we are at two PRs ago, which was: ``` ExecuTorch with no ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:24 cmake-out/test/size_test ExecuTorch with portable ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops ExecuTorch with optimized ops binary size, unstripped: -rwxr-xr-x 1 swolchok staff 5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops (.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test* __TEXT __DATA __OBJC others dec hex 81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test 1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops 4489216 98304 0 4296359936 4300947456 1005b4000 cmake-out/test/size_test_all_optimized_ops ``` ghstack-source-id: 897d4c9 ghstack-comment-id: 2831329157 Pull-Request-resolved: #10491
This is necessary to take advantage of #9388, which
creates dtype-specialized implementations for the non-mixed dtype case.
Measured the size cost of this approach with test/build_optimized_size_test.sh . It does cost us some size:
However, on an absolute basis, size is still below where we are at two PRs ago, which was: