[ExecuTorch][WebGPU] Enable FlashDecoding by default for decode SDPA (runtime shape gate) by pytorchbot · Pull Request #20586 · pytorch/executorch

pytorchbot · 2026-06-28T23:21:06Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20544 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/64/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/64/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/64/orig

@diff-train-skip-merge

…(runtime shape gate)

Pull Request resolved: #20544

**Makes split-KV FlashDecoding the default decode-attention path** (it previously shipped dormant behind a default-OFF compile flag). FD is the fastest WebGPU SDPA decode arm (**+178% vs. naive**, M4 Pro, isolated op); this PR turns it on for production and selects it at runtime with a shape-capability predicate.

{F1991715077}

**Problem:** the FD kernel is correct and benchmarked (+178%), but it was compile-gated OFF, so no production build used it. A device-limit gate (web-llm-style `maxStorageBufferBindingSize`) was considered, but it would be dead code here. FD's resource needs (workgroup size 64, 512 B of shared memory, 5 storage bindings) are all below WebGPU's baseline minimum limits, and FD binds the same K/V caches as the materialized fallback. No spec-compliant device can therefore run materialized decode but fail FD, which leaves shape as the only selection criterion with real effect.

**Solution:** enable FD by default and select it at runtime based on shape, not device.

- **Before:** `EXECUTORCH_BUILD_WEBGPU_SDPA_FD` defaulted OFF; the FD code was not linked, and every decode used the materialized QK/softmax/AV path.
- **After:** the flag defaults ON (kept as a build-time kill-switch). Decode (`S == 1`, static input_pos) with head dim `<= kSdpaFdMaxHeadDim` uses FD; all other shapes (including head dim > 128) fall through to the materialized path.

**Implementation:**

- `Sdpa.cpp`: extend the FD selection predicate with `D <= kSdpaFdMaxHeadDim` so unsupported head dims fall through instead of throwing.
- `SdpaFdDecode.h`: expose `kSdpaFdMaxHeadDim` (FD's lane-owns-D reach) as the single source of truth; `SdpaFdDecode.cpp` ties it to `WG_SIZE * MAX_D_PER_LANE` with a `static_assert`.
- `CMakeLists.txt` (fbcode + xplat): flip the option default to ON; OFF remains a kill-switch that drops all FlashDecoding code.
- `test_webgpu_native_ci.sh`: drop the now-redundant explicit `=ON` flag so CI builds and tests the default.
- Mirrors the shape-based kernel selection (`is_single_token`) in Vulkan's `backends/vulkan/runtime/graph/ops/impl/SDPA.cpp`; like the Vulkan delegate, there is no device-adaptive gate.

**Constraints:** decode-only (`S == 1`) with static input_pos (dynamic-pos decode still uses the materialized path); fp32, buffer-only; the FD kernels themselves are unchanged by this diff.

Co-authored with Claude Code.

ghstack-source-id: 397864676
@exported-using-ghexport
@diff-train-skip-merge

Differential Revision: [D109520722](https://our.internmc.facebook.com/intern/diff/D109520722/)
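The runtime shape gate and the compile-time tie between `kSdpaFdMaxHeadDim` and `WG_SIZE * MAX_D_PER_LANE` described above can be sketched roughly as follows. This is a hypothetical illustration, not the code in `Sdpa.cpp` or `SdpaFdDecode.cpp`: `SdpaShape`, `SdpaPath`, `select_sdpa_path`, and the `fd_built` parameter are invented names, and `MAX_D_PER_LANE = 2` is inferred from the stated 64-lane workgroup and 128 head-dim cap rather than taken from the source. The limit checks compare FD's quoted resource needs against the WebGPU spec's default limits (`maxComputeInvocationsPerWorkgroup` = 256, `maxComputeWorkgroupStorageSize` = 16384 bytes, `maxStorageBuffersPerShaderStage` = 8), which is the reasoning the PR gives for dropping a device-limit gate.

```cpp
#include <cstdint>

// Hypothetical constants. WG_SIZE = 64 and the 128 head-dim cap come from the
// PR text; MAX_D_PER_LANE = 2 is inferred (128 / 64), not quoted from the source.
constexpr uint32_t WG_SIZE = 64;
constexpr uint32_t MAX_D_PER_LANE = 2;
constexpr uint32_t kSdpaFdMaxHeadDim = WG_SIZE * MAX_D_PER_LANE;
static_assert(kSdpaFdMaxHeadDim == 128,
              "each lane owns MAX_D_PER_LANE elements of D, so FD reaches D <= 128");

// FD resource needs quoted in the PR, checked against WebGPU default limits.
// If these hold, any spec-compliant device can run FD, so no device gate is needed.
constexpr uint32_t kFdSharedMemBytes = 512;
constexpr uint32_t kFdStorageBindings = 5;
static_assert(WG_SIZE <= 256, "maxComputeInvocationsPerWorkgroup default");
static_assert(kFdSharedMemBytes <= 16384, "maxComputeWorkgroupStorageSize default");
static_assert(kFdStorageBindings <= 8, "maxStorageBuffersPerShaderStage default");

// Hypothetical summary of the shape facts the gate looks at for one SDPA call.
struct SdpaShape {
  int64_t seq_len;        // S: query tokens in this call (1 for decode)
  int64_t head_dim;       // D
  bool static_input_pos;  // true when input_pos is known when the graph is built
};

enum class SdpaPath { kFlashDecoding, kMaterialized };

// Hypothetical selection helper. fd_built stands in for the
// EXECUTORCH_BUILD_WEBGPU_SDPA_FD option (ON by default after this PR); in a
// build with the option OFF, the FD branch would not be compiled at all.
constexpr SdpaPath select_sdpa_path(SdpaShape s, bool fd_built) {
  return (fd_built && s.seq_len == 1 && s.static_input_pos &&
          s.head_dim <= static_cast<int64_t>(kSdpaFdMaxHeadDim))
             ? SdpaPath::kFlashDecoding
             : SdpaPath::kMaterialized;
}
```

Nothing in the predicate queries the adapter, so every device makes the same choice; only `S`, `D`, and whether `input_pos` is static matter. For example, a single-token decode with `D = 128` selects FD, while `D = 256`, a prefill with `S = 8`, or a dynamic `input_pos` falls through to the materialized path.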

pytorch-bot · 2026-06-28T23:21:11Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20586

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 87 Pending

As of commit 597d658 with merge base 55a71e6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-28T23:21:54Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

pytorchbot requested review from kirklandsign and larryliu0820 as code owners June 28, 2026 23:21

pytorchbot temporarily deployed to cadence with GitHub Actions on June 28, 2026 23:21 (deployment now inactive)

meta-cla bot added the `CLA Signed` label Jun 28, 2026

JulianCloudNTH self-requested a review June 28, 2026 23:35

JulianCloudNTH approved these changes Jun 28, 2026


JulianCloudNTH merged commit 821b5a9 into main Jun 28, 2026
183 of 185 checks passed

JulianCloudNTH deleted the gh/JulianCloudNTH/64/orig branch June 28, 2026 23:36

JulianCloudNTH merged 1 commit into `main` from `gh/JulianCloudNTH/64/orig`
