[ExecuTorch][WebGPU] test: q4gsw shmem-GEMM prefill golden configs by JulianCloudNTH · Pull Request #20606 · pytorch/executorch

JulianCloudNTH · 2026-06-29T21:08:24Z

Stack from ghstack (oldest at bottom):

-> [ExecuTorch][WebGPU] test: q4gsw shmem-GEMM prefill golden configs #20606
[ExecuTorch][WebGPU] q4gsw prefill: shared-memory tiled GEMM, shape-routed (+150–303% large-K/N) #20605

Add native fp64-golden configs that exercise the shared-memory tiled GEMM prefill route.
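What an fp64 golden means for this op can be sketched in a few lines. This is a minimal sketch that assumes q4gsw denotes 4-bit weights with group-wise symmetric scales: int4 values in [-8, 7], one scale per group of `group_size` along K, and no zero point. `dequantize_q4gsw`, `linear_golden_fp64` and `max_abs_err` are hypothetical helpers, not the functions in test_quantized_linear.py.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dequantize_q4gsw(q: Sequence[Sequence[int]],
                     scales: Sequence[Sequence[float]],
                     group_size: int) -> Matrix:
    """Dequantize int4 weights of shape [N][K]; scales has shape [N][K // group_size].

    Assumption: symmetric group-wise quantization (no zero point).
    """
    w: Matrix = []
    for q_row, s_row in zip(q, scales):
        row = []
        for k, qv in enumerate(q_row):
            # Symmetric int4 values must lie in [-8, 7].
            if not -8 <= qv <= 7:
                raise ValueError(f"int4 value out of range: {qv}")
            row.append(float(qv) * s_row[k // group_size])
        w.append(row)
    return w


def linear_golden_fp64(x: Sequence[Sequence[float]], q, scales, group_size: int) -> Matrix:
    """out[m][n] = sum_k x[m][k] * w[n][k]; Python floats are IEEE-754 doubles."""
    w = dequantize_q4gsw(q, scales, group_size)
    # math.fsum adds the (already rounded) products without any further
    # accumulation-order error, which matters more as K grows.
    return [[math.fsum(xk * wk for xk, wk in zip(x_row, w_row)) for w_row in w]
            for x_row in x]


def max_abs_err(actual: Matrix, golden: Matrix) -> float:
    """Largest elementwise absolute difference between two [M][N] matrices."""
    return max(abs(a - g)
               for a_row, g_row in zip(actual, golden)
               for a, g in zip(a_row, g_row))


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]   # M=2, K=4
    q = [[1, -2, 7, -8], [0, 3, -1, 2]]                 # N=2, K=4
    scales = [[0.5, 0.25], [1.0, 2.0]]                  # group_size=2
    golden = linear_golden_fp64(x, q, scales, group_size=2)
    print(golden)  # [[-4.25, 16.0], [-2.75, 5.0]]
```

A kernel's fp32 output would then be checked against such a golden with a per-config tolerance on `max_abs_err`; a looser bound for a large-K shape is consistent with rounding error accumulating over more terms in the kernel's own reduction.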

The existing linear_q4gsw prefill configs (q_proj_4k/kv_proj_4k, K=2048, N<=2048) all route to the register-tiled GEMM, so the new shmem GEMM route (K >= 4096 || N >= 4096) had no test coverage. This adds three configs that exercise it, kept in lockstep between test_quantized_linear.py CONFIGS and the C++ kQ4gswConfigs table (the op itself lands in the stacked diff below, #20605):

- gate_proj_pf (M128 K2048 N8192): shmem via large N (gate/up prefill).
- down_proj_pf (M128 K8192 N2048): shmem via large K (down prefill), with a looser tolerance for the large K.
- shmem_edge (M130 K4096 N2056): partial 32-tile bounds (M and N are not multiples of 32).
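The routing condition and the tile arithmetic behind these three configs can be checked with a short sketch. The threshold (K >= 4096 || N >= 4096) is taken from the PR text as stated; the 32-wide output tiles over M and N are inferred from "partial 32-tile bounds"; `PrefillConfig`, `uses_shmem_gemm` and `tile_grid` are hypothetical names, not the actual CONFIGS entries or the kQ4gswConfigs table, and the real C++ router may apply further conditions.

```python
from dataclasses import dataclass
from typing import List, Tuple

SHMEM_MIN_DIM = 4096  # from the PR text: K >= 4096 || N >= 4096
TILE = 32             # assumed output tile edge over M and N


@dataclass(frozen=True)
class PrefillConfig:
    name: str
    m: int
    k: int
    n: int


def uses_shmem_gemm(k: int, n: int) -> bool:
    """True if a (K, N) shape takes the shared-memory tiled GEMM route."""
    return k >= SHMEM_MIN_DIM or n >= SHMEM_MIN_DIM


def tile_grid(extent: int, tile: int = TILE) -> Tuple[int, int]:
    """Return (number of tiles, size of the last partial tile, or 0 if none)."""
    full, rem = divmod(extent, tile)
    return full + (1 if rem else 0), rem


NEW_CONFIGS: List[PrefillConfig] = [
    PrefillConfig("gate_proj_pf", m=128, k=2048, n=8192),
    PrefillConfig("down_proj_pf", m=128, k=8192, n=2048),
    PrefillConfig("shmem_edge", m=130, k=4096, n=2056),
]

if __name__ == "__main__":
    for cfg in NEW_CONFIGS:
        m_tiles, m_rem = tile_grid(cfg.m)
        n_tiles, n_rem = tile_grid(cfg.n)
        route = "shmem" if uses_shmem_gemm(cfg.k, cfg.n) else "register-tiled"
        print(f"{cfg.name}: route={route} "
              f"M tiles={m_tiles} (partial {m_rem}) "
              f"N tiles={n_tiles} (partial {n_rem})")
```

All three configs satisfy the predicate, while K=2048, N<=2048 does not, which is why the older prefill configs never reached the shmem route. For shmem_edge the grid has 5 M-tiles (the last holding 2 rows) and 65 N-tiles (the last holding 8 columns), so the kernel's bounds checks on both axes are exercised.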

Co-authored-with: Claude Code.
@exported-using-ghexport

Differential Revision: D110095129

[ghstack-poisoned]

pytorch-bot · 2026-06-29T21:08:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20606

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 1164e70 with merge base b3e11d3:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh) (matched linux rule in flaky-rules.json)
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
Test ARM Backend / test-arm / test-backend-linux (arm_tosa_fp, models) / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-29T21:09:19Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

[ghstack-poisoned]

Pull Request resolved: #20606 · ghstack-source-id: 398396337 · Differential Revision: D110095129

Update · adf6c95 · [ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence on June 29, 2026 21:08 with GitHub Actions (inactive).

JulianCloudNTH mentioned this pull request on Jun 29, 2026 in [ExecuTorch][WebGPU] q4gsw prefill: shared-memory tiled GEMM, shape-routed (+150–303% large-K/N) #20605 (open).

meta-cla bot added the CLA Signed label on Jun 29, 2026.

Update · 1164e70 · [ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence on June 30, 2026 06:00 with GitHub Actions (inactive).

meta-codesync bot added the meta-exported label on Jun 30, 2026.

JulianCloudNTH wants to merge 2 commits into gh/JulianCloudNTH/80/base from gh/JulianCloudNTH/80/head.