[ExecuTorch][WebGPU] Dynamic tensor-shape resize engine core by JulianCloudNTH · Pull Request #20574 · pytorch/executorch

JulianCloudNTH · 2026-06-28T16:22:10Z

Stack from ghstack (oldest at bottom):

The WebGPU backend baked static tensor shapes at build time, so a dynamic .pte needed a separate graph for each shape (prefill vs. decode). This adds a tensor-shape resize engine mirroring Vulkan: tensors carry live cur_dims ≤ max, inputs resize per call, and a bounded fixpoint loop propagates tensor-level resize hooks.

Key changes:

WebGPUTensor: add cur_dims/cur_nbytes (live sizes ≤ max allocation), initialized to max at build
WebGPUGraph: resize_input/set_cur_dims validate live dims fit max, propagate_resize runs tensor hooks for dirty shapes
update_symints_from_inputs reads live cur_dims; adds sym_size.int dim source path
copy_inputs uploads only live bytes; WebGPUBackend::execute shrinks inputs and resizes outputs to live shapes

Static graphs stay byte-identical: cur == max forever, no hooks fire, no reallocations.
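
For readers skimming the description, here is a minimal sketch of the live-shape invariant it describes: cur_dims starts equal to the max shape, set_cur_dims rejects anything that does not fit, and the tensor is dirtied only on an actual change. SketchTensor, its dirty flag, and the fixed 4-byte element size are assumptions for illustration, not the PR's actual WebGPUTensor.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for WebGPUTensor. Field names follow the PR text;
// everything else is assumed for illustration.
struct SketchTensor {
  std::vector<int64_t> dims;      // max shape, fixed at build time
  std::vector<int64_t> cur_dims;  // live shape, element-wise <= dims
  size_t elem_size = 4;           // bytes per element (assumed fp32)
  size_t cur_nbytes = 0;          // live byte count, <= max allocation
  bool dirty = false;             // set only when the live shape changes

  explicit SketchTensor(std::vector<int64_t> max_dims)
      : dims(std::move(max_dims)), cur_dims(dims) {
    cur_nbytes = nbytes_of(cur_dims);  // cur == max at build
  }

  size_t nbytes_of(const std::vector<int64_t>& d) const {
    size_t n = elem_size;
    for (int64_t v : d) n *= static_cast<size_t>(v);
    return n;
  }

  void set_cur_dims(const std::vector<int64_t>& new_dims) {
    if (new_dims.size() != dims.size()) {
      throw std::invalid_argument("rank mismatch");
    }
    for (size_t i = 0; i < dims.size(); ++i) {
      if (new_dims[i] < 0 || new_dims[i] > dims[i]) {
        throw std::out_of_range("live dim exceeds max allocation");
      }
    }
    if (new_dims == cur_dims) return;  // static graphs always exit here
    cur_dims = new_dims;
    cur_nbytes = nbytes_of(cur_dims);
    dirty = true;
  }
};
```

Under these assumptions a static graph never sets dirty, so no hook would fire and the upload size stays at the max allocation.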

Differential Revision: D109906091

[ghstack-poisoned]

pytorch-bot · 2026-06-28T16:22:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20574

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit aa08145 with merge base 55a71e6:

NEW FAILURE - The following job has failed:

pull / test-arm-backend-no-driver (test_pytest_ops_no_target) / linux-job (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-28T16:23:02Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JulianCloudNTH · 2026-06-29T20:22:55Z

@claude review and check for any areas or opportunities for modularization

claude · 2026-06-29T20:23:43Z

Claude finished @JulianCloudNTH's task in 1m 58s

Code review: WebGPU dynamic resize engine core

Read PR diff and changed files
Analyze correctness & modularization opportunities
Post review

Overall this is a clean, well-commented port of Vulkan's resize model. The "static graph stays byte-identical" invariant holds: cur_dims == dims at build, set_cur_dims only dirties on an actual change, and copy_inputs/propagate_resize short-circuit when nothing is dirty. Below are correctness notes and the modularization opportunities you asked about.

Modularization opportunities

1. utils::numel_of is added but never called — and the duplicate copies it should replace are left in place. This is the biggest one.

WebGPUUtils.h:28 adds:

inline uint64_t numel_of(const std::vector<int64_t>& dims) { ... }

But there are zero callers of utils::numel_of, and three places still hand-roll the same loop:

ops/embedding_q4gsw/EmbeddingQ4gsw.cpp:39 — its own local numel_of
ops/rope/RotaryEmbedding.cpp:37 — its own local numel_of
WebGPUGraph.cpp:107-110 (select_as_symint) and :165-171 (set_cur_dims) — inline product loops

Either wire the new util into those sites or drop it. As-is it's dead code plus two surviving duplicates. Replacing the per-op locals with utils::numel_of is the intended consolidation.
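
For reference, a sketch of what the shared helper could look like. The signature is the one quoted above; the body is an assumption (the diff excerpt elides it) and presumes non-negative dims.

```cpp
#include <cstdint>
#include <vector>

namespace utils {

// Assumed body: product of all dims. An empty dims vector (rank 0)
// yields 1, and any zero dim yields 0. Dims are assumed non-negative.
inline uint64_t numel_of(const std::vector<int64_t>& dims) {
  uint64_t n = 1;
  for (int64_t d : dims) {
    n *= static_cast<uint64_t>(d);
  }
  return n;
}

}  // namespace utils
```

Each per-op local and inline product loop would then reduce to a single utils::numel_of(dims) call.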

2. Extract maybe_resize_inputs / maybe_resize_outputs from WebGPUBackend::execute. The two new loops at WebGPUBackend.cpp:118-122 and :128-138 are exactly the Vulkan maybe_resize_input/maybe_resize_output pair the comments reference, inlined into execute. Pulling them into two private methods (or free helpers) would match the Vulkan structure the PR is mirroring, keep execute readable, and put the std::vector conversions behind a named boundary. Given the CLAUDE.md "no trivial single-use helper" rule these are borderline, but each is ~5–10 lines with a clear name and a 1:1 Vulkan analog, so I'd lean toward extracting.

3. A normalize_dim(dim, rank) + bounds-check helper. This pattern now appears three times verbatim:

int dim = s.dim < 0 ? s.dim + static_cast<int>(d.size()) : s.dim;
if (dim < 0 || dim >= static_cast<int>(d.size())) { throw ...; }

at WebGPUGraph.cpp:96-98, :137-140, and in the negative-index logic next to set_cur_dims. A single small helper would remove the repetition and the easy-to-miss sign/cast bugs.
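
A minimal sketch of such a helper. The name normalize_dim comes from the suggestion above; the exception type and message are assumptions, since the quoted snippet elides what is thrown.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maps a possibly negative dim index into [0, rank)
// and throws if the result is still out of range.
inline int normalize_dim(int dim, int rank) {
  const int normalized = dim < 0 ? dim + rank : dim;
  if (normalized < 0 || normalized >= rank) {
    throw std::out_of_range(
        "dim " + std::to_string(dim) + " out of range for rank " +
        std::to_string(rank));
  }
  return normalized;
}
```

Call sites would do the static_cast<int>(d.size()) once when passing rank, which keeps the sign handling in one place.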

Correctness / robustness

4. select_as_symint empty-input guard now depends on live cur_dims. At WebGPUGraph.cpp:107-112, numel is computed from live cur_dims and rejected if <= 0. That's correct, but note that a dynamic input legitimately resized so that one source dim is 0 would now throw, where the static path never could. Worth confirming dynamic symint sources can never have a zero live dim, or the guard may need to tolerate it.

5. Per-call heap allocations on the hot path. execute allocates a std::vector<int64_t> per input (:120) and a std::vector<SizesType> per output (:130) every invocation. For decode loops this runs once per token. Minor, but a reusable scratch buffer on the graph (or a SmallVector/stack array for typical low ranks) would avoid it. Not blocking.
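
One way the scratch-buffer idea could look, as a sketch only. ShapeScratch is hypothetical, and int32_t stands in for SizesType as an assumption. On common standard library implementations, assign reuses the existing capacity, so after the first call with the largest rank no further heap allocation is expected.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scratch holder that would live on the graph and be reused
// across execute() calls instead of building a fresh vector per input.
struct ShapeScratch {
  std::vector<int64_t> dims;

  // Copies `rank` sizes into the reused buffer, widening to int64_t.
  const std::vector<int64_t>& fill(const int32_t* sizes, size_t rank) {
    dims.assign(sizes, sizes + rank);
    return dims;
  }
};
```

The returned reference is only valid until the next fill call, so this fits a loop that consumes each shape before moving to the next input.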

6. propagate_resize fixpoint bound looks right. pass <= tensor_resize_hooks_.size() with set_cur_dims only re-dirtying on change gives convergence in ≤ depth passes for a forward DAG, and processing.swap(dirty_tensors_) cleanly separates this pass's set from cascades. No issue — just flagging that the loop will silently stop after size()+1 passes if a cycle ever existed; an assert/log on !dirty_tensors_.empty() after the loop would catch a malformed graph instead of executing on stale shapes.
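
To make the suggested post-loop check concrete, here is a sketch of a bounded fixpoint with the same loop bound and swap pattern described above. SketchGraph, the hook signature, mark_dirty, and the choice to throw are all assumptions; only the member names tensor_resize_hooks_ and dirty_tensors_ come from the PR.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Hypothetical graph: tensors are plain int ids, and each hook sees the
// set dirtied in the previous pass and may dirty more via mark_dirty.
struct SketchGraph {
  using Hook =
      std::function<void(SketchGraph&, const std::unordered_set<int>&)>;
  std::vector<Hook> tensor_resize_hooks_;
  std::unordered_set<int> dirty_tensors_;

  void mark_dirty(int id) { dirty_tensors_.insert(id); }

  // Runs hooks until nothing is dirty or the pass bound is hit.
  // Returns the number of passes executed.
  size_t propagate_resize() {
    size_t pass = 0;
    for (; pass <= tensor_resize_hooks_.size() && !dirty_tensors_.empty();
         ++pass) {
      std::unordered_set<int> processing;
      processing.swap(dirty_tensors_);  // cascades land in the emptied set
      for (auto& hook : tensor_resize_hooks_) {
        hook(*this, processing);
      }
    }
    if (!dirty_tensors_.empty()) {
      // Bound exhausted with work remaining: fail loudly instead of
      // executing on stale shapes.
      throw std::logic_error("resize propagation did not converge");
    }
    return pass;
  }
};
```

With nothing dirty the loop body never runs, which matches the static-graph case; a hook that re-dirties unconditionally trips the check rather than stopping silently.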

7. No op registers a tensor resize hook yet. add_tensor_resize_hook and the add_dispatch index return are added but unused in this PR — expected for an "engine core" change, just confirming the dispatch-rewrite consumers land in a later stack entry.

CI note

The reported failure is test-arm-backend-no-driver — an Arm backend job unrelated to these WebGPU-only files; almost certainly not caused by this PR.
· branch gh/JulianCloudNTH/66/head

Update

aa08145

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 28, 2026 16:22 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed label Jun 28, 2026

JulianCloudNTH wants to merge 1 commit into gh/JulianCloudNTH/66/base from gh/JulianCloudNTH/66/head
