[metal] Implement MULTI_DRAW_INDIRECT_COUNT via compute shader emulation #9659

Draft
Bromles wants to merge 6 commits into gfx-rs:trunk from Bromles:metal-multi_draw_indirect_count

@Bromles Bromles commented Jun 10, 2026

Connections
None

Description
Metal has no native support for draw_indirect_count / draw_indexed_indirect_count, so the MULTI_DRAW_INDIRECT_COUNT feature was not advertised on Metal. This PR emulates it so that the feature is available on all primary native backends.

The emulation runs a compute shader before the render pass. It reads the count buffer and copies the appropriate number of draw commands from the source indirect buffer into a temporary buffer, then calls regular draw_indirect. Commands beyond the count are zeroed so those draw calls are no-ops.
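As a rough illustration of the copy-and-zero step described above, here is a CPU-side reference model in Rust. It is not the PR's compute shader: the function name `emulate_draw_count`, the constants, and the clamping of the count to `max_count` are assumptions made for this sketch. Commands are modelled as packed `u32` words (4 per non-indexed draw, 5 per indexed draw).

```rust
/// Words per non-indexed indirect command:
/// vertex_count, instance_count, first_vertex, first_instance.
pub const DRAW_WORDS: usize = 4;
/// Words per indexed indirect command:
/// index_count, instance_count, first_index, base_vertex, first_instance.
pub const DRAW_INDEXED_WORDS: usize = 5;

/// Hypothetical CPU model of the emulation pass.
/// Copies the first `count` commands (clamped to `max_count`, an assumption)
/// from `src` into a fresh buffer of `max_count` slots; the remaining slots
/// stay zeroed, so drawing them produces no work.
pub fn emulate_draw_count(
    src: &[u32],
    count: u32,
    max_count: u32,
    words_per_cmd: usize,
) -> Vec<u32> {
    let max = max_count as usize;
    assert!(src.len() >= max * words_per_cmd, "source buffer too small");

    // Never copy more commands than the caller is going to draw.
    let live = (count as usize).min(max);
    let live_words = live * words_per_cmd;

    // Start fully zeroed, then overwrite the live prefix.
    let mut dst = vec![0u32; max * words_per_cmd];
    dst[..live_words].copy_from_slice(&src[..live_words]);
    dst
}
```

With three non-indexed commands in `src` and a GPU-side count of 2, `emulate_draw_count(&src, 2, 3, DRAW_WORDS)` keeps the first eight words and leaves the last four at zero, so a third draw issued from the temporary buffer has zero vertices and zero instances.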

I considered Indirect Command Buffers (ICBs), but Metal ICBs do not support supplying the draw count from the GPU: the command count must be known on the CPU at ICB encoding time. That would require a GPU-to-CPU sync to read back the count buffer before encoding, which is too expensive to do on every multi_draw_indirect_count call.

Future interaction with draw_index: wgpu already has SHADER_DRAW_INDEX for Vulkan and GLES, but Metal does not support it yet. Once Metal support is added, the emulation would need to inject the draw index into each command in the temp buffer so that the draw_indirect calls can expose it.

I used indirect_validation as a reference for the implementation: a device-level MultiDrawEmulation owns the pipeline and a temp buffer pool, while MultiDrawResources manages per-command-buffer resources and returns buffers to the pool on drop.
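The ownership split described above (a device-level pool, plus per-command-buffer resources that hand buffers back on drop) can be sketched roughly as follows. This is a minimal model, not the PR's code: `TempBuffer`, `BufferPool`, and `PassResources` are hypothetical stand-ins for the real Metal buffer type, the pool inside MultiDrawEmulation, and MultiDrawResources, and the first-fit reuse policy is an assumption.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for a GPU buffer; only its size is tracked here.
pub struct TempBuffer {
    pub size: u64,
}

/// Hypothetical device-level pool of reusable temp buffers.
#[derive(Default)]
pub struct BufferPool {
    free: Mutex<Vec<TempBuffer>>,
}

impl BufferPool {
    /// Reuse the first free buffer that is large enough, else allocate.
    pub fn acquire(&self, size: u64) -> TempBuffer {
        let mut free = self.free.lock().unwrap();
        match free.iter().position(|b| b.size >= size) {
            Some(i) => free.swap_remove(i),
            None => TempBuffer { size },
        }
    }

    pub fn free_count(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// Hypothetical per-command-buffer resources.
pub struct PassResources {
    pool: Arc<BufferPool>,
    in_use: Vec<TempBuffer>,
}

impl PassResources {
    pub fn new(pool: Arc<BufferPool>) -> Self {
        Self { pool, in_use: Vec::new() }
    }

    /// Take a buffer from the pool and keep it alive for this command buffer.
    pub fn temp_buffer(&mut self, size: u64) -> &TempBuffer {
        let buf = self.pool.acquire(size);
        self.in_use.push(buf);
        self.in_use.last().unwrap()
    }
}

impl Drop for PassResources {
    /// Return every buffer to the pool once the command buffer is dropped.
    fn drop(&mut self) {
        let mut free = self.pool.free.lock().unwrap();
        free.append(&mut self.in_use);
    }
}
```

Tying the return to `Drop` means a buffer cannot go back to the pool while the per-command-buffer object that uses it is still alive. A real implementation would also have to make sure the GPU has finished with the buffer before it is reused.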

Testing

  • GPU tests in tests/tests/wgpu-gpu/draw_indirect.rs for indexed/non-indexed and partial count cases
  • Visual example multi_draw_indirect_count
  • All new tests pass on Metal

Squash or Rebase?
Squash

Checklist

  • I self-reviewed and fully understand this PR.
  • WebGPU implementations built with wgpu may be affected behaviorally.
  • Validation and feature gates are in place to confine behavioral changes.
  • Tests demonstrate the validation and altered logic works.
  • CHANGELOG.md entries for the user-facing effects of this change are present.
  • The PR is minimal, and doesn't make sense to land as multiple PRs.
  • Commits are logically scoped and individually reviewable.
  • The PR description has enough context to understand the motivation and solution implemented.

@inner-daemons

Collaborator

How does this relate to #9640?

Also CC @matthargett

@inner-daemons inner-daemons self-requested a review June 14, 2026 09:15

Bromles commented Jun 14, 2026

Author

> How does this relate to #9640?
>
> Also CC @matthargett

Thanks for pointing it out. I completely missed that PR.

After reviewing it, I'd say my PR is closer to the old multi_draw_indirect implementation, just with a compute shader pass injected for the *_count methods instead of the CPU-side loop used for plain multi_draw_indirect, whereas #9640 is a proper but much broader fix. I did attempt that approach, but quickly discarded it on the assumption that pausing the render pass would ruin performance, especially on tile-based GPUs (I didn't verify this, my mistake).

I think my PR can serve as a temporary implementation until we get a proper ICB-based solution for multi_draw_indirect_count, just as CPU-side emulation was used for multi_draw_indirect until #9640. Later we can implement both draw_id and ICB-based multi_draw_indirect_count, building on Matt's outstanding work.

@inner-daemons

Collaborator

Thanks for explaining! I will have to spend more time thinking about it but I don't doubt you.

@cwfitzgerald

Member

The timing of this PR is a bit unfortunate :) #9679 will give us MDIC on ICBs. I'm going to move this to a draft until #9679 lands, as I don't want to lose this code if there's some fundamental issue with #9679, but I suspect we will end up taking that PR.

@cwfitzgerald cwfitzgerald marked this pull request as draft June 15, 2026 20:19

Bromles commented Jun 15, 2026

Author

Well, it seems I just waited too long to contribute: I'd had this idea for a while but didn't act on it.

No problem, it's even better if we can get a proper implementation without emulation.

@inner-daemons

Collaborator

@Bromles Thank you for the contribution anyway :)
