Fix signed % wrong for negative operands on NVIDIA (OpSRem poison without maintenance8) #9674

Open: mstampfli wants to merge 1 commit into gfx-rs:trunk from mstampfli:fix/8191-signed-remainder-poison

Conversation

@mstampfli commented Jun 14, 2026

Fixes #8191.

Problem

WGSL defines signed integer % for all non-degenerate operands: -1 % 768 must be -1 (truncated remainder, sign of the dividend; WGSL §8.7). On NVIDIA it instead returns 255, i.e. (-1 as u32) % 768.

naga lowers signed % to SPIR-V OpSRem. In the Vulkan SPIR-V environment, OpSRem and OpSMod with a negative operand produce a poison (undefined) result unless VK_KHR_maintenance8 is enabled (Vulkan spec, SPIR-V environment appendix). wgpu does not enable that extension, so the result is undefined for negative operands. NVIDIA exercises that latitude; Mesa's ANV and llvmpipe drivers happen to return the truncated remainder. So this is a wgpu/naga conformance gap, not an NVIDIA driver bug.

Evidence

Same naga-generated SPIR-V, run on every Vulkan adapter (no maintenance8):

inputs                             : [-1, -5, -768, -769, -1000, 0, 5, 768, 1000]
spec-correct a%b                   : [-1, -5,    0,   -1,  -232, 0, 5,   0,  232]
NVIDIA RTX 5060 (driver 610.43.02) : [255, 251, 256, 255, 24, ...]   WRONG (poison)
Intel Mesa (ANV)                   : [-1, -5, 0, -1, -232, ...]      correct
llvmpipe (Mesa)                    : [-1, -5, 0, -1, -232, ...]      correct
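
The NVIDIA values listed above happen to match an unsigned remainder taken on the dividend's bit pattern, as the Problem section notes for -1. A minimal Rust sketch that reproduces both behaviours on the CPU, assuming a divisor of 768 as in the -1 % 768 example; the helper names are hypothetical, and since a poison result is undefined, this match is an observation rather than something a driver guarantees:

```rust
/// Truncated signed remainder: Rust's `%` on i32 takes the sign of the
/// dividend, which is the behaviour the PR describes as spec-correct.
/// Panics for b == 0 and for i32::MIN % -1; those cases are out of scope here.
fn spec_rem(a: i32, b: i32) -> i32 {
    a % b
}

/// Remainder computed after reinterpreting the dividend's bits as u32,
/// i.e. the `(-1 as u32) % 768` interpretation from the description.
fn unsigned_reinterpreted_rem(a: i32, b: u32) -> u32 {
    (a as u32) % b
}
```

For example, `spec_rem(-1000, 768)` is -232, while `unsigned_reinterpreted_rem(-1000, 768)` is 24, the fifth value in the NVIDIA row.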

Enabling VK_KHR_maintenance8 on the same NVIDIA device makes OpSRem correct (-1), confirming NVIDIA is conformant and the SPIR-V is the issue.

Fix

OpSDiv is not poisoned for negative operands, so signed % can be lowered without OpSRem:

  • New naga::back::spv::Options::avoid_signed_int_remainder (defaults false → existing output byte-for-byte unchanged). When set, the existing naga_mod wrapper emits a - b * (a / b) instead of OpSRem, reusing the wrapper's zero/INT_MIN-guarded divisor so degenerate cases still match WGSL's required 0.
  • The Vulkan backend enables it for NVIDIA adapters.
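
The arithmetic behind the reconstruction can be sketched in Rust. This is a CPU-side illustration, not naga's generated code: the function name is hypothetical, and the exact shape of the guard (substituting a divisor of 1 when b is zero or when a / b would overflow) is an assumption based on the description of the wrapper's zero/INT_MIN-guarded divisor.

```rust
/// Signed remainder rebuilt from division: a - b * (a / b).
///
/// Assumption: the guard swaps the divisor for 1 when b == 0 or when
/// a / b would overflow (i32::MIN / -1), so both degenerate cases yield 0.
fn reconstructed_rem(a: i32, b: i32) -> i32 {
    let overflow = a == i32::MIN && b == -1;
    let divisor = if b == 0 || overflow { 1 } else { b };
    // The guarded division is always well-defined. Wrapping multiply and
    // subtract mirror 32-bit wraparound arithmetic, although with the guard
    // in place the product stays within range.
    a.wrapping_sub(divisor.wrapping_mul(a / divisor))
}
```

With this sketch, `reconstructed_rem(-1, 768)` gives -1 and `reconstructed_rem(-1000, 768)` gives -232, while `reconstructed_rem(7, 0)` and `reconstructed_rem(i32::MIN, -1)` both give 0.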

Verified end-to-end: with the flag, the patched naga's SPIR-V returns the WGSL-correct values on NVIDIA without maintenance8.

Alternative / follow-up

A perf-optimal complement is to enable VK_KHR_maintenance8 in wgpu-hal when the adapter supports it (no per-op cost), keeping this polyfill as the fallback for drivers/targets that lack it. Happy to do that here or as a follow-up - whichever maintainers prefer. (Also open to gating the polyfill differently, e.g. always-on for signed %, or via a Workarounds bit.)
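
As a rough illustration of that fallback policy, here is a sketch of the selection logic in Rust. The enum and function are hypothetical and do not correspond to wgpu-hal's actual extension handling; only the extension name comes from the discussion above. A real implementation would also have to enable the extension at device creation, not merely detect it.

```rust
/// Which lowering to use for signed `%` (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum SignedRemLowering {
    /// Emit OpSRem directly; only safe when VK_KHR_maintenance8 is enabled.
    NativeOpSRem,
    /// Emit the a - b * (a / b) polyfill.
    Reconstruct,
}

/// Pick the lowering from the list of extensions the device will have enabled.
fn pick_lowering(enabled_extensions: &[&str]) -> SignedRemLowering {
    if enabled_extensions.contains(&"VK_KHR_maintenance8") {
        SignedRemLowering::NativeOpSRem
    } else {
        SignedRemLowering::Reconstruct
    }
}
```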

@mstampfli force-pushed the fix/8191-signed-remainder-poison branch 2 times, most recently from 02d3cc4 to 37bd4f0 (June 14, 2026 13:26)
@mstampfli

Author

For reviewers: a standalone reproducer is at https://github.com/mstampfli/wgsl-mod-repro. It has a cross-driver differential (NVIDIA vs Intel ANV vs llvmpipe on the same naga SPIR-V), a SPIR-V opcode dump showing naga emits OpSRem, and a raw-Vulkan A/B that toggles VK_KHR_maintenance8 to confirm NVIDIA is conformant and the result is poison without it.

@inner-daemons

Collaborator

I think you will need to add a snapshot that demonstrates this change, so that we notice if it ever regresses. Frankly I'm surprised none of our current snapshots are affected.

mstampfli added a commit to mstampfli/wgpu that referenced this pull request Jun 17, 2026
Add a SPIRV snapshot that exercises the avoid_signed_int_remainder option so a regression is caught if signed % ever goes back to OpSRem. SpirvOutParameters now exposes the option through the [spv] toml section (default false), and int-signed-remainder.wgsl enables it: the generated SPIR-V reconstructs signed % as a - b * (a / b) (no OpSRem) while unsigned % stays OpUMod.

Requested in review of gfx-rs#9674.
@mstampfli

Author

Good call, added in b65ab07.

On why nothing was affected: the option defaults to false, and the snapshot harness was hardcoding avoid_signed_int_remainder: false when it built spv::Options, so no snapshot could ever reach the new path. I plumbed it through SpirvOutParameters so a test can turn it on via the [spv] section (still defaulting to false, so every existing snapshot stays byte-for-byte unchanged).

The new snapshot is naga/tests/in/wgsl/int-signed-remainder.wgsl:

targets = "SPIRV"

[spv]
avoid_signed_int_remainder = true

It covers scalar and vector signed % plus an unsigned %. In the generated wgsl-int-signed-remainder.spvasm the signed cases lower to the guarded OpSDiv / OpIMul / OpISub reconstruction, with no OpSRem or OpSMod anywhere, while unsigned % stays a plain OpUMod. If the lowering ever regresses to OpSRem, the snapshot regenerates with it and the CI diff check fails.
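
The property the snapshot protects can also be stated as a direct check on the disassembly text. The helper below is a hypothetical sketch, not part of naga's snapshot harness (which relies on diffing the checked-in output); it simply scans SPIR-V assembly for the two opcodes that must not appear.

```rust
/// Return every line of a SPIR-V disassembly that uses OpSRem or OpSMod.
/// An empty result means signed `%` was lowered without the poison-prone ops.
fn forbidden_signed_rem_ops(spvasm: &str) -> Vec<&str> {
    spvasm
        .lines()
        .filter(|line| {
            // Match whole tokens so that e.g. OpUMod is never flagged.
            line.split_whitespace()
                .any(|tok| tok == "OpSRem" || tok == "OpSMod")
        })
        .collect()
}
```

A test could then assert that `forbidden_signed_rem_ops(&snapshot_text).is_empty()`, where `snapshot_text` stands for the contents of the generated .spvasm file.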

@inner-daemons

Collaborator

IMO this should not be an obscure option because the result of not using it is undefined behavior when correctly calling functions the way they are described in the WGSL spec. I think this fix should apply globally. It is likely that the actual GPU compilers will be able to optimize it away in many cases anyway.

@mstampfli

Author

Agreed, and that lines up with the sibling GLSL PR (#9687), which already applies the reconstruction unconditionally for the same reason: % is defined for negative operands per the spec, so emitting the poison op is UB regardless of vendor. NVIDIA just exercises latitude the others currently do not.

Happy to make this the default in the SPIR-V backend too and drop both the opt-in default and the NVIDIA-only gating in wgpu-hal. One scoping question before I regenerate the snapshots:

When the device supports VK_KHR_maintenance8, OpSRem is well-defined, so we could still emit it there and only reconstruct as the fallback. Which do you prefer:

(a) always reconstruct a - b * (a / b) unconditionally (simplest, keeps the backend free of device-feature knowledge, relies on the driver to fold it back), or
(b) reconstruct by default but keep OpSRem when maintenance8 is available (avoids the extra ops on conformant setups, at the cost of a feature check).

I lean (a), which matches your "compilers can optimize it away" point, unless you would rather have (b). Either way this changes every existing signed-% SPIR-V snapshot, which also answers the earlier "why are none of our snapshots affected."

@mstampfli force-pushed the fix/8191-signed-remainder-poison branch from 69e0653 to b4bc2e7 (June 17, 2026 21:51)
WGSL defines signed `%` for all non-degenerate operands, but naga lowered it to `OpSRem`, which produces a poison (undefined) result for negative operands in the Vulkan SPIR-V environment unless `VK_KHR_maintenance8` is enabled. NVIDIA exercises that latitude (e.g. `-1 % 768` yields `255` instead of `-1`); other drivers happen to define it.

Always lower signed `%` as `a - b * (a / b)` instead. `OpSDiv` is not poisoned for negative operands, and the existing wrapped-divisor guard keeps the degenerate (zero and `INT_MIN / -1`) cases matching the WGSL spec. This is unconditional rather than gated to one vendor, since the poison is a property of the SPIR-V environment, not the driver.

Fixes gfx-rs#8191.
@mstampfli force-pushed the fix/8191-signed-remainder-poison branch from b4bc2e7 to fe0560f (June 17, 2026 22:05)
@mstampfli

Author

Decided on (a): always reconstruct, no VK_KHR_maintenance8 fast-path. The latest commit makes the poison-free lowering the default in the SPIR-V backend and drops the opt-in option and the NVIDIA-only gating entirely.

I benchmarked first to make sure a maintenance8 fast-path was not worth the complexity. The test ran on an RTX 5060 (driver 610.43.02) in raw Vulkan and compared native OpSRem with VK_KHR_maintenance8 enabled (so it is the well-defined op) against the a - b * (a / b) reconstruction. Workload: 2^20 invocations x 2048 iterations (~2.1B signed modulo ops per dispatch), with a runtime-varying divisor so the driver cannot strength-reduce it into a constant multiply-shift:

OpSRem (maintenance8)      : min ~18.7 ms
reconstruction a - b*(a/b) : min ~18.7 ms
reconstruction / OpSRem    : 0.986x, 0.991x, 0.997x across 3 runs (reconstruction marginally faster)
outputs identical          : yes

So enabling maintenance8 to keep OpSRem buys nothing measurable, which matches your point about drivers optimizing it away. GPUs have no hardware integer modulo, so OpSRem is lowered to the same divide/multiply/subtract sequence internally. Given that, unconditional reconstruction is the simpler and equally fast choice, and it is correct on every Vulkan target rather than one vendor.
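
The "outputs identical" claim and the operation count are easy to sanity-check on the CPU. The Rust sketch below is not the GPU benchmark harness and says nothing about GPU timing; the function names are hypothetical. It only confirms that the unguarded a - b * (a / b) form agrees with a truncated remainder for non-degenerate operands, and that 2^20 x 2048 is about 2.1 billion.

```rust
/// Total signed-modulo operations for a given dispatch shape.
fn ops_per_dispatch(invocations: u64, iterations: u64) -> u64 {
    invocations * iterations
}

/// Unguarded reconstruction; callers must exclude b == 0 and i32::MIN / -1.
fn reconstruct(a: i32, b: i32) -> i32 {
    a.wrapping_sub(b.wrapping_mul(a / b))
}

/// True if the reconstruction matches Rust's truncated `%` for every
/// non-degenerate (dividend, divisor) pair drawn from the two slices.
fn outputs_identical(dividends: &[i32], divisors: &[i32]) -> bool {
    dividends.iter().all(|&a| {
        divisors
            .iter()
            // Skip the degenerate pairs, which the guarded wrapper handles.
            .filter(|&&b| b != 0 && !(a == i32::MIN && b == -1))
            .all(|&b| reconstruct(a, b) == a % b)
    })
}
```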

Side effect for the earlier snapshot request: several existing snapshots (operators, int-div-unchecked, int16, image) now show the reconstruction, so the change is exercised by the suite. Happy to share the benchmark harness if useful.

@inner-daemons

Collaborator

I would've preferred option (b), but that's fine. We tend to lean on these clarifying Vulkan extensions where possible, for example for out-of-bounds writes and zero initialization. But since you demonstrated it doesn't matter, this should be fine.

You can probably drop the new snapshot inputs by the way, unless you see a reason to keep them. Sorry for directing you to do that because I failed to understand the state of the PR.

I'll give a proper review now that my initial questions were answered. Thank you for helping me understand.

@inner-daemons inner-daemons self-requested a review June 17, 2026 23:56
@inner-daemons inner-daemons self-assigned this Jun 17, 2026

Development

Successfully merging this pull request may close these issues.

Naga incorrect % behaviour using proprietary nvidia drivers.

2 participants