Add wave intrinsic blog post #106

swoods-nv · 2025-05-21T23:00:43Z

No description provided.

swoods-nv · 2025-05-21T23:05:08Z

I need to add a few sentences about the performance delta. I'm seeing an improvement of ~10 iterations per second with wave balloting vs. the original diffsplat on my Windows 11/RTX 2070 machine.

ArielG-NV

I like the blog post and I find it easy to digest (despite my limited background in Gaussian Splatting).

Given that the goal of the blog is not capabilities, I think the introduction of the feature is well done.

csyonghe · 2025-05-22T16:08:36Z

_posts/2025-05-27-ng-wave-intrinsics.md

+    return pixelState.value;
+}
+```  
+The first thing you’ll likely notice is that this function carries additional annotations compared to the functions in the original diff-splatting example. The `[require (subgroup_ballot)]` and `[require (subgroup_vote)]` annotations use slang’s **capability system** to indicate that this function requires this optional capability to be supported. The Slang compiler is able to identify whether the target it is currently compiling for supports these capabilities, and if not, it will provide a warning. For example, a shader targeting HLSL Shader Model 5 with these capability requirements would result in:


We don't expect most user code to use explicit `[require(capability)]` decorations, so we may want to omit this part from the blog.

For ordinary users, the only place a `[require]` attribute is needed is on entry points, and only when they want the compiler to enforce that the entry point doesn't use more capabilities than intended.

If you define an entry point without `[require]`, the compiler automatically upgrades its requirements to whatever it uses.

swoods-nv · 2025-05-22

I've updated the blog post so that it only covers the fact that the Slang compiler will detect this and issue a warning if you try to compile for a profile that doesn't include wave intrinsic support.

csyonghe · 2025-05-22T16:09:52Z

_posts/2025-05-27-ng-wave-intrinsics.md

+The first thing you’ll likely notice is that this function carries additional annotations compared to the functions in the original diff-splatting example. The `[require (subgroup_ballot)]` and `[require (subgroup_vote)]` annotations use slang’s **capability system** to indicate that this function requires this optional capability to be supported. The Slang compiler is able to identify whether the target it is currently compiling for supports these capabilities, and if not, it will provide a warning. For example, a shader targeting HLSL Shader Model 5 with these capability requirements would result in:
+
+```  
+myshader.slang(9): warning 41012: entry point 'computeMain' uses additional capabilities that are not part of the specified profile 'sm_5_0'. The profile setting is automatically updated to include these capabilities: 'sm_6_0'  


This warning will appear even without marking the above functions with `[require]`, because `[require]` is already marked on the `WaveActive*` functions and it propagates all the way up to the entry point.
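The propagation described in this comment can be pictured with a small toy model. This is only a sketch in Python, not how the Slang compiler is implemented: the call graph and the helper function names (`splatBlobs`, `coarseRasterize`, `evalBlob`) are hypothetical, and the mapping of each intrinsic to a capability name is an assumption based on the annotations quoted in the post.

```python
# Toy model: requirements declared on leaf intrinsics flow up to the entry point.
# All function names except computeMain are hypothetical.

# Capabilities declared directly on a function (as if via [require(...)]).
# Which capability each intrinsic really carries is an assumption here.
DECLARED = {
    "WaveActiveBallot": {"subgroup_ballot"},
    "WaveActiveAllTrue": {"subgroup_vote"},
}

# Call graph: caller -> callees. User functions declare nothing themselves.
CALLS = {
    "computeMain": ["splatBlobs"],
    "splatBlobs": ["coarseRasterize", "evalBlob"],
    "coarseRasterize": ["WaveActiveBallot", "WaveActiveAllTrue"],
    "evalBlob": [],
}


def required_capabilities(fn, seen=None):
    """Union of the capabilities declared on fn and on everything it reaches."""
    seen = set() if seen is None else seen
    if fn in seen:  # already visited (guards against cycles)
        return set()
    seen.add(fn)
    caps = set(DECLARED.get(fn, ()))
    for callee in CALLS.get(fn, ()):
        caps |= required_capabilities(callee, seen)
    return caps
```

In this model `required_capabilities("computeMain")` contains both capabilities even though no user function declares anything, which is the behaviour the comment describes for the real compiler.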

csyonghe · 2025-05-22T17:34:08Z

_posts/2025-05-27-ng-wave-intrinsics.md

+So how does this shader use wave intrinsics?
+
+Instead of a multi-pass approach– first identifying intersecting blobs for the current tile, sorting them, and then calculating colors from the shorter list of blobs, we’re now using a single pass through the set of Gaussians to process them all, in workgroup-sized chunks. Within each chunk, each lane (a thread within the wave) is assigned a single Gaussian, and tests whether it intersects the current tile bounds. The crucial improvement here is the `WaveActiveBallot(intersects).x` call. This takes the boolean intersection result from each active lane in the wave, and creates a bitmask. All of the lanes in the wave can access the bitmask, and can therefore understand which Gaussians in the chunk being processed are relevant. The code then iterates through the set bits of this mask, which we’ve called `intersectionMask`. For each intersection Gaussian, its contribution is evaluated, and immediately alpha-blended. We still store the indices for the intersecting blobs, because we will still need them during the custom backward pass.  
+One benefit of this approach is that we no longer need to do an explicit workgroup-wide sort. Because we keep the blobs in order during processing, we maintain the needed order for alpha blending. Additionally, we no longer need to use an atomic counter– and thereby introduce the possibility of contention– when we increment the number of intersecting blobs and write the index to the blob list. This might look problematic at first glance, because all of the lanes are writing to the same `intersectingBlobList` in shared memory. But we don’t need to worry about data collisions here because of how we’re coming up with this data. Each lane has its own copy of numIntersectingBlobs, so that variable does not need to be atomically incremented. And each lane also will be operating on the same value in `intersectionMask`, calculated using `WaveActiveBallot`. For this reason, all lanes are storing the same indices in the same order into `intersectingBlobList`, so while technically this is a data race, it’s a benign one.  


I don't quite understand why we need to introduce any data races at all.

We should instead do

`int idx = WavePrefixSum(intersects ? 1 : 0);`

and then have each intersecting lane write to `intersectingBlobList[idx]`. There will be no data races if done this way.
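The two compaction schemes discussed in this thread can be compared with a CPU-side emulation. This is a minimal sketch in Python, not shader code: the helper names are hypothetical, the loops stand in for lanes that would run in parallel on a GPU, and it assumes a single wave in which every lane is active.

```python
def wave_active_ballot(flags):
    """Emulates WaveActiveBallot(flag).x: bit i is set when lane i's flag is true."""
    mask = 0
    for lane, flag in enumerate(flags):
        if flag:
            mask |= 1 << lane
    return mask


def wave_prefix_sum(values):
    """Emulates WavePrefixSum: lane i gets the sum over lanes 0..i-1 (exclusive)."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def compact_with_prefix_sum(blob_ids, intersects):
    """Prefix-sum scheme: each intersecting lane writes only to its own slot."""
    slots = wave_prefix_sum([1 if f else 0 for f in intersects])
    out = [None] * sum(1 for f in intersects if f)
    for lane in range(len(blob_ids)):  # stands in for lanes running in parallel
        if intersects[lane]:
            assert out[slots[lane]] is None  # no slot is ever written twice
            out[slots[lane]] = blob_ids[lane]
    return out


def compact_with_ballot(blob_ids, intersects):
    """Ballot scheme: every lane walks the same mask; shown here for one lane."""
    mask = wave_active_ballot(intersects)
    out = []
    while mask:
        lane = (mask & -mask).bit_length() - 1  # index of the lowest set bit
        out.append(blob_ids[lane])
        mask &= mask - 1  # clear that bit
    return out
```

Both functions produce the same ordered list. The difference is in who writes: in the prefix-sum version each slot has exactly one writer, while in the ballot version every lane would store the full list redundantly.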

swoods-nv · 2025-05-22

I still need to increment the overall number of intersecting blobs (`intersectingBlobCount`), which is used in the backward pass. Because all of the lanes calculate the same value for `numIntersectingBlobs`, the count can be written non-atomically, although that is still technically a data race. If I don't calculate `numIntersectingBlobs` at all, I would need to make `intersectingBlobCount` an atomic, which then complicates its use elsewhere.
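The claim that every lane arrives at the same count can be checked with a small emulation. This is a sketch in Python under the assumption that each lane adds the number of set bits in the shared ballot mask once per chunk; the function names are hypothetical, and the sketch says nothing about whether the redundant store is well defined under a given GPU memory model.

```python
def ballot(flags):
    """Bitmask with bit i set when lane i's flag is true (same value in all lanes)."""
    mask = 0
    for lane, flag in enumerate(flags):
        if flag:
            mask |= 1 << lane
    return mask


def per_lane_counts(chunks):
    """Running intersection count that each lane holds after all chunks."""
    wave_size = len(chunks[0])
    totals = [0] * wave_size  # one private counter per lane
    for flags in chunks:
        mask = ballot(flags)  # identical in every lane
        for lane in range(wave_size):  # stands in for lanes running in parallel
            totals[lane] += bin(mask).count("1")
    return totals
```

Because every private counter ends up with the same value, a non-atomic store of that value from each lane always writes identical data, which is the situation described in the reply.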

swoods-nv · 2025-05-23T21:22:06Z

After discussion with Yong: the wave intrinsic example only works when there is a single workgroup per dispatch. We should wait to post this until the dispatch shape is controllable from SlangPy, so this is blocked on shader-slang/slangpy#72.

swoods-nv · 2025-07-15T22:24:39Z

Blog post updated with call group shape info -- @csyonghe could you re-review?

csyonghe

Looks good to me.

swoods-nv added 3 commits May 21, 2025 18:43

Add wave intrinsic blog post

dbf12a2

Add image for wave intrinsic blog post

50c4102

Add blog header & links out to previous articles

353d453

swoods-nv requested review from ArielG-NV, csyonghe and saipraveenb25 May 21, 2025 23:00

swoods-nv linked an issue May 21, 2025 that may be closed by this pull request

Create blog post for improving performance of 2D splatting using wave intrinsics shader-slang/slang#7020

Closed

ArielG-NV reviewed May 22, 2025


csyonghe reviewed May 22, 2025


Add performance information, remove coverage of "require" annotation

b64db17

csyonghe reviewed May 22, 2025


swoods-nv mentioned this pull request May 23, 2025

Create blog post for improving performance of 2D splatting using wave intrinsics shader-slang/slang#7020

Closed

swoods-nv and others added 2 commits July 15, 2025 17:42

Merge branch 'shader-slang:main' into wave-intrinsic-blog-post

97d5abd

Update wave intrinsics post to new date, add call group shape info.

1cd7c32

csyonghe approved these changes Jul 17, 2025


swoods-nv merged commit ba1a9c3 into shader-slang:main Jul 17, 2025

swoods-nv deleted the wave-intrinsic-blog-post branch July 17, 2025 16:00


3 participants
