Commit 2852d2e

Add new links to blog post (#1810)
Signed-off-by: Chris Abraham <[email protected]>
1 parent 3d52c43 commit 2852d2e

1 file changed, +7 -2 lines changed

Diff for: _posts/2024-11-01-cutlass-ping-pong-gemm-kernel.md

@@ -184,11 +184,16 @@ And translating that into a relative speedup chart of Ping-Pong vs cuBLAS and Tr
 
 The full source code for the Ping-Pong kernel is here (619 lines of deeply templated Cutlass code, or to paraphrase the famous turtle meme - "it's templates...all the way down! ):
 
-[https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp)
+- [https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp)
 
 In addition, we have implemented PingPong as a CPP extension to make it easy to integrate into use with PyTorch here (along with a simple test script showing it’s usage):
 
-[https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/cutlass_gemm](https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/cutlass_gemm)
+- [https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/cutlass_gemm](https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/cutlass_gemm)
+
+Finally, for continued learning, Nvidia has two GTC videos that dive into kernel design with Cutlass:
+
+- [Developing Optimal CUDA Kernels on Hopper Tensor Cores \| GTC Digital Spring 2023 \| NVIDIA On-Demand](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s51413/)
+- [CUTLASS: A Performant, Flexible, and Portable Way to Target Hopper Tensor Cores \| GTC 24 2024 \| NVIDIA On-Demand](https://www.nvidia.com/en-us/on-demand/session/gtc24-s61198/)
 
 ## Future Work
 