
Commit 9684442

A couple quick fixes to blog post (#1599)
Signed-off-by: Chris Abraham <[email protected]>
1 parent 083ff72 commit 9684442

1 file changed, +2 −2 lines changed
_posts/2024-04-04-accelerating-moe-model.md: +2 −2
@@ -8,7 +8,7 @@ author: Adnan Hoque, Less Wright, Antoni Virós Martin, Chih-Chieh Yang
 
 We show that by implementing column-major scheduling to improve data locality, we can accelerate the core Triton GEMM (General Matrix-Matrix Multiply) kernel for MoEs (Mixture of Experts) up to 4x on A100, and up to 4.4x on H100 Nvidia GPUs. This post demonstrates several different work decomposition and scheduling algorithms for MoE GEMMs and shows, at the hardware level, why column-major scheduling produces the highest speedup.
 
-Repo and code available at: [https://github.com/pytorch-labs/applied-ai/tree/main/triton/](https://github.com/pytorch-labs/applied-ai/tree/main/triton/inference/col_major_moe_gemm).
+Repo and code available at: [https://github.com/pytorch-labs/applied-ai/tree/main/kernels/triton/inference/col_major_moe_gemm](https://github.com/pytorch-labs/applied-ai/tree/main/kernels/triton/inference/col_major_moe_gemm).
 
 
 ![Figure 1A. Optimized Fused MoE GEMM Kernel TFLOPs on A100 for varying Batch Sizes M](/assets/images/accelerating-moe-model/fig-7.png){:style="width:100%;display: block; max-width: 600px; margin-right: auto; margin-left: auto"}
@@ -128,4 +128,4 @@ We have [open sourced](https://github.com/pytorch-labs/applied-ai/tree/main/kern
 
 ## Acknowledgements
 
-We want to thank Daniel Han, Raghu Ganti, Mudhakar Srivatsa, Bert Maher, Gregory Chanan, Eli Uriegas, and Geeta Chauhan for their review of the presented material and Woo Suk from the vLLM team as we built on his implementation of the Fused MoE kernel.
+We want to thank Daniel Han, Raghu Ganti, Mudhakar Srivatsa, Bert Maher, Gregory Chanan, Eli Uriegas, and Geeta Chauhan for their review of the presented material and Woosuk from the vLLM team as we built on his implementation of the Fused MoE kernel.
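
The diff above only touches a link and a name, but for readers arriving from the post: the "column-major scheduling" it mentions is, loosely, a remapping of the order in which GEMM output tiles are assigned to program ids. The sketch below is a minimal, hypothetical plain-Python illustration of that remapping, not the actual Triton kernel in the linked repo; the function names, tile counts, and the cache-reuse comment are illustrative assumptions.

```python
# Hypothetical sketch: map a linear program id `pid` to an output-tile
# coordinate (pid_m, pid_n) under row-major vs. column-major scheduling.

def row_major(pid, num_pid_m, num_pid_n):
    # Consecutive pids walk across a row of output tiles:
    # same M-block of A, different N-blocks of B.
    return pid // num_pid_n, pid % num_pid_n

def column_major(pid, num_pid_m, num_pid_n):
    # Consecutive pids walk down a column of output tiles:
    # different M-blocks of A, same N-block (same column-block of the
    # weight matrix B), which can improve reuse of B tiles in cache.
    return pid % num_pid_m, pid // num_pid_m

if __name__ == "__main__":
    num_pid_m, num_pid_n = 4, 8  # illustrative tile counts along M and N
    print([row_major(p, num_pid_m, num_pid_n) for p in range(8)])
    # [(0, 0), (0, 1), ..., (0, 7)] -- eight distinct B column-blocks in flight
    print([column_major(p, num_pid_m, num_pid_n) for p in range(8)])
    # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), ...] -- fewer distinct B column-blocks
```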
