
Conversation

@juuso-oskari
Contributor

This PR provides finetuned configs for DeepSeek FP8 blockscale. It also fixes the benchmarking script (K = intermediate dimension // 2 for fc2 when a GLU is used, NOT N = intermediate dimension * 2 for fc1).
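A minimal sketch of the fc1/fc2 shape logic the fix describes, assuming the config stores the fused (gate + up) intermediate width; the function and argument names are hypothetical and not taken from the actual benchmarking script:

```python
# Hypothetical helper illustrating the GLU shape fix; names are illustrative.
def mlp_gemm_shapes(hidden_dim: int, intermediate_dim: int, use_glu: bool = True):
    """Return (N, K) GEMM shapes for fc1 and fc2 of an MLP block.

    With a GLU, the gate halves the activation width between fc1 and fc2,
    so fc2 contracts over K = intermediate_dim // 2, while fc1's output
    width N stays intermediate_dim (it is NOT intermediate_dim * 2).
    """
    fc1 = (intermediate_dim, hidden_dim)   # fc1: hidden -> fused gate+up
    fc2_k = intermediate_dim // 2 if use_glu else intermediate_dim
    fc2 = (hidden_dim, fc2_k)              # fc2: gated half -> hidden
    return fc1, fc2
```

For example, with hidden_dim = 7168 and a fused intermediate of 4224, fc2 contracts over K = 2112 rather than N = 8448.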

@anhminhnguyenhoang

Tensor shapes M=[64, 128, 256], N=2112, K=7168 for the DeepSeek model have been tuned with Triton block configs, giving at least a 2x performance uplift over the current state of the main branch.

# before
     M     N     K  throughput (TFLOPs)  time (ms)  bandwidth (GB/s)
0   64  2112  7168            14.984021   0.128799        123.609343
1  128  2112  7168            30.529371   0.127317        131.801976
2  256  2112  7168            66.172665   0.117496        154.754403

# after
     M     N     K  throughput (TFLOPs)  time (ms)  bandwidth (GB/s)
0   64  2112  7168            27.073608   0.068483        221.219118
1  128  2112  7168            89.067354   0.043015        372.392590
2  256  2112  7168           180.079694   0.045219        414.613027

PS: Results were collected on asrock-1w300-e0-3.mkm.dcgpu (mi350xas2).
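As a sanity check, the throughput column follows from the GEMM FLOP count (2·M·N·K) divided by the measured time. A small sketch of that conversion (my own helper, not part of the benchmark script); recomputed values land within a few percent of the table, with small differences likely due to how times are aggregated:

```python
def gemm_tflops(M: int, N: int, K: int, time_ms: float) -> float:
    """Convert a GEMM timing into TFLOPs: 2*M*N*K FLOPs over time_ms milliseconds."""
    flops = 2 * M * N * K
    return flops / (time_ms * 1e-3) / 1e12
```

For the first "before" row, gemm_tflops(64, 2112, 7168, 0.128799) gives roughly 15.0 TFLOPs, consistent with the reported 14.98.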

@anhminhnguyenhoang anhminhnguyenhoang marked this pull request as ready for review October 23, 2025 12:04