[TRITON] Benchmarking improvements #1063

eky-amd · 2025-09-23T17:06:44Z

Changes

When running with a given model, all benchmark scripts now always output the metric evaluated. Previously, table columns would only show "fc1" and/or "fc2", which is not helpful when reading from the terminal.
Standardize table/graph labelling across the Triton benchmarking scripts, using the format <Provider>_<Layer>_<Metric>_(<Unit>). Provider indicates whether the benchmark ran the Triton kernel or Torch functions. Layer is either fc1 or fc2 for model benchmarks. Provider and Layer are both optional. Metric is one of time (ms), throughput (TFLOPS), or bandwidth (GB/s).
Resolve arch_info and get_splitk import issues in triton/bench_batched_gemm_afp4wfp4.py, triton/bench_batched_gemm_afp4wfp4_pre_quant.py and triton/bench_gemm_a8wfp4.py.
Resolve STR_DTYPE_TO_TORCH_DTYPE import issue in triton/bench_pa_prefill.py.
Implement comparisons with PyTorch for the Triton GEMM kernels, using the -bench_torch flag.
Fix default parameters in op_tests/op_benchmarks/triton/utils/model_configs.json (removing MoE parameters from llama models, changing top_k parameter from 4 to 8 in deepseek-v3).
Add gpt-oss model family to Triton benchmarking model config (parameters are determined from the model card).
Modify benchmark outputs to match table schema in op_tests/op_benchmarks/triton/bench_schema.yaml.
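As an illustration of the label format described in the changes above, here is a minimal sketch of a helper that assembles such a label. The function name `make_label`, the `METRIC_UNITS` mapping, and the exact casing of the provider string are assumptions for illustration only; the helpers actually used in the benchmark scripts may differ.

```python
from typing import Optional

# Metric -> unit pairs as listed in the PR description.
METRIC_UNITS = {"time": "ms", "throughput": "TFLOPS", "bandwidth": "GB/s"}


def make_label(
    metric: str,
    provider: Optional[str] = None,
    layer: Optional[str] = None,
) -> str:
    """Build a label of the form <Provider>_<Layer>_<Metric>_(<Unit>).

    Hypothetical helper: provider and layer are optional and are
    simply omitted from the label when not given.
    """
    if metric not in METRIC_UNITS:
        raise ValueError(f"unknown metric: {metric!r}")
    parts = [part for part in (provider, layer) if part]
    parts.append(metric)
    parts.append(f"({METRIC_UNITS[metric]})")
    return "_".join(parts)


print(make_label("time", provider="Triton", layer="fc1"))
print(make_label("throughput"))
```

With both optional fields supplied, the column name carries the provider, the layer, and the unit; with neither, only the metric and its unit remain.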

Testing

Manual tests were performed on both MI300 and MI350, across both shape and model benchmarks. Please see these metrics for fixed shape parameters, comparing the Triton kernels and PyTorch. Below are some sample outputs from running triton/bench_gemm_a8w8_blockscale.py with llama3-8B.

Before:

After:

Triton vs Torch:

Below is a sample run of op_tests/op_benchmarks/triton/bench_moe.py, using the new gptoss model family.

Compared with deepseek-V3:

Attempting to benchmark MoE with the llama3 family of models now correctly throws an error.
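The behaviour above follows from the llama entries no longer carrying MoE parameters. A rough sketch of the kind of check involved is shown below, assuming a nested config dictionary. The layout of `EXAMPLE_CONFIGS`, the key names, and the function `get_moe_params` are hypothetical and are not taken from model_configs.json or bench_moe.py; only the deepseek-v3 top_k value of 8 comes from this PR.

```python
# Hypothetical, simplified stand-in for the model config file.
EXAMPLE_CONFIGS = {
    "llama3": {"8B": {"hidden_size": 4096}},  # dense model: no MoE keys
    "deepseek": {"V3": {"top_k": 8}},         # top_k value from this PR
}

# Assumed set of keys an MoE benchmark would need.
REQUIRED_MOE_KEYS = ("top_k",)


def get_moe_params(configs: dict, family: str, model: str) -> dict:
    """Return the MoE parameters for a model, or raise if it has none."""
    cfg = configs[family][model]
    missing = [key for key in REQUIRED_MOE_KEYS if key not in cfg]
    if missing:
        raise ValueError(
            f"{family}-{model} has no MoE parameters (missing: {missing})"
        )
    return {key: cfg[key] for key in REQUIRED_MOE_KEYS}


print(get_moe_params(EXAMPLE_CONFIGS, "deepseek", "V3"))
try:
    get_moe_params(EXAMPLE_CONFIGS, "llama3", "8B")
except ValueError as err:
    print(f"error: {err}")
```

Failing early on a missing key avoids silently benchmarking a dense model with leftover or default MoE values.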

dhonnappa-amd · 2025-09-25T15:30:51Z

Jenkins CI skipped: Check lint failed. Exiting the entire job...

eky-amd added 21 commits September 17, 2025 21:39

better table printouts for gemm

5227741

adapted other gemm benchmarks

ad4f5d5

adapted ff_a16w16_fused benchmark

85731fc

another bench_ff_a16w16_fused adjustment

46d7f3c

show units for bench_topk

d07a27e

resolved Python module issues

80fac9c

standardized benchmark table labels

3483a5a

fixed import bug in bench_pa_prefill

34a582b

merge main

ca134a1

line_vals layer tweaks for gemm kernels

fc8b0cb

modified confusing provider line_arg parameters in Triton benchmarks

aa837e7

initial torch comparison for gemm_a16w16

22280e6

continued adding torch comps to gemm benchmarks

d6c0fba

added torch comp to other gemm benchmarks

280b36d

Merge branch 'main' into eky/triton-bench-improvements

cd29fb0

fix import issue in bench_gemm_a8wfp4

4b51b7c

another fix for bench_gemm_a8wfp4

586299a

fixed more arch_info import issues

870845d

fix get_splitk import issue

45dc059

linting

9bfb033

black reformatting

65180f8

eky-amd requested a review from vgokhale September 23, 2025 17:06

eky-amd marked this pull request as draft September 23, 2025 17:07

eky-amd added 6 commits September 23, 2025 21:05

resolved torch timing bug in bench_gemm_a16w16

c50076d

Merge branch 'main' into eky/triton-bench-improvements

82c5c53

fixed topk parameter for deepseek-v3

32da42e

added gpt-oss model family for bench_moe

ce570bc

removed MoE parameters from llama config

03b4b9a

TFLOPS label fix for gemm benchmarks

0bc7c5e

eky-amd added 5 commits September 25, 2025 16:30

fixed schema discrepancies for non-gemm kernels

d4d90dc

black reformatting

8c3a825

Merge branch 'main' into eky/triton-bench-improvements

6a2a5ef

resolve merge conflicts

a7da35c

completing merge

bd68541
