@eky-amd commented Sep 23, 2025

Changes

  • When run with a given model, all benchmark scripts now always include the evaluated metric in their output. Previously, table columns showed only "fc1" and/or "fc2", which made the results hard to interpret from the terminal.
  • Standardize table/graph labelling across the Triton benchmarking scripts using the format <Provider>_<Layer>_<Metric>_(<Unit>). Provider indicates whether the benchmark ran the Triton kernel or the equivalent Torch functions. Layer is either fc1 or fc2 for model benchmarks. Provider and Layer are both optional. Metric is one of time (ms), throughput (TFLOPS), or bandwidth (GB/s).
  • Resolve arch_info and get_splitk import issues in triton/bench_batched_gemm_afp4wfp4.py, triton/bench_batched_gemm_afp4wfp4_pre_quant.py and triton/bench_gemm_a8wfp4.py.
  • Resolve STR_DTYPE_TO_TORCH_DTYPE import issue in triton/bench_pa_prefill.py.
  • Add comparisons against PyTorch for the Triton GEMM kernels, enabled with the -bench_torch flag.
  • Fix default parameters in op_tests/op_benchmarks/triton/utils/model_configs.json (removing MoE parameters from llama models, changing top_k parameter from 4 to 8 in deepseek-v3).
  • Add the gpt-oss model family to the Triton benchmarking model config (parameters are taken from the model card).
  • Modify benchmark outputs to match the table schema in op_tests/op_benchmarks/triton/bench_schema.yaml.
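As an illustration of the labelling convention above, here is a minimal sketch of how a column label could be assembled, plus the standard conversion from a measured GEMM time to TFLOPS. The helper names make_label and gemm_tflops, and the provider strings "Triton"/"Torch", are hypothetical and are not taken from the PR's code; the only assumption about the FLOP count is the usual 2·M·N·K for a dense GEMM.

```python
from typing import Optional

# Hypothetical helper: builds labels of the form <Provider>_<Layer>_<Metric>_(<Unit>).
# Not the PR's implementation; units follow the metrics listed in the description.
METRIC_UNITS = {"time": "ms", "throughput": "TFLOPS", "bandwidth": "GB/s"}


def make_label(metric: str, provider: Optional[str] = None, layer: Optional[str] = None) -> str:
    if metric not in METRIC_UNITS:
        raise ValueError(f"unknown metric: {metric!r}")
    # Provider and Layer are optional, so skip whichever is missing.
    parts = [p for p in (provider, layer, metric) if p]
    return "_".join(parts) + f"_({METRIC_UNITS[metric]})"


def gemm_tflops(m: int, n: int, k: int, time_ms: float) -> float:
    # A dense (M x K) @ (K x N) GEMM performs 2*M*N*K floating-point operations
    # (one multiply and one add per multiply-accumulate).
    flops = 2 * m * n * k
    return flops / (time_ms * 1e-3) / 1e12


if __name__ == "__main__":
    print(make_label("throughput", provider="Triton", layer="fc1"))  # Triton_fc1_throughput_(TFLOPS)
    print(make_label("time"))  # time_(ms)
    print(round(gemm_tflops(1024, 1024, 1024, 1.0), 3))  # 2.147
```

Keeping the unit lookup in one table means every script that uses such a helper would emit identical suffixes, which is the point of standardizing the labels.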

Testing

Manual tests were performed on both MI300 and MI350, covering both shape and model benchmarks. The metrics below for fixed shape parameters compare the Triton kernels against PyTorch. The sample outputs that follow come from running triton/bench_gemm_a8w8_blockscale.py with llama3-8B.

Before: [screenshot]
After: [screenshot]

Triton vs Torch: [screenshot: bench example]

Below is a sample run of op_tests/op_benchmarks/triton/bench_moe.py using the new gpt-oss model family: [screenshot]
Compared with deepseek-V3: [screenshot]

Attempting to benchmark MoE with the llama3 family of models now correctly raises an error: [screenshot]

@eky-amd requested a review from vgokhale September 23, 2025 17:06
@eky-amd marked this pull request as draft September 23, 2025 17:07
@dhonnappa-amd (Collaborator)

Jenkins CI skipped: Check lint failed. Exiting the entire job...
