@hongxiayang
Contributor

Motivation

Fix the ds-related FP4 GEMM micro-benchmarking issues.

3000, 2112, 7168
60000, 4096, 512
3000, 7168, 256
8, 2112, 7168

Copilot AI review requested due to automatic review settings on October 22, 2025, 22:29
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses the ds-related FP4 GEMM micro-benchmarking issues by adding a tuned configuration and updating the untuned configuration list. The changes ensure that the required matrix multiplication shapes have heuristics available.

Key changes:

  • Added one new tuned configuration entry for shape (1, 2112, 7168)
  • Added five shapes to the untuned configuration list: the new (1, 2112, 7168) and four previously tuned shapes

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| aiter/configs/a4w4_blockscale_untuned_gemm.csv | Added 5 matrix shapes including the new (1, 2112, 7168) and 4 previously tuned shapes |
| aiter/configs/a4w4_blockscale_tuned_gemm.csv | Added tuned configuration for matrix shape (1, 2112, 7168) with kernel parameters |
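
One way to sanity-check a change like this is to confirm that every shape in the untuned list has a matching tuned entry. The sketch below is only an illustration, not the repository's tooling: it assumes both CSV files carry `M`, `N`, `K` columns, the `kernel_id` column is hypothetical, and the inline CSV text is made-up sample data rather than the real file contents.

```python
import csv
import io


def load_shapes(csv_text):
    """Return the set of (M, N, K) tuples in a CSV that has M, N, K columns (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(int(row["M"]), int(row["N"]), int(row["K"])) for row in reader}


def missing_tuned(untuned_text, tuned_text):
    """Return the sorted shapes listed as untuned that have no tuned entry."""
    return sorted(load_shapes(untuned_text) - load_shapes(tuned_text))


# Illustrative in-memory data only; the real files live under aiter/configs/.
untuned = "M,N,K\n1,2112,7168\n8,2112,7168\n3000,2112,7168\n"
tuned = "M,N,K,kernel_id\n1,2112,7168,0\n"  # kernel_id is a hypothetical column

print(missing_tuned(untuned, tuned))
```

To run the same check against the actual files, the text read from each CSV could be passed in place of the sample strings, provided the column names match.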

@hongxiayang changed the title from "add a tuned config and insert entries in untuned config" to "add a tuned fp4 gemm ds config and insert entries in untuned config" on Oct 22, 2025