@SlightwindSec SlightwindSec commented Dec 31, 2025

This PR introduces support for MXFP8 (Microscaling Formats) data types for inference on the Ascend NPU (A5).

This PR is a continuation of the work originally started in PR #5113. Since that PR was closed, I have ported and adapted the codebase to the current master branch to ensure this feature reaches the community.

Huge thanks to @wangyao-i for the initial implementation and groundwork.
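
For readers unfamiliar with the format, the idea behind MXFP8 can be sketched in a few lines. This is a pure-Python illustration based on the OCP Microscaling conventions (blocks of 32 elements sharing one power-of-two scale, with FP8 E4M3 elements); it is not the code from this PR and does not use any Ascend kernel. All function names here are hypothetical, and the choice of E4M3 rather than E5M2 is an assumption.

```python
import math

BLOCK_SIZE = 32            # elements that share one scale in the MX formats
E4M3_MAX = 448.0           # largest finite FP8 E4M3 magnitude
E4M3_EMAX = 8              # exponent of the top E4M3 binade (448 = 1.75 * 2**8)
E4M3_MIN_NORMAL_EXP = -6   # smallest normal exponent; below this the step is fixed
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to a nearby value on the E4M3 grid, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))              # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)  # floor(log2(abs(x))), clamped for subnormals
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)
    q = round(x / step) * step
    return max(-E4M3_MAX, min(E4M3_MAX, q))


def mxfp8_quantize_block(block):
    """Hypothetical helper: return (shared_exponent, quantized_elements)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E4M3_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # range of an E8M0 scale
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e4m3(v / scale) for v in block]


def mxfp8_dequantize_block(shared_exp, elems):
    """Hypothetical helper: undo the shared scale."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in elems]


block = [0.1 * i for i in range(-16, 16)]  # one block of BLOCK_SIZE values
shared_exp, q = mxfp8_quantize_block(block)
restored = mxfp8_dequantize_block(shared_exp, q)
```

Because the scale is a power of two stored once per block, each element costs 8 bits plus a small shared overhead, which is where the memory and bandwidth savings for inference come from.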

Signed-off-by: SlightwindSec <[email protected]>
@SlightwindSec SlightwindSec marked this pull request as draft December 31, 2025 03:23

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for MXFP8 data types for inference on Ascend A5 NPUs. The changes are extensive, touching attention mechanisms, MoE communication, quantization configurations, and adding new quantization methods and parsers. Overall, the changes are well-structured to accommodate the new hardware capabilities.

I've identified a critical bug in the MoE token dispatcher that could lead to an UnboundLocalError, and a couple of high-severity issues related to code style and potential future bugs. Please address these points to ensure the stability and maintainability of the new features.
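
The review does not quote the offending code here, so the following is only a generic sketch of how an UnboundLocalError typically arises in this kind of branchy dispatch logic, and of the usual fix. The function and variable names are hypothetical and are not taken from the PR.

```python
def dispatch_buggy(tokens, use_quant):
    """Hypothetical dispatcher: `scales` is only assigned on one branch."""
    if use_quant:
        scales = [1.0 for _ in tokens]
    # When use_quant is False, `scales` was never bound in this scope,
    # so the return statement raises UnboundLocalError.
    return tokens, scales


def dispatch_fixed(tokens, use_quant):
    """Same logic, with `scales` initialized before the branch."""
    scales = None
    if use_quant:
        scales = [1.0 for _ in tokens]
    return tokens, scales
```

Initializing the variable before the conditional (or adding an explicit `else` branch) makes every code path define it, which is the kind of change such a review comment usually asks for.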

Signed-off-by: SlightwindSec <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions bot commented Jan 5, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: SlightwindSec <[email protected]>

github-actions bot commented Jan 6, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.
