[WIP][Feature] Support MXFP8 #5550
base: main
Signed-off-by: SlightwindSec <[email protected]>
Code Review
This pull request introduces support for MXFP8 data types for inference on Ascend A5 NPUs. The changes are extensive, touching attention mechanisms, MoE communication, quantization configurations, and adding new quantization methods and parsers. Overall, the changes are well-structured to accommodate the new hardware capabilities.
I've identified a critical bug in the MoE token dispatcher that could lead to an UnboundLocalError, and a couple of high-severity issues related to code style and potential future bugs. Please address these points to ensure the stability and maintainability of the new features.
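The review does not quote the offending dispatcher code, so the following is only a minimal sketch of the general pattern that produces an UnboundLocalError, not the PR's actual implementation. The function names `dispatch_buggy` and `dispatch_fixed` and the `with_quant` flag are hypothetical: a local name is assigned in only one branch and then read unconditionally.

```python
def dispatch_buggy(tokens, with_quant):
    # Hypothetical stand-in for a token dispatcher; not the PR's code.
    if with_quant:
        scale = max(abs(t) for t in tokens) or 1.0
    # `scale` is unbound here whenever with_quant is False,
    # so this line raises UnboundLocalError on that path.
    return tokens, scale


def dispatch_fixed(tokens, with_quant):
    # Bind the name on every path before it is read.
    scale = None
    if with_quant:
        scale = max(abs(t) for t in tokens) or 1.0
    return tokens, scale
```

Initializing the variable before the branch (or returning early in the branch that does not need it) removes the failure without changing the quantized path.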
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: Cao Yi <[email protected]>
This PR introduces support for MXFP8 (the 8-bit floating-point Microscaling format) for inference on the Ascend NPU (A5).
This PR is a continuation of the work originally started in PR #5113. Since that PR was closed, I have ported and adapted the codebase to the current master branch to ensure this feature reaches the community.
Huge thanks to @wangyao-i for the initial implementation and groundwork.
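For readers unfamiliar with MXFP8, here is a rough pure-Python sketch of the block-scaled idea behind it. It assumes the OCP Microscaling layout of 32 elements sharing one power-of-two (E8M0) scale with FP8 E4M3 elements. The helper names (`round_to_e4m3`, `quantize_block`, `dequantize_block`) are hypothetical and are not taken from this PR, whose quantization runs in NPU kernels that are not shown here.

```python
import math

BLOCK_SIZE = 32    # elements sharing one scale (assumed, per the OCP MX layout)
E4M3_MAX = 448.0   # largest finite FP8 E4M3 magnitude
E4M3_EMAX = 8      # exponent of the top E4M3 binade (448 = 1.75 * 2**8)
E4M3_EMIN = -6     # smallest normal exponent; subnormals reuse it
E4M3_MBITS = 3     # mantissa bits


def round_to_e4m3(v):
    """Simulate a saturating cast of a Python float to the E4M3 grid."""
    if v == 0.0:
        return 0.0
    mag = min(abs(v), E4M3_MAX)
    exp = max(math.floor(math.log2(mag)), E4M3_EMIN)
    step = 2.0 ** (exp - E4M3_MBITS)           # spacing of the grid in this binade
    q = min(round(mag / step) * step, E4M3_MAX)
    return math.copysign(q, v)


def quantize_block(block):
    """Return (shared_exp, elements) for one block of floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared exponent aligns the block maximum with the top E4M3 binade.
    shared_exp = math.floor(math.log2(amax)) - E4M3_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # representable E8M0 range
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e4m3(x / scale) for x in block]


def dequantize_block(shared_exp, elems):
    """Reconstruct approximate floats from the shared exponent and elements."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elems]
```

Because the scale is derived with a floor, values near the top of a block can saturate at 448 after scaling. Real implementations may choose the shared exponent differently to trade saturation against precision.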