[Refactor] Add expert processed token count output for DispatchFFNCombine/DispatchFFNCombineBF16 #6402
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a new output, expert_token_nums, to the DispatchFFNCombine operator and its BF16 variant. The changes correctly propagate this new output through the operator's definition, host-side functions, and kernel implementations. However, there are a few issues to address. Firstly, two header files contain redundant inclusions, which should be cleaned up for better code quality. More importantly, adding a new required output is a breaking API change. It appears that downstream consumers, such as the Python bindings and tests, have not been updated to reflect this change, which will likely cause them to fail.
csrc/dispatch_ffn_combine_bf16/op_host/dispatch_ffn_combine_bf16_def.cpp
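To illustrate the consumer-side impact of the breaking change noted above, here is a minimal sketch of how a Python call site could be adapted to unpack the new required output. The wrapper name, the binding handle `op`, and its argument list are hypothetical and are not the actual vLLM Ascend bindings.

```python
import torch

def call_dispatch_ffn_combine(op, *args, **kwargs):
    """Unpack both outputs of the refactored operator.

    `op` stands in for the real binding (e.g. a registered custom op);
    its name and argument list are assumptions made for this sketch.
    """
    # The operator now returns the FFN result plus the new required
    # expert_token_nums output, so single-value unpacking breaks.
    output, expert_token_nums = op(*args, **kwargs)
    # One int32 count per local expert, per the PR description.
    assert expert_token_nums.dtype == torch.int32
    return output, expert_token_nums
```

Tests that currently check only the FFN output could additionally compare expert_token_nums against a host-side reference count.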
Signed-off-by: guanguan0308 <[email protected]>
This pull request has conflicts; please resolve them before we can evaluate the pull request.
What this PR does / why we need it?
Add New Output for Expert Token Count
An additional output tensor, expert_token_nums, is added to both operators to track how tokens are distributed among experts (a host-side reference check is sketched after the field list below):
Tensor Name: expert_token_nums
Dimension: 1D tensor
Shape: (local_expert_num,)
Data Type: int32
Semantics: Represents the number of tokens actually received by each expert on the current card.
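As a hedged illustration of the documented semantics (not the operator's kernel logic), the expected per-expert counts can be reproduced on the host from the token-to-expert assignment. The input names and the local-expert index range below are assumptions for this sketch.

```python
import torch

def reference_expert_token_nums(expert_ids: torch.Tensor,
                                local_expert_start: int,
                                local_expert_num: int) -> torch.Tensor:
    """Count how many dispatched tokens land on each local expert.

    expert_ids: integer tensor holding the expert index chosen for each
                token routed to this card (name and shape are assumptions
                for this sketch, not the operator's real inputs).
    Returns an int32 tensor of shape (local_expert_num,), matching the
    documented shape and dtype of expert_token_nums.
    """
    local_ids = expert_ids - local_expert_start
    mask = (local_ids >= 0) & (local_ids < local_expert_num)
    counts = torch.bincount(local_ids[mask], minlength=local_expert_num)
    return counts.to(torch.int32)

# Example: 4 local experts starting at index 0, tokens routed to 0, 1, 1, 3, 2, 1
print(reference_expert_token_nums(torch.tensor([0, 1, 1, 3, 2, 1]), 0, 4))
# -> tensor([1, 3, 1, 1], dtype=torch.int32)
```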
Does this PR introduce any user-facing change?
How was this patch tested?