@guanguan0308 guanguan0308 commented Jan 30, 2026

What this PR does / why we need it?

Add New Output for Expert Token Count
An additional output tensor, expert_token_nums, is added to both the DispatchFFNCombine and DispatchFFNCombineBF16 operators to meet the requirement of tracking how tokens are distributed among experts:

  • Tensor name: expert_token_nums
  • Dimension: 1D tensor
  • Shape: (local_expert_num,)
  • Data type: int32
  • Semantics: the number of tokens actually received by each expert on the current card.
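The semantics of the new output can be sketched in plain Python. This is a hypothetical host-side reference, not the operator's actual kernel code; the function name and argument layout are assumptions for illustration only.

```python
def compute_expert_token_nums(expert_ids, local_expert_num):
    """Count how many tokens each local expert receives on this card.

    expert_ids: iterable of local expert indices, one entry per routed
    (token, top-k slot) pair dispatched to the current card.
    Returns a list of length local_expert_num, matching the 1D
    (local_expert_num,) shape of the expert_token_nums output.
    """
    counts = [0] * local_expert_num
    for e in expert_ids:
        counts[e] += 1  # one more token slot landed on expert e
    return counts

# Example: 5 routed token slots spread over 4 local experts
print(compute_expert_token_nums([0, 2, 2, 3, 0], 4))  # → [2, 0, 2, 1]
```

In the real operators this count is produced during dispatch and emitted as an int32 tensor alongside the existing outputs.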

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new output, expert_token_nums, to the DispatchFFNCombine operator and its BF16 variant. The changes correctly propagate the new output through the operator's definition, host-side functions, and kernel implementations. However, a few issues remain. First, two header files contain redundant inclusions that should be cleaned up. More importantly, adding a new required output is a breaking API change: downstream consumers such as the Python bindings and tests do not appear to have been updated, which will likely cause them to fail.

Signed-off-by: guanguan0308 <[email protected]>
Signed-off-by: guanguan0308 <[email protected]>
@guanguan0308 guanguan0308 force-pushed the add_expert_token_nums branch from 3253260 to 7598bb8 Compare January 30, 2026 02:33
@guanguan0308 guanguan0308 changed the title Add expert token nums [Refactor] Extract common code for DispatchFFNCombine/DispatchFFNCombineBF16 and add expert processed token count output Jan 30, 2026
@guanguan0308 guanguan0308 changed the title [Refactor] Extract common code for DispatchFFNCombine/DispatchFFNCombineBF16 and add expert processed token count output [Refactor] Add expert processed token count output Jan 30, 2026
@guanguan0308 guanguan0308 changed the title [Refactor] Add expert processed token count output [Refactor] Add expert processed token count output for DispatchFFNCombine/DispatchFFNCombineBF16 Jan 30, 2026
Signed-off-by: guanguan0308 <[email protected]>
@guanguan0308 guanguan0308 force-pushed the add_expert_token_nums branch from 3e182c3 to 261f697 Compare January 30, 2026 06:46
Signed-off-by: guanguan0308 <[email protected]>
Signed-off-by: guanguan0308 <[email protected]>
Signed-off-by: guanguan0308 <[email protected]>
Signed-off-by: guanguan0308 <[email protected]>
@github-actions
Copy link

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@wangxiyuan wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 31, 2026
2 participants