[main][bugfix] Fix fullgraph padding bug in mtp eagle refactor #5692
base: main
Conversation
Code Review
This pull request addresses a bug in the padding logic for full graph mode, specifically in scenarios involving MTP (Multi-Token Prediction) and PCP (Prefill Context Parallelism). The modification correctly constrains the padding condition by ensuring that the number of input tokens does not exceed the maximum size of a captured graph. By using min(max_decode_tokens, self.cudagraph_batch_sizes[-1]), the change prevents the padding logic from being erroneously applied to batches that are too large for graph replay and would fall back to eager execution. This is a solid bug fix that enhances the robustness of the full graph execution path, particularly for cases with manually configured graph capture sizes.
max_decode_tokens = self.scheduler_config.max_num_seqs * self.uniform_decode_query_len
if self.compilation_config.cudagraph_mode.decode_mode() == CUDAGraphMode.FULL and \
-        uniform_decode and self.uniform_decode_query_len <= num_input_tokens <= max_decode_tokens:
+        uniform_decode and self.uniform_decode_query_len <= num_input_tokens <= min(max_decode_tokens, self.cudagraph_batch_sizes[-1]):
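For reference, a minimal standalone sketch of the clamped condition (the function name and explicit parameters are illustrative; in the runner these values come from self and the scheduler/compilation configs):

# A minimal, self-contained sketch of the padding decision after this change.
# Names mirror the diff above; the real check lives in the model runner.
def should_pad_for_full_graph(num_input_tokens: int,
                              uniform_decode: bool,
                              uniform_decode_query_len: int,
                              max_num_seqs: int,
                              cudagraph_batch_sizes: list[int],
                              full_graph_decode: bool) -> bool:
    max_decode_tokens = max_num_seqs * uniform_decode_query_len
    # Clamp the upper bound to the largest captured graph: batches above it
    # cannot be replayed and fall back to eager mode, so they must not be padded.
    upper_bound = min(max_decode_tokens, cudagraph_batch_sizes[-1])
    return (full_graph_decode and uniform_decode
            and uniform_decode_query_len <= num_input_tokens <= upper_bound)

# Example: capture sizes [16, 32, 64], max_num_seqs=128, MTP query length 2.
# A uniform-decode batch of 96 tokens is within the old bound (256) but above
# the largest captured graph (64), so it is no longer padded.
print(should_pad_for_full_graph(96, True, 2, 128, [16, 32, 64], True))   # False
print(should_pad_for_full_graph(48, True, 2, 128, [16, 32, 64], True))   # True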
I suggest
max_decode_tokens = min(self.scheduler_config.max_num_seqs * self.uniform_decode_query_len, self.cudagraph_batch_sizes[-1])
It's more reasonable.
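With the suggestion applied, the check would read roughly as follows (a sketch based on the diff above, not the exact committed code):

max_decode_tokens = min(self.scheduler_config.max_num_seqs * self.uniform_decode_query_len,
                        self.cudagraph_batch_sizes[-1])
if self.compilation_config.cudagraph_mode.decode_mode() == CUDAGraphMode.FULL and \
        uniform_decode and self.uniform_decode_query_len <= num_input_tokens <= max_decode_tokens:
    # pad num_input_tokens up to the nearest captured graph size
    ...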
done
Signed-off-by: lilinsiman <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
What this PR does / why we need it?
The condition that decides when to pad a batch for full graph mode with MTP and PCP has been tightened to handle the corner case where the graph capture sizes are specified manually: batches larger than the largest captured graph size are no longer padded.
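A hypothetical illustration of that corner case (the numbers are made up, not taken from the PR):

# Capture sizes specified manually, so the largest captured graph (32 tokens)
# is smaller than max_num_seqs * uniform_decode_query_len (64 * 2 = 128).
max_num_seqs = 64
uniform_decode_query_len = 2          # e.g. MTP decode with one speculative token
cudagraph_batch_sizes = [8, 16, 32]   # manually specified capture sizes

old_bound = max_num_seqs * uniform_decode_query_len          # 128
new_bound = min(old_bound, cudagraph_batch_sizes[-1])        # 32

# A uniform-decode batch of 48 tokens used to satisfy the old condition
# (48 <= 128) and was padded for full-graph replay even though no captured
# graph can hold it; with the clamp (48 > 32) it now runs eagerly, unpadded.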
Does this PR introduce any user-facing change?
no
How was this patch tested?
Unit tests and existing tests.