
Conversation

@ader47 (Contributor) commented Dec 31, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a performance optimization by pre-calculating local_total_toks during the metadata build process, which avoids redundant calculations in _compute_prefill_context. The changes are logical and correctly implemented. I've added suggestions for a further minor optimization to avoid a redundant sum() operation during the initial calculation of local_total_toks.

```python
        dcp_rank]
actual_seq_lengths_kv = torch.cumsum(
    local_chunked_kv_lens_rank, dim=0).tolist()
local_total_toks = local_chunked_kv_lens_rank.sum()
```

Severity: high

To avoid a redundant sum() operation, you can obtain local_total_toks from actual_seq_lengths_kv, which is computed immediately before it: the total is simply the last element of the cumulative-sum list. This change makes local_total_toks a plain Python integer, so a corresponding change is needed at the call site to remove the .item() call.

Suggested change

```diff
-local_total_toks = local_chunked_kv_lens_rank.sum()
+local_total_toks = actual_seq_lengths_kv[-1] if actual_seq_lengths_kv else 0
```
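A minimal sketch (not from the PR) of the identity the suggestion relies on: the last element of a cumulative sum equals the total, so a second reduction pass over the tensor is unnecessary. The sample values for `local_chunked_kv_lens_rank` are made up for illustration.

```python
import torch

# Hypothetical per-rank chunk lengths (variable names follow the PR).
local_chunked_kv_lens_rank = torch.tensor([3, 5, 2])

# Cumulative sum, as built for actual_seq_lengths_kv in the PR.
actual_seq_lengths_kv = torch.cumsum(local_chunked_kv_lens_rank, dim=0).tolist()

# Redundant: a second full reduction over the tensor.
redundant_total = local_chunked_kv_lens_rank.sum().item()

# Cheap: the running total is already the last cumsum entry.
cheap_total = actual_seq_lengths_kv[-1] if actual_seq_lengths_kv else 0

assert redundant_total == cheap_total == 10
assert isinstance(cheap_total, int)  # note the type change: int, not tensor
```

The `if actual_seq_lengths_kv else 0` guard matters because indexing `[-1]` on an empty list raises `IndexError`, whereas `tensor.sum()` on an empty tensor returns 0.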

```diff
 batch_chunk_seq_mask=batch_chunk_seq_mask,
-chunk_seq_mask_filtered_indices=chunk_seq_mask_filtered_indices
+chunk_seq_mask_filtered_indices=chunk_seq_mask_filtered_indices,
+local_total_toks=local_total_toks.item()
```

Severity: high

Following the suggestion on line 155, local_total_toks is now an integer, so the call to .item() is no longer necessary and would cause an error.

Suggested change

```diff
-local_total_toks=local_total_toks.item()
+local_total_toks=local_total_toks
```
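A short sketch (not from the PR) of why the `.item()` call must be dropped: `.sum()` on a tensor returns a 0-dim tensor that supports `.item()`, but the last element of a Python list is already an `int`, and ints have no `.item()` method. The sample values are made up for illustration.

```python
import torch

# Before the change: .sum() yields a 0-dim tensor, so .item() is needed.
total_tensor = torch.tensor([3, 5, 2]).sum()
assert total_tensor.item() == 10

# After the change: the value comes from the cumulative-sum *list*,
# so it is already a plain Python int.
total_int = [3, 8, 10][-1]
assert total_int == 10

# Keeping the old .item() call on an int would now raise, which is
# why the call site must be updated alongside the first suggestion.
try:
    total_int.item()
except AttributeError as e:
    print(e)  # 'int' object has no attribute 'item'
```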

@github-actions bot

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill out the PR description when writing the commit message to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions bot commented Jan 4, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.
