
Conversation


@LICO1314 LICO1314 commented Jan 30, 2026

What this PR does / why we need it?

  • Align actual_seq_lengths_q with padded query token count when running full-graph FIA updates to satisfy TND layout requirements.
  • Move the alignment helper into vllm_ascend/attention/utils.py so future FIA-based update paths can reuse it.
  • This prevents queryT != actualSequenceLengthQ[-1] mismatches that lead to aclnnFusedInferAttentionScoreV3 tiling failures under full-graph mode.
  • Fixes # (optional: link CI failure or issue if you have one)
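To make the alignment concrete, here is an illustrative sketch of what a helper like `align_actual_seq_lengths_q` needs to guarantee: the last cumulative entry of `actual_seq_lengths_q` must equal the padded query token count, since FIA with TND layout requires `actualSequenceLengthQ[-1] == queryT`. This is a hypothetical minimal version for explanation only; the real helper in `vllm_ascend/attention/utils.py` may differ in signature and details.

```python
def align_actual_seq_lengths_q(actual_seq_lengths_q: list[int],
                               padded_num_tokens: int) -> list[int]:
    """Align the cumulative query lengths with the padded token count.

    TND-layout FIA requires actual_seq_lengths_q[-1] == queryT, so when the
    query tensor is padded for full-graph capture, the final entry must be
    bumped up to the padded token count or tiling fails.
    """
    if not actual_seq_lengths_q:
        # Empty batch: the whole padded range belongs to padding.
        return [padded_num_tokens]
    aligned = list(actual_seq_lengths_q)
    if aligned[-1] < padded_num_tokens:
        # The padded tail is attributed to the last request's range.
        aligned[-1] = padded_num_tokens
    return aligned

print(align_actual_seq_lengths_q([4, 8], 16))  # [4, 16]
```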

Does this PR introduce any user-facing change?

No. This is a bug fix in full-graph parameter update alignment and does not change public APIs.

How was this patch tested?

Not run locally. The failure was reproduced and observed in CI (full-graph mode with speculative decode); the fix is a targeted alignment change.

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in the full-graph FIA path by ensuring actual_seq_lengths_q is correctly aligned with the padded query token count. The introduction of the align_actual_seq_lengths_q helper function is a good, reusable solution. My review focuses on the correctness and safety of the changes. I've identified a critical issue in full_graph_fia where an IndexError could occur with empty request batches, and I have provided a recommendation to fix it.

Comment on lines 535 to 536

```python
if query.shape[0] > num_tokens:
    num_tokens = int(query.shape[0])
```

critical

The logic to determine num_tokens across lines 534-536 is not safe: attn_metadata.actual_seq_lengths_q can be an empty list when there are no requests in the batch, so the [-1] access on line 534 will raise an IndexError.

This logic can be rewritten to be both safe and more concise. Consider replacing lines 534-536 with the following:

```python
unpadded_num_tokens = attn_metadata.actual_seq_lengths_q[-1] if attn_metadata.actual_seq_lengths_q else 0
num_tokens = max(unpadded_num_tokens, int(query.shape[0]))
```
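The suggested guard can be exercised standalone to see both the empty-batch fallback and the padding-dominates case. This sketch uses a `SimpleNamespace` as a stand-in for the query tensor (only `.shape` is needed), so it does not depend on the actual attention metadata class.

```python
from types import SimpleNamespace

def safe_num_tokens(actual_seq_lengths_q, query):
    # With an empty list, a bare `[-1]` access would raise IndexError;
    # the conditional falls back to 0 instead.
    unpadded_num_tokens = actual_seq_lengths_q[-1] if actual_seq_lengths_q else 0
    # The padded query length wins whenever it exceeds the unpadded count.
    return max(unpadded_num_tokens, int(query.shape[0]))

query = SimpleNamespace(shape=(8, 4))  # stand-in for a padded query tensor
print(safe_num_tokens([], query))       # 8: empty batch no longer crashes
print(safe_num_tokens([3, 6], query))   # 8: padding dominates
print(safe_num_tokens([3, 12], query))  # 12: unpadded count dominates
```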

Align actual_seq_lengths_q with padded query token count in full-graph FIA updates and share the helper in attention utils to avoid TND layout mismatches.

Signed-off-by: lico67373 <[email protected]>
@LICO1314 LICO1314 force-pushed the fix-align-fia-seqlen branch from db8c6f3 to ec7e834 Compare January 30, 2026 02:38
@jianzs jianzs added ready read for review ready-for-test start test by label for PR labels Jan 30, 2026