
Conversation

@yewentao256 yewentao256 commented Nov 18, 2025

Purpose

A PR based on #28832 (which should land first).

After FlashInfer's update, we found that batch-invariant support for FlashInferMLA is broken (see flashinfer-ai/flashinfer#2107). We don't want to fall back to a for loop, which would be very slow, so we simply disable FlashInferMLA for now. Update: even with a for loop, there can still be some diff in the logprobs.
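
A minimal sketch of the idea (hypothetical names, not the actual vLLM code): when batch-invariant execution is requested, backend selection simply skips FlashInferMLA until flashinfer-ai/flashinfer#2107 is resolved.

    # Hypothetical sketch -- the constant and function names are assumptions, not vLLM's real API.
    UNSUPPORTED_BATCH_INVARIANT_BACKENDS = {"FLASHINFER_MLA"}

    def select_mla_backend(candidates: list[str], batch_invariant: bool) -> str:
        """Return the first candidate backend usable in the requested mode."""
        for name in candidates:
            if batch_invariant and name in UNSUPPORTED_BATCH_INVARIANT_BACKENDS:
                # Banned for now: batch-invariant output is broken upstream.
                continue
            return name
        raise ValueError("no attention backend supports batch-invariant mode")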

There is also a bug in the test:

(EngineCore_DP0 pid=1348542)   File "/home/wentao/vllm-source/vllm/model_executor/models/qwen2.py", line 339, in <lambda>
(EngineCore_DP0 pid=1348542)     lambda prefix: decoder_layer_type(
(EngineCore_DP0 pid=1348542)                    ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1348542)   File "/home/wentao/vllm-source/vllm/model_executor/models/qwen3.py", line 185, in __init__
(EngineCore_DP0 pid=1348542)     self.self_attn = Qwen3Attention(
(EngineCore_DP0 pid=1348542)                      ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1348542)   File "/home/wentao/vllm-source/vllm/model_executor/models/qwen3.py", line 120, in __init__
(EngineCore_DP0 pid=1348542)     self.attn = Attention(
(EngineCore_DP0 pid=1348542)                 ^^^^^^^^^^
(EngineCore_DP0 pid=1348542)   File "/home/wentao/vllm-source/vllm/attention/layer.py", line 287, in __init__
(EngineCore_DP0 pid=1348542)     self.attn_backend = get_attn_backend(
(EngineCore_DP0 pid=1348542)                         ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1348542)   File "/home/wentao/vllm-source/vllm/attention/selector.py", line 90, in get_attn_backend
(EngineCore_DP0 pid=1348542)     return _cached_get_attn_backend(
(EngineCore_DP0 pid=1348542)            ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1348542)   File "/home/wentao/vllm-source/vllm/attention/selector.py", line 168, in _cached_get_attn_backend
(EngineCore_DP0 pid=1348542)     attention_cls = current_platform.get_attn_backend_cls(
(EngineCore_DP0 pid=1348542)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1348542)   File "/home/wentao/vllm-source/vllm/platforms/cuda.py", line 372, in get_attn_backend_cls
(EngineCore_DP0 pid=1348542)     raise ValueError(
(EngineCore_DP0 pid=1348542) ValueError: Selected backend AttentionBackendEnum.FLASH_ATTN_MLA is not valid for this configuration. Reason: ['head_size not supported', 'non-MLA not supported']

This PR fixes that as well.
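
The fix follows roughly this idea (a hedged sketch with assumed backend and model names, not the exact test code): pair each backend under test with a model whose architecture it can actually serve, so an MLA backend such as FLASH_ATTN_MLA is never combined with a non-MLA model like Qwen3.

    # Hypothetical sketch -- backend and model names are assumptions for illustration.
    MLA_BACKENDS = {"FLASH_ATTN_MLA", "FLASHINFER_MLA", "CUTLASS_MLA"}

    def model_for_backend(backend: str) -> str:
        """Choose a test model compatible with the attention backend under test."""
        if backend in MLA_BACKENDS:
            # MLA backends require a DeepSeek-style (MLA) attention architecture.
            return "deepseek-ai/DeepSeek-V2-Lite"
        # Regular attention backends can use a standard dense model.
        return "Qwen/Qwen3-1.7B"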

Test

Now everything is green.

Signed-off-by: yewentao256 <[email protected]>
@mergify mergify bot added the v1 label Nov 18, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a bug in the batch invariant MLA test by disabling problematic backends and dynamically selecting a compatible model for MLA tests. The changes are logical and align with the PR's goal. My review includes one suggestion to improve the maintainability of the test code by refactoring duplicated logic.
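
As a rough illustration of that maintainability point (pytest structure and names are assumptions, not the reviewer's actual suggestion), the duplicated backend-to-model pairing could be resolved in a single parametrized table rather than repeated per test:

    # Hypothetical sketch -- consolidates the duplicated selection logic in one place.
    import pytest

    CASES = [
        ("FLASH_ATTN_MLA", "deepseek-ai/DeepSeek-V2-Lite"),  # MLA backend -> MLA model
        ("TRITON_ATTN", "Qwen/Qwen3-1.7B"),                  # non-MLA backend -> dense model
    ]

    @pytest.mark.parametrize("backend,model", CASES, ids=[c[0] for c in CASES])
    def test_logprobs_batch_invariant(backend, model):
        ...  # run the batch-invariance check once per compatible pairing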


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: yewentao256 <[email protected]>
@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 18, 2025