[MM][Core] Decouple ViT backend from LM backend #27061
base: main
Conversation
Signed-off-by: Roger Wang <[email protected]>
💡 Codex Review
Lines 316 to 341 in 6a588c0
```python
def compute_hash(self) -> str:
    """
    WARNING: Whenever a new field is added to this config,
    ensure that it is included in the factors list if
    it affects the computation graph.

    Provide a hash that uniquely identifies all the configs
    that affect the structure of the computation
    graph from input ids/embeddings to the final hidden states,
    excluding anything before input ids/embeddings and after
    the final hidden states.
    """
    factors: list[Any] = []
    factors.append(self.model)
    factors.append(self.dtype)
    factors.append(self.quantization)
    factors.append(self.revision)
    factors.append(self.code_revision)
    factors.append(self.max_model_len)
    factors.append(self.max_logprobs)
    factors.append(self.disable_sliding_window)
    factors.append(self.trust_remote_code)
    factors.append(self.generation_config)
    factors.append(self.model_impl)
    factors.append(self.override_generation_config)
    factors.append(self.rope_scaling)
```
The new `mm_encoder_attn_backend` affects which attention kernel the vision encoder uses, but `ModelConfig.compute_hash()` still ignores the entire multimodal configuration. `VllmConfig.compute_hash()` relies on this value when keying compilation caches. As a result, running once with the default backend and later with `--mm-encoder-attn-backend=FLASH_ATTN` will produce identical hashes and may reuse compiled artifacts for the wrong backend. Please incorporate `self.multimodal_config.compute_hash()` (when present) into the model hash so that changing the ViT backend invalidates cached compilations.
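A minimal sketch of the suggested fix, continuing the `compute_hash()` body shown above (it assumes the multimodal config exposes a `compute_hash()` method of its own, as the review implies; attribute names may not match vLLM's actual layout):

```python
    # ...existing factors as above...
    factors.append(self.rope_scaling)

    # Sketch only: fold the multimodal config (which now carries
    # mm_encoder_attn_backend) into the hash so that changing the ViT
    # backend invalidates cached compilations.
    if self.multimodal_config is not None:
        factors.append(self.multimodal_config.compute_hash())
```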
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 self.attn_backend = get_vit_attn_backend(
-    head_size=head_dim, dtype=torch.get_default_dtype()
+    head_size=head_dim,
+    dtype=torch.get_default_dtype(),
+    attn_backend_override=attn_backend_override,
 )
 if self.attn_backend != _Backend.FLASH_ATTN and check_upstream_fa_availability(
     torch.get_default_dtype()
```
Override still forced to upstream flash attention

The newly added `mm_encoder_attn_backend` override is returned by `get_vit_attn_backend`, but the following block still promotes any non-FLASH backend to `_Backend.FLASH_ATTN` whenever upstream flash attention is available. This means a user who explicitly passes `--mm-encoder-attn-backend=TRITON_ATTN` (or any other backend) will still run with flash attention on systems where it is present, defeating the purpose of the override and keeping ViT and LM backends coupled. Consider skipping this auto-upgrade when an explicit override was provided.
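One possible shape of that fix, sketched against the diff above (names mirror the snippet; the exact surrounding code in vLLM may differ):

```python
self.attn_backend = get_vit_attn_backend(
    head_size=head_dim,
    dtype=torch.get_default_dtype(),
    attn_backend_override=attn_backend_override,
)
# Only auto-upgrade to upstream flash attention when the user did NOT
# explicitly request a ViT backend; otherwise honor the override.
if (
    attn_backend_override is None
    and self.attn_backend != _Backend.FLASH_ATTN
    and check_upstream_fa_availability(torch.get_default_dtype())
):
    self.attn_backend = _Backend.FLASH_ATTN
```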
```python
# Determine the attention backend
backend = get_vit_attn_backend(head_size=head_size, dtype=dtype)
attn_backend_override = None
```
Should we rename this layer to `VisionAttention` btw?
This layer will be renamed to `MMEncoderAttention` in #27147. But we can also rename it here.
Let's do that in the other PR then
```python
if candidate is not None:
    return candidate

valid_backends = ", ".join(sorted(BackendEnum.__members__.keys()))
```
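For context, a self-contained sketch of how such an override string might be resolved and validated (the enum members and helper name below are illustrative, not vLLM's exact definitions):

```python
from enum import Enum, auto


class BackendEnum(Enum):
    # Illustrative members only; the real registry has more backends.
    FLASH_ATTN = auto()
    TORCH_SDPA = auto()
    TRITON_ATTN = auto()


def resolve_backend_override(value: str | None) -> BackendEnum | None:
    """Return the requested backend, or None when no override was given."""
    if value is None:
        return None
    candidate = BackendEnum.__members__.get(value.upper())
    if candidate is not None:
        return candidate
    valid_backends = ", ".join(sorted(BackendEnum.__members__.keys()))
    raise ValueError(
        f"Invalid mm encoder attention backend {value!r}. "
        f"Valid options: {valid_backends}"
    )
```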
Perhaps we can add a `supported_vit_backend` hook to the Platform interface in a follow-up PR to detect an invalid backend for a specific platform before initializing the model.
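A rough illustration of what that hook could look like on a Platform subclass (method names and the backend set are hypothetical, not an existing vLLM API):

```python
@classmethod
def supported_vit_backends(cls) -> set["_Backend"]:
    # Hypothetical: the backends this platform can actually run for the
    # multimodal encoder (the set here is illustrative).
    from vllm.attention.backends.registry import _Backend

    return {_Backend.FLASH_ATTN, _Backend.TORCH_SDPA}


@classmethod
def validate_vit_attn_backend(cls, backend: "_Backend") -> None:
    # Fail fast, before model initialization, if the requested override
    # cannot be served by this platform.
    if backend not in cls.supported_vit_backends():
        raise ValueError(
            f"{backend.name} is not a supported ViT attention backend "
            f"on {cls.__name__}."
        )
```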
Yeah, right now it'll just show all possible `_Backend` values, with some of them getting auto-resolved inside their corresponding `platform.get_vit_attn_backend`. For example:

Lines 203 to 211 in 9fce7be
```python
@classmethod
def get_vit_attn_backend(cls, head_size: int, dtype: torch.dtype) -> "_Backend":
    from vllm.attention.backends.registry import _Backend

    if envs.VLLM_ROCM_USE_AITER and envs.VLLM_ROCM_USE_AITER_MHA and on_gfx9():
        return _Backend.ROCM_AITER_FA
    if on_gfx9():
        return _Backend.FLASH_ATTN
    return _Backend.TORCH_SDPA
```
I think we can shrink this selection by just having a specific `_MHA_Backend` enum.
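For illustration, such a narrowed enum might look like the following (the member set is hypothetical and would need auditing against what each platform's encoder path actually supports):

```python
from enum import Enum, auto


class _MHA_Backend(Enum):
    """Hypothetical enum restricted to backends the multimodal encoder
    (plain multi-head attention) can use, so platform selection cannot
    hand back an LM-only backend."""

    FLASH_ATTN = auto()
    ROCM_AITER_FA = auto()
    TORCH_SDPA = auto()
```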
Purpose
The multimodal encoder attention backend for a selection of models has been adopting the same backend as the language model backbone (via env var). This has exposed challenges and inflexibilities, since some backends may work for the LM but not for the multimodal encoder.
This PR refactors the ViT backend selection for models that use `get_vit_attn_backend` so that users can specify `--mm-encoder-attn-backend` as an override. Model-specific attention post-selection logic will stay as is (for example, using upstream FA whenever it's available).
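For example, the override can be supplied on the command line (the model name below is only an illustration):

```bash
# Keep the LM attention backend auto-selected, but pin the ViT/encoder
# attention backend explicitly.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --mm-encoder-attn-backend=FLASH_ATTN
```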
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.