Skip to content

Conversation

@lgeiger
Copy link
Contributor

@lgeiger lgeiger commented Nov 18, 2025

Purpose

This is a small followup from #28798 which simplifies the indexing logic. /cc @gcanlin @Isotr0py

For Qwen3VL #28798 slightly changed behaviour where get_cos_sin() now already returns a GPU tensor. This introduces synchronous CPU to GPU copies of pos_ids when using it to index. cc1f0c5 Fixes it by moving the indices onto the GPU in a non-blocking way.

Before:
Screenshot 2025-11-18 at 19 46 53
After:
Screenshot 2025-11-18 at 19 47 13

Test Plan

VLLM_WORKER_MULTIPROC_METHOD=spawn lm_eval --model vllm-vlm --model_args "pretrained=Qwen/Qwen3-VL-30B-A3B-Instruct-FP8,max_model_len=10000" --tasks chartqa --batch_size auto --apply_chat_template

Test Result

Before:

Tasks Version Filter n-shot Metric Value Stderr
chartqa 0 none 0 anywhere_accuracy 0.8752 ± 0.0066
none 0 exact_match 0.6448 ± 0.0096
none 0 relaxed_accuracy 0.8656 ± 0.0068

After:

Tasks Version Filter n-shot Metric Value Stderr
chartqa 0 none 0 anywhere_accuracy 0.8760 ± 0.0066
none 0 exact_match 0.6388 ± 0.0096
none 0 relaxed_accuracy 0.8656 ± 0.0068

@lgeiger lgeiger requested a review from sighingnow as a code owner November 18, 2025 20:50
@mergify mergify bot added the qwen Related to Qwen models label Nov 18, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request simplifies the rotary embedding indexing logic across several models, which improves code readability and maintainability. It also addresses a performance issue in qwen3_vl.py by moving pos_ids to the GPU asynchronously, preventing synchronous CPU-to-GPU copies.

However, the same performance optimization is missing in other models modified in this PR (glm4_1v.py, qwen2_5_vl.py, qwen2_vl.py, and qwen3_omni_moe_thinker.py), where pos_ids is still created on the CPU, leading to synchronous copies during indexing. I've added specific comments to apply the same fix to these models for consistency and performance improvement.

@Isotr0py Isotr0py enabled auto-merge (squash) November 19, 2025 03:04
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
@Isotr0py Isotr0py merged commit 3d4e7d3 into vllm-project:main Nov 19, 2025
51 checks passed
@lgeiger lgeiger deleted the qwen-cos-sin-indexing branch November 19, 2025 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants