Description
I followed the readme to download the environment and executed screenspot_pro_evaluation.py, but it reports an error:
```
TypeError: Unknown image model type: qwen2_5_vl_text

  File "/.../GUI_Spotlight/spotlight/tools_envs/multiturn_env.py", line 52, in step
    llm_responses = llm.chat(messages_to_step, sampling_params=sampling_params, use_tqdm=True)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../GUI_Spotlight/spotlight/tools_envs/multiturn_env.py", line 137, in generate
    states = self.step(states, llm, custom_sp)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../GUI_Spotlight/screenspot_pro_evaluation.py", line 609, in main
    prompts=multimodal_inputs,
    llm=llm,
    sampling_params=sampling_params,
    )
    completions = env_result["all_messages"]
  File "/.../GUI_Spotlight/screenspot_pro_evaluation.py", line 732, in <module>
    main()
TypeError: Unknown image model type: qwen2_5_vl_text
```
My vllm version is 0.8.5.post1.
The versions of the other core libraries match the requirements in pyproject.toml.
The GPU is an RTX 4090.
I ran it on two servers, one with CUDA 11.8 and one with CUDA 12.4, and got the same result on both.
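For what it's worth, here is the heuristic I used to tell the two cases apart. It is my own guess based only on my tests (4.57.1 failed, 4.56.2 worked), not something from the repo, and it only compares major.minor:

```python
def transformers_version_affected(version: str) -> bool:
    """Heuristic from my tests: transformers 4.57.x raised the error, 4.56.2 did not.

    Only compares the major.minor components; pre-release suffixes are not handled.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 57)

print(transformers_version_affected("4.57.1"))  # the version that failed for me
print(transformers_version_affected("4.56.2"))  # the version that worked
```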
This seems to be caused by vllm not recognizing "qwen2_5_vl_text". I found two solutions by searching:
- Replace `"model_type": "qwen2_5_vl_text"` with `"model_type": "qwen2_5_vl"` in Spotlight. It ran successfully, but I don't know whether the inference output is as expected.
- Downgrade transformers. I had installed transformers==4.57.1 by default, which was the latest version at the time of posting. Rolling back to the previous release, transformers==4.56.2, also made it run normally.
Therefore, I suspect the problem is caused by the transformers version. If that is indeed the case, I hope the author can add an explicit transformers version requirement to pyproject.toml.
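In case it helps others, the first workaround can also be scripted. I am not certain where the `qwen2_5_vl_text` string lives in every setup; the sketch below assumes it appears as a top-level `model_type` field in a standard Hugging Face `config.json`, so adjust the path and field to wherever the string actually occurs in yours:

```python
import json
from pathlib import Path

def patch_model_type(config_path: str) -> bool:
    """Rewrite model_type 'qwen2_5_vl_text' -> 'qwen2_5_vl' in a config.json.

    Returns True if the file was changed. The file layout is an assumption;
    this only touches the top-level "model_type" field.
    """
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    if cfg.get("model_type") != "qwen2_5_vl_text":
        return False
    cfg["model_type"] = "qwen2_5_vl"
    path.write_text(json.dumps(cfg, indent=2))
    return True
```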