Description
I followed the readme to download the environment and executed screenspot_pro_evaluation.py, but it reports an error:
```
TypeError: Unknown image model type: qwen2_5_vl_text

  File "/.../GUI_Spotlight/spotlight/tools_envs/multiturn_env.py", line 52, in step
    llm_responses = llm.chat(messages_to_step, sampling_params=sampling_params, use_tqdm=True)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../GUI_Spotlight/spotlight/tools_envs/multiturn_env.py", line 137, in generate
    states = self.step(states, llm, custom_sp)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../GUI_Spotlight/screenspot_pro_evaluation.py", line 609, in main
    prompts=multimodal_inputs,
    llm=llm,
    sampling_params=sampling_params,
    )
    completions = env_result["all_messages"]
  File "/.../GUI_Spotlight/screenspot_pro_evaluation.py", line 732, in <module>
    main()
TypeError: Unknown image model type: qwen2_5_vl_text
```
My vllm version is 0.8.5.post1.
The versions of the other core libraries match the requirements in pyproject.toml.
The GPU is an RTX 4090.
I ran it on two servers, one with CUDA 11.8 and one with CUDA 12.4, and got the same result on both.
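For what it's worth, here is the heuristic I used to tell the two cases apart. It is my own guess based only on my tests (4.57.1 failed, 4.56.2 worked), not something from the repo, and it only compares major.minor:

```python
def transformers_version_affected(version: str) -> bool:
    """Heuristic from my tests: transformers 4.57.x raised the error, 4.56.2 did not.

    Only compares the major.minor components; pre-release suffixes are not handled.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 57)

print(transformers_version_affected("4.57.1"))  # the version that failed for me
print(transformers_version_affected("4.56.2"))  # the version that worked
```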
This seems to be caused by vllm not recognizing "qwen2_5_vl_text". I found two solutions by searching:
- Replace `"model_type": "qwen2_5_vl_text"` with `"model_type": "qwen2_5_vl"` in Spotlight. It ran successfully, but I don't know whether the inference output is as expected.
- Downgrade transformers. I had installed transformers==4.57.1 by default, which was the latest version at the time of posting. Rolling back to the previous release, transformers==4.56.2, also made it run normally.
Therefore, I suspect the problem is caused by the transformers version. If that is indeed the case, I hope the author can add an explicit transformers version requirement to pyproject.toml.
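In case it helps others, the first workaround can also be scripted. I am not certain where the `qwen2_5_vl_text` string lives in every setup; the sketch below assumes it appears as a top-level `model_type` field in a standard Hugging Face `config.json`, so adjust the path and field to wherever the string actually occurs in yours:

```python
import json
from pathlib import Path

def patch_model_type(config_path: str) -> bool:
    """Rewrite model_type 'qwen2_5_vl_text' -> 'qwen2_5_vl' in a config.json.

    Returns True if the file was changed. The file layout is an assumption;
    this only touches the top-level "model_type" field.
    """
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    if cfg.get("model_type") != "qwen2_5_vl_text":
        return False
    cfg["model_type"] = "qwen2_5_vl"
    path.write_text(json.dumps(cfg, indent=2))
    return True
```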