[BUG] Running openbmb/MiniCPM-o-2_6-int4 with vllm #765

Open

yanghp86 opened this issue Jan 20, 2025 · 4 comments

@yanghp86
Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

I am running openbmb/MiniCPM-o-2_6-int4 with vllm on an RTX 2080 Ti, using code taken from the official demo (only the model name was changed). Why do I get the error below? The code snippet and the error output are as follows:
from transformers import AutoTokenizer
from PIL import Image
from vllm import LLM, SamplingParams

MODEL_NAME = "openbmb/MiniCPM-o-2_6-int4"
# MODEL_NAME = "openbmb/MiniCPM-O-2_6"
# Also available for previous models
# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
# MODEL_NAME = "HwwwH/MiniCPM-V-2"

image = Image.open("xxx.png").convert("RGB")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llm = LLM(
    model=MODEL_NAME,
    trust_remote_code=True,
    gpu_memory_utilization=1,
    max_model_len=2048,
    dtype='half'
)

messages = [{
    "role": "user",
    # Number of image placeholders should match the number of images
    "content": "(<image>./</image>)" + "\nWhat is the content of this image?"
}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Single Inference
inputs = {
    "prompt": prompt,
    "multi_modal_data": {
        "image": image
        # Multi images, the number of images should be equal to that of (<image>./</image>)
        # "image": [image, image]
    },
}

# Batch Inference
# inputs = [{ ... the rest is unchanged from the demo
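For reference, the official demo continues by building SamplingParams and calling llm.generate; below is a minimal sketch of that step (the stop tokens and sampling values here are assumptions, not an exact copy of the demo revision used):

# Sketch of the generation step, assuming the demo's usual stop tokens.
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(t) for t in stop_tokens]

sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
    temperature=0,
    max_tokens=1024
)

# llm.generate accepts either a single prompt dict or a list of them (batch inference).
outputs = llm.generate(inputs, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)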

Error output:
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/openbmb-vllm/./tests.py", line 13, in
[rank0]: llm = LLM(
[rank0]: File "/home/openbmb-vllm/vllm/utils.py", line 1038, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/openbmb-vllm/vllm/entrypoints/llm.py", line 228, in init
[rank0]: self.llm_engine = self.engine_class.from_engine_args(
[rank0]: File "/home/openbmb-vllm/vllm/engine/llm_engine.py", line 477, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/openbmb-vllm/vllm/engine/llm_engine.py", line 271, in init
[rank0]: self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]: File "/home/openbmb-vllm/vllm/executor/executor_base.py", line 42, in init
[rank0]: self._init_executor()
[rank0]: File "/home/openbmb-vllm/vllm/executor/uniproc_executor.py", line 34, in _init_executor
[rank0]: self.collective_rpc("load_model")
[rank0]: File "/home/openbmb-vllm/vllm/executor/uniproc_executor.py", line 48, in collective_rpc
[rank0]: answer = func(*args, **kwargs)
[rank0]: File "/home/openbmb-vllm/vllm/worker/worker.py", line 155, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/home/openbmb-vllm/vllm/worker/model_runner.py", line 1099, in load_model
[rank0]: self.model = get_model(vllm_config=self.vllm_config)
[rank0]: File "/home/openbmb-vllm/vllm/model_executor/model_loader/init.py", line 12, in get_model
[rank0]: return loader.load_model(vllm_config=vllm_config)
[rank0]: File "/home/openbmb-vllm/vllm/model_executor/model_loader/loader.py", line 368, in load_model
[rank0]: loaded_weights = model.load_weights(
[rank0]: File "/home/openbmb-vllm/vllm/model_executor/models/minicpmv.py", line 597, in load_weights
[rank0]: return loader.load_weights(weights)
[rank0]: File "/home/openbmb-vllm/vllm/model_executor/models/utils.py", line 233, in load_weights
[rank0]: autoloaded_weights = set(self._load_module("", self.module, weights))
[rank0]: File "/home/openbmb-vllm/vllm/model_executor/models/utils.py", line 194, in _load_module
[rank0]: yield from self._load_module(prefix,
[rank0]: File "/home/openbmb-vllm/vllm/model_executor/models/utils.py", line 194, in _load_module
[rank0]: yield from self._load_module(prefix,
[rank0]: File "/home/openbmb-vllm/vllm/model_executor/models/utils.py", line 222, in _load_module
[rank0]: raise ValueError(msg)
[rank0]: ValueError: There is no module or parameter named 'resampler.kv_proj.weight' in MiniCPMV2_6
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:33<?, ?it/s]

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

@yanghp86
Author

It reports this error: [rank0]: ValueError: There is no module or parameter named 'resampler.kv_proj.weight' in MiniCPMV2_6

@iceflame89 changed the title from "[BUG] <title>" to "[BUG] Running openbmb/MiniCPM-o-2_6-int4 with vllm" on Jan 21, 2025
@Serious-H

Has this been resolved? I am also getting this error when running offline inference on openbmb/MiniCPM-O-2_6-int4 with vllm:

ValueError: There is no module or parameter named 'resampler.kv_proj.weight' in MiniCPMV2_6

My environment is as follows:
OS: Ubuntu
Python: 3.10
transformers: 4.48.1
vllm: 0.1.dev41+gee8.precompiled
torch: 2.5.1
torchvision: 0.20.1
torchaudio: 2.5.1

@HwwwwwwwH
Contributor

Previously, MiniCPM-o-2_6 was not fully supported in vllm, so we maintained the support in a forked repository. Our code has now been merged into the official repository. You can either pull the official repository and build it from source, or wait for the next official vllm wheel release.
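After installing from source, a quick sanity check is to confirm which vllm you are actually importing and which architecture the checkpoint declares; a minimal sketch (the ModelRegistry call is available in recent vllm versions, so treat it as an assumption if yours is older):

# Sketch: verify the installed vllm and the architecture the checkpoint declares.
import vllm
from transformers import AutoConfig

print("vllm version:", vllm.__version__)

# The checkpoint's config.json declares the model class that vllm must have registered.
cfg = AutoConfig.from_pretrained("openbmb/MiniCPM-o-2_6-int4", trust_remote_code=True)
print("declared architectures:", cfg.architectures)

# Recent vllm versions expose the list of registered architectures.
from vllm import ModelRegistry
print("registered in vllm:", ModelRegistry.get_supported_archs())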

@federico-fedi

federico-fedi commented Feb 3, 2025

Hello, I just downloaded and installed the latest vllm wheel that should support MiniCPM-o-2_6, but when using openbmb/MiniCPM-o-2_6-int4 I still receive the error:

There is no module or parameter named 'resampler.kv_proj.weight' in MiniCPMV2_6

Is the quantized (int4) model supported by vllm?
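A small diagnostic that may help narrow this down is to check how the int4 checkpoint says it is quantized; this is only a sketch, and the quantization_config field is an assumption about this particular checkpoint (it may be absent):

# Sketch: inspect the int4 checkpoint's declared quantization settings.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("openbmb/MiniCPM-o-2_6-int4", trust_remote_code=True)
# quantization_config may not exist on every config; getattr avoids an AttributeError.
print(getattr(cfg, "quantization_config", "no quantization_config found"))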
