Error (garbled output) when using Perf (model inference stress-testing tool) to benchmark an API service #301

toneewang opened this issue Feb 9, 2025 · 7 comments

toneewang commented Feb 9, 2025

Issue Description

Please briefly describe the issue you encountered.

Tools Used

  • Native framework
  • OpenCompass backend
  • VLMEvalKit backend
  • RAGEval backend
  • [✔] Perf / model inference stress-testing tool
  • Arena mode

Code or Commands Executed

from evalscope.perf.main import run_perf_benchmark

task_cfg = {"url": "http://127.0.0.1/v1/chat/completions",
            "parallel": 1,
            "model": "Qwen2.5-72B",
            "number": 15,
            "api": "openai",
            "dataset": "speed_benchmark"}
run_perf_benchmark(task_cfg)

Error Log

Processing: 0it [00:00, ?it/s]2025-02-09 11:32:41,721 - evalscope - ERROR - Request: {'prompt': '熵', 'model': 'Qwen2.5-72B', 'max_tokens': 2048, 'seed': 42, 'stop': [], 'stop_token_ids': []} failed, state_code: 422, data: {"error": "request param contains not messages or messages null", "error_type": "validation"}
2025-02-09 11:32:42,720 - evalscope - ERROR - Request: {'prompt': '熵', 'model': 'Qwen2.5-72B', 'max_tokens': 2048, 'seed': 42, 'stop': [], 'stop_token_ids': []} failed, state_code: 422, data: {"error": "request param contains not messages or messages null", "error_type": "validation"}
Processing: 2it [00:01, 1.96it/s]2025-02-09 11:32:43,723 - evalscope - ERROR - Request: {'prompt': '熵熵熵熵熵熵熵熵熵 熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵

Runtime Environment

  • Operating System:

    • Windows
    • macOS
    • [✔] Ubuntu
  • Python Version:

    • 3.11
    • [✔] 3.10
    • 3.9

Additional Information

The environment was built with Docker. Dockerfile:

FROM ubuntu:22.04

RUN sed -i 's|http://archive.ubuntu.com/ubuntu/|http://mirrors.tuna.tsinghua.edu.cn/ubuntu/|g' /etc/apt/sources.list

RUN apt-get update -y && \
    apt-get install -y python3.10 python3-pip && \
    apt-get clean all

RUN pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple evalscope \
    && pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple "evalscope[perf]" \
    && pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple gradio

RUN apt-get update -y && \
    apt-get install -y --fix-broken unzip wget && \
    apt-get clean all

RUN mkdir -p /root/nltk_data/tokenizers && \
    wget -O /root/nltk_data/tokenizers/punkt_tab.zip https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/open_data/nltk_data/punkt_tab.zip && \
    unzip /root/nltk_data/tokenizers/punkt_tab.zip -d /root/nltk_data/tokenizers

CMD ["/bin/sh"]

Yunnglin (Collaborator) commented Feb 9, 2025

See the tutorial.

For speed testing, the url should use the /v1/completions endpoint instead of /v1/chat/completions, so that the extra processing done by the chat template does not affect the input length.
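
As a rough illustration of why the chat endpoint skews length measurements, here is a minimal sketch (assuming the transformers library and an instruct-tuned Qwen2.5 tokenizer; the model id below is only an example) comparing the raw prompt length with the chat-template-wrapped length:

from transformers import AutoTokenizer

# Illustrative only: the tokenizer id is an assumption, not taken from this issue.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B-Instruct")

raw_prompt = "熵" * 64
raw_ids = tokenizer(raw_prompt)["input_ids"]

# /v1/chat/completions applies the chat template before tokenization,
# wrapping the same text in system/user/assistant control tokens.
chat_text = tokenizer.apply_chat_template(
    [{"role": "user", "content": raw_prompt}],
    tokenize=False,
    add_generation_prompt=True,
)
chat_ids = tokenizer(chat_text)["input_ids"]

print("raw prompt tokens:  ", len(raw_ids))
print("chat-wrapped tokens:", len(chat_ids))  # larger, so speed numbers get skewed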

Yunnglin self-assigned this Feb 9, 2025
Yunnglin added the perf label Feb 9, 2025
toneewang (Author) commented

See the tutorial.

For speed testing, the url should use the /v1/completions endpoint instead of /v1/chat/completions, so that the extra processing done by the chat template does not affect the input length.

Following the tutorial, I switched to the /v1/completions endpoint and ran this Python script:
from evalscope.perf.main import run_perf_benchmark

task_cfg = {"url": "http://127.0.0.1/v1/completions",
            "parallel": 1,
            "model": "Qwen2.5-72B",
            "number": 15,
            "api": "openai",
            "dataset": "speed_benchmark",
            "debug": True}
run_perf_benchmark(task_cfg)

Early in the log, the request content is "熵" and a normal response comes back. After two "熵" requests, the requests turn into a large number of "熵" characters plus "�". Looking at the model-side log, no body was received. Detailed logs:
2025-02-09 14:05:53,158 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
2025-02-09 14:05:53,165 - evalscope - http_client.py - on_request_chunk_sent - 123 - DEBUG - Request sent: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"prompt": "hello", "model": "Qwen2.5-72B"}'>
2025-02-09 14:05:54,792 - evalscope - http_client.py - on_response_chunk_received - 135 - DEBUG - Request received: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"choices":[{"finish_reason":"stop","text":"Hello! It's nice to meet you. How can I assist you today? Whether you have questions, need information, or just want to chat, feel free to let me know!","index":0}],"id":"endpoint_common_2610","object":"text_completion","model":"Qwen2.5-72B","created":1739109828,"usage":{"prompt_tokens":30,"completion_tokens":38,"total_tokens":68}}'>
2025-02-09 14:05:54,794 - evalscope - http_client.py - test_connection - 157 - INFO - Connection successful.
2025-02-09 14:05:54,795 - evalscope - db_util.py - get_result_db_path - 103 - INFO - Save the data base to: ./outputs/20250209_140553/Qwen2.5-72B/benchmark_data.db
Processing: 0it [00:00, ?it/s]2025-02-09 14:05:54,803 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
2025-02-09 14:05:54,812 - evalscope - http_client.py - on_request_chunk_sent - 123 - DEBUG - Request sent: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}'>
2025-02-09 14:06:15,557 - evalscope - http_client.py - on_response_chunk_received - 135 - DEBUG - Request received: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"usage":{"prompt_tokens":30,"completion_tokens":485,"total_tokens":515},"id":"endpoint_common_2611","object":"text_completion","model":"Qwen2.5-72B","created":1739109849,"choices":[{"finish_reason":"...用于解释分子结构的稳定性、溶解性等现象。\n\n5. 生物学中的熵:在生物学中,熵的概念被用来描述生物体内部及与环境之间的能量转换和物质流动的无序程度。例如,生态系统中的熵增原理可以解释物种多样性的维持机制。\n\n熵的概念虽然起源于物理学,但其应用范围非常广泛,几乎涵盖了所有涉及系统状态变化的科学领域。理解熵对于深入探讨自然界和社会现象中的许多问题都非常重要。","index":0}]}'>
2025-02-09 14:06:15,559 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
Processing: 1it [00:20, 20.76s/it]2025-02-09 14:06:15,563 - evalscope - http_client.py - on_request_chunk_sent - 123 - DEBUG - Request sent: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}'>
·······
2025-02-09 13:45:52,820 - evalscope - ERROR - Request: {'prompt': '熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵 熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵··(省略大量的熵、�)····熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵��熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵', 'model': 'Qwen2.5-72B', 'max_tokens': 2048, 'seed': 42, 'stop': [], 'stop_token_ids': []} failed, state_code: 422, data: {"error": "request param contains not messages or messages null", "error_type": "validation"}

I tried adding export LANG=en_US.UTF-8 to the environment variables and adding # coding=utf-8 at the top of the Python script; neither had any effect.

Yunnglin (Collaborator) commented Feb 9, 2025

This is not garbled output. The prompt is a long run of the character "熵", so the model's output may also be a long run of "熵"; as long as the model produces output, the inference speed can be calculated.
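
As a rough sketch of that idea (this is not the evalscope speed_benchmark implementation; the endpoint, prompt lengths, and helper below are illustrative), throughput can be estimated from the token counts in the response and the wall-clock time:

import time
import requests  # assumes a plain OpenAI-compatible /v1/completions endpoint

def measure_speed(url: str, model: str, prompt_len: int) -> float:
    # The prompt content does not matter, only its length.
    payload = {"prompt": "熵" * prompt_len, "model": model, "max_tokens": 2048}
    start = time.time()
    resp = requests.post(url, json=payload, timeout=600)
    resp.raise_for_status()
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / (time.time() - start)  # generated tokens per second

for n in (1, 256, 2048):  # probe a few input lengths, as the speed benchmark does
    print(n, measure_speed("http://127.0.0.1/v1/completions", "Qwen2.5-72B", n))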

toneewang (Author) commented

This is not garbled output. The prompt is a long run of the character "熵", so the model's output may also be a long run of "熵"; as long as the model produces output, the inference speed can be calculated.

Understood, that point is completely clear. The problem I'm now facing is that the received Body is empty. The receiver-side log is shown below; the sender-side log shows a large number of "熵" characters being sent.

root@a6ce8af4ebbb:/tmp# tail -f ./completions.log
2025-02-10 02:33:04 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"43","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body={"prompt": "hello", "model": "Qwen2.5-72B"}
2025-02-10 02:33:06 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"107","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body={"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}
2025-02-10 02:33:27 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"107","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body={"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}
2025-02-10 02:33:46 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"18536","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:46 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"18536","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:47 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"43112","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:48 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"43112","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:49 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"92264","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:51 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"92264","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
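
A minimal replay sketch to isolate where the body is lost (the prompt size below is an assumption, chosen to roughly match the failing content-lengths): send one large /v1/completions request directly and check whether the compatibility layer still logs an empty Body.

import json
import requests

url = "http://127.0.0.1/v1/completions"
payload = {
    "prompt": "熵" * 30000,  # each 熵 is 3 UTF-8 bytes, so roughly 90 KB, similar to the failing requests
    "model": "Qwen2.5-72B",
    "max_tokens": 2048,
}
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
print("content-length:", len(body))

resp = requests.post(url, data=body, headers={"Content-Type": "application/json"})
print(resp.status_code, resp.text[:200])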

Yunnglin (Collaborator) commented

What are you using to deploy the model service? Does this usage example run for you?

toneewang (Author) commented

What are you using to deploy the model service? Does this usage example run for you?

I'm using MindIE-Server, and that usage example runs normally for me. The compatibility layer mapping /v1/completions to /v1/chat/completions is something I implemented myself, because MindIE-Server does not support /v1/completions by default.
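
For context, a shim of this kind typically just wraps prompt into a messages array before forwarding to the chat endpoint. A hypothetical minimal version (FastAPI, httpx, the upstream URL, and all names are assumptions, not the actual MindIE-Server compatibility layer) would look roughly like this; if such a shim mishandles large request bodies, that could match the empty-Body entries above:

import httpx
from fastapi import FastAPI, Request

app = FastAPI()
# Upstream chat endpoint; the host/port is an assumption for illustration.
UPSTREAM = "http://127.0.0.1:1025/v1/chat/completions"

@app.post("/v1/completions")
async def completions(request: Request):
    # Read the full JSON body; truncated or partial reads on large payloads
    # would show up as empty Body entries like those in the log above.
    payload = await request.json()
    chat_payload = {
        "model": payload["model"],
        "messages": [{"role": "user", "content": payload.get("prompt", "")}],
        "max_tokens": payload.get("max_tokens", 2048),
    }
    async with httpx.AsyncClient(timeout=600) as client:
        upstream_resp = await client.post(UPSTREAM, json=chat_payload)
    return upstream_resp.json()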

Yunnglin (Collaborator) commented

I could not reproduce this issue on my side 🤦
