Error (garbled output) when using Perf (model inference stress-testing tool) to benchmark an API service #301

toneewang opened this issue Feb 9, 2025 · 7 comments

toneewang commented Feb 9, 2025

Issue Description

Please briefly describe the issue you encountered.

Tools Used

  • Native framework
  • OpenCompass backend
  • VLMEvalKit backend
  • RAGEval backend
  • [✔] Perf / model inference stress-testing tool
  • Arena mode

Code or Commands Executed

from evalscope.perf.main import run_perf_benchmark

task_cfg = {"url": "http://127.0.0.1/v1/chat/completions",
            "parallel": 1,
            "model": "Qwen2.5-72B",
            "number": 15,
            "api": "openai",
            "dataset": "speed_benchmark"}
run_perf_benchmark(task_cfg)

Error Log

Processing: 0it [00:00, ?it/s]2025-02-09 11:32:41,721 - evalscope - ERROR - Request: {'prompt': '熵', 'model': 'Qwen2.5-72B', 'max_tokens': 2048, 'seed': 42, 'stop': [], 'stop_token_ids': []} failed, state_code: 422, data: {"error": "request param contains not messages or messages null", "error_type": "validation"}
2025-02-09 11:32:42,720 - evalscope - ERROR - Request: {'prompt': '熵', 'model': 'Qwen2.5-72B', 'max_tokens': 2048, 'seed': 42, 'stop': [], 'stop_token_ids': []} failed, state_code: 422, data: {"error": "request param contains not messages or messages null", "error_type": "validation"}
Processing: 2it [00:01, 1.96it/s]2025-02-09 11:32:43,723 - evalscope - ERROR - Request: {'prompt': '熵熵熵熵熵熵熵熵熵 熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵

Runtime Environment

  • Operating System:

    • Windows
    • macOS
    • [✔] Ubuntu
  • Python Version:

    • 3.11
    • [✔] 3.10
    • 3.9

Additional Information

The environment was built with Docker. Dockerfile:

FROM ubuntu:22.04

RUN sed -i 's|http://archive.ubuntu.com/ubuntu/|http://mirrors.tuna.tsinghua.edu.cn/ubuntu/|g' /etc/apt/sources.list

RUN apt-get update -y && \
    apt-get install -y python3.10 python3-pip && \
    apt-get clean all

RUN pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple evalscope \
    && pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple "evalscope[perf]" \
    && pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple gradio

RUN apt-get update -y && \
    apt-get install -y --fix-broken unzip wget && \
    apt-get clean all

RUN mkdir -p /root/nltk_data/tokenizers && \
    wget -O /root/nltk_data/tokenizers/punkt_tab.zip https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/open_data/nltk_data/punkt_tab.zip && \
    unzip /root/nltk_data/tokenizers/punkt_tab.zip -d /root/nltk_data/tokenizers

CMD ["/bin/sh"]

Yunnglin (Collaborator) commented Feb 9, 2025

See the tutorial.

For speed testing, the url should use the /v1/completions endpoint instead of /v1/chat/completions, so that the extra processing done by the chat template does not affect the input length.
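
As a rough illustration of why the chat endpoint skews length measurements, here is a minimal sketch (assuming the transformers library and an instruct-tuned Qwen2.5 tokenizer; the model id below is only an example) comparing the raw prompt length with the chat-template-wrapped length:

from transformers import AutoTokenizer

# Illustrative only: the tokenizer id is an assumption, not taken from this issue.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B-Instruct")

raw_prompt = "熵" * 64
raw_ids = tokenizer(raw_prompt)["input_ids"]

# /v1/chat/completions applies the chat template before tokenization,
# wrapping the same text in system/user/assistant control tokens.
chat_text = tokenizer.apply_chat_template(
    [{"role": "user", "content": raw_prompt}],
    tokenize=False,
    add_generation_prompt=True,
)
chat_ids = tokenizer(chat_text)["input_ids"]

print("raw prompt tokens:  ", len(raw_ids))
print("chat-wrapped tokens:", len(chat_ids))  # larger, so speed numbers get skewed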

Yunnglin self-assigned this Feb 9, 2025
Yunnglin added the perf label Feb 9, 2025
toneewang (Author) commented

See the tutorial.

For speed testing, the url should use the /v1/completions endpoint instead of /v1/chat/completions, so that the extra processing done by the chat template does not affect the input length.

Following the tutorial, I switched to the /v1/completions endpoint and ran this Python script:
from evalscope.perf.main import run_perf_benchmark

task_cfg = {"url": "http://127.0.0.1/v1/completions",
            "parallel": 1,
            "model": "Qwen2.5-72B",
            "number": 15,
            "api": "openai",
            "dataset": "speed_benchmark",
            "debug": True}
run_perf_benchmark(task_cfg)

Early in the log, the request content is "熵" and a normal response comes back. After two "熵" requests, the requests turn into a large number of "熵" characters plus "�". Looking at the model-side log, no body was received. Detailed logs:
2025-02-09 14:05:53,158 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
2025-02-09 14:05:53,165 - evalscope - http_client.py - on_request_chunk_sent - 123 - DEBUG - Request sent: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"prompt": "hello", "model": "Qwen2.5-72B"}'>
2025-02-09 14:05:54,792 - evalscope - http_client.py - on_response_chunk_received - 135 - DEBUG - Request received: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"choices":[{"finish_reason":"stop","text":"Hello! It's nice to meet you. How can I assist you today? Whether you have questions, need information, or just want to chat, feel free to let me know!","index":0}],"id":"endpoint_common_2610","object":"text_completion","model":"Qwen2.5-72B","created":1739109828,"usage":{"prompt_tokens":30,"completion_tokens":38,"total_tokens":68}}'>
2025-02-09 14:05:54,794 - evalscope - http_client.py - test_connection - 157 - INFO - Connection successful.
2025-02-09 14:05:54,795 - evalscope - db_util.py - get_result_db_path - 103 - INFO - Save the data base to: ./outputs/20250209_140553/Qwen2.5-72B/benchmark_data.db
Processing: 0it [00:00, ?it/s]2025-02-09 14:05:54,803 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
2025-02-09 14:05:54,812 - evalscope - http_client.py - on_request_chunk_sent - 123 - DEBUG - Request sent: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}'>
2025-02-09 14:06:15,557 - evalscope - http_client.py - on_response_chunk_received - 135 - DEBUG - Request received: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"usage":{"prompt_tokens":30,"completion_tokens":485,"total_tokens":515},"id":"endpoint_common_2611","object":"text_completion","model":"Qwen2.5-72B","created":1739109849,"choices":[{"finish_reason":"...用于解释分子结构的稳定性、溶解性等现象。\n\n5. 生物学中的熵:在生物学中,熵的概念被用来描述生物体内部及与环境之间的能量转换和物质流动的无序程度。例如,生态系统中的熵增原理可以解释物种多样性的维持机制。\n\n熵的概念虽然起源于物理学,但其应用范围非常广泛,几乎涵盖了所有涉及系统状态变化的科学领域。理解熵对于深入探讨自然界和社会现象中的许多问题都非常重要。","index":0}]}'>
2025-02-09 14:06:15,559 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
Processing: 1it [00:20, 20.76s/it]2025-02-09 14:06:15,563 - evalscope - http_client.py - on_request_chunk_sent - 123 - DEBUG - Request sent: <method='POST', url=URL('http://127.0.0.1/v1/completions'), truncated_chunk='{"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}'>
·······
2025-02-09 13:45:52,820 - evalscope - ERROR - Request: {'prompt': '熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵 熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵··(省略大量的熵、�)····熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵��熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵', 'model': 'Qwen2.5-72B', 'max_tokens': 2048, 'seed': 42, 'stop': [], 'stop_token_ids': []} failed, state_code: 422, data: {"error": "request param contains not messages or messages null", "error_type": "validation"}

I tried adding export LANG=en_US.UTF-8 to the environment variables and adding # coding=utf-8 at the top of the Python script; neither had any effect.

Yunnglin (Collaborator) commented Feb 9, 2025

This is not garbled output. The prompt is a long run of the character "熵", so the model's output may also be a long run of "熵"; as long as the model produces output, the inference speed can be calculated.
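
As a rough sketch of that idea (this is not the evalscope speed_benchmark implementation; the endpoint, prompt lengths, and helper below are illustrative), throughput can be estimated from the token counts in the response and the wall-clock time:

import time
import requests  # assumes a plain OpenAI-compatible /v1/completions endpoint

def measure_speed(url: str, model: str, prompt_len: int) -> float:
    # The prompt content does not matter, only its length.
    payload = {"prompt": "熵" * prompt_len, "model": model, "max_tokens": 2048}
    start = time.time()
    resp = requests.post(url, json=payload, timeout=600)
    resp.raise_for_status()
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / (time.time() - start)  # generated tokens per second

for n in (1, 256, 2048):  # probe a few input lengths, as the speed benchmark does
    print(n, measure_speed("http://127.0.0.1/v1/completions", "Qwen2.5-72B", n))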

toneewang (Author) commented

This is not garbled output. The prompt is a long run of the character "熵", so the model's output may also be a long run of "熵"; as long as the model produces output, the inference speed can be calculated.

Understood, that point is completely clear. The problem I'm now facing is that the received Body is empty. The receiver-side log is shown below; the sender-side log shows a large number of "熵" characters being sent.

root@a6ce8af4ebbb:/tmp# tail -f ./completions.log
2025-02-10 02:33:04 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"43","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body={"prompt": "hello", "model": "Qwen2.5-72B"}
2025-02-10 02:33:06 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"107","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body={"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}
2025-02-10 02:33:27 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"107","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body={"prompt": "熵", "model": "Qwen2.5-72B", "max_tokens": 2048, "seed": 42, "stop": [], "stop_token_ids": []}
2025-02-10 02:33:46 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"18536","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:46 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"18536","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:47 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"43112","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:48 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"43112","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:49 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"92264","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
2025-02-10 02:33:51 Received request: URI=/v1/completions, Headers={"user-agent":"modelscope_bench","authorization":"Bearer EMPTY","content-length":"92264","host":"127.0.0.1","accept-encoding":"gzip, deflate","content-type":"application/json","accept":"/"}, Body=
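
A minimal replay sketch to isolate where the body is lost (the prompt size below is an assumption, chosen to roughly match the failing content-lengths): send one large /v1/completions request directly and check whether the compatibility layer still logs an empty Body.

import json
import requests

url = "http://127.0.0.1/v1/completions"
payload = {
    "prompt": "熵" * 30000,  # each 熵 is 3 UTF-8 bytes, so roughly 90 KB, similar to the failing requests
    "model": "Qwen2.5-72B",
    "max_tokens": 2048,
}
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
print("content-length:", len(body))

resp = requests.post(url, data=body, headers={"Content-Type": "application/json"})
print(resp.status_code, resp.text[:200])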

Yunnglin (Collaborator) commented

What are you using to deploy the model service? Does this usage example run for you?

toneewang (Author) commented

What are you using to deploy the model service? Does this usage example run for you?

I'm using MindIE-Server, and that usage example runs normally for me. The compatibility layer mapping /v1/completions to /v1/chat/completions is something I implemented myself, because MindIE-Server does not support /v1/completions by default.
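
For context, a shim of this kind typically just wraps prompt into a messages array before forwarding to the chat endpoint. A hypothetical minimal version (FastAPI, httpx, the upstream URL, and all names are assumptions, not the actual MindIE-Server compatibility layer) would look roughly like this; if such a shim mishandles large request bodies, that could match the empty-Body entries above:

import httpx
from fastapi import FastAPI, Request

app = FastAPI()
# Upstream chat endpoint; the host/port is an assumption for illustration.
UPSTREAM = "http://127.0.0.1:1025/v1/chat/completions"

@app.post("/v1/completions")
async def completions(request: Request):
    # Read the full JSON body; truncated or partial reads on large payloads
    # would show up as empty Body entries like those in the log above.
    payload = await request.json()
    chat_payload = {
        "model": payload["model"],
        "messages": [{"role": "user", "content": payload.get("prompt", "")}],
        "max_tokens": payload.get("max_tokens", 2048),
    }
    async with httpx.AsyncClient(timeout=600) as client:
        upstream_resp = await client.post(UPSTREAM, json=chat_payload)
    return upstream_resp.json()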

Yunnglin (Collaborator) commented

I could not reproduce this issue on my side 🤦
