evalscope perf with transformers: low performance and "Connection reset by peer" errors #304

yucongshub · 2025-02-11T08:09:35Z

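Issue Description

When running inference performance tests with evalscope perf on transformers, GPU utilization and memory usage are both low with 1 GPU and with 2 GPUs, and at a concurrency of 4 the error "Connection reset by peer" occurs.

Tools Used

Code or Commands Executed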

evalscope perf  --parallel 4  --model ~/autodl-tmp/DeepSeek-R1-Distill-Qwen-7B/  --attn-implementation flash_attention_2  --log-every-n-query 2  --connect-timeout 6000  --read-timeout 6000  --max-tokens 2048  --min-tokens 2048  --api local  --dataset speed_benchmark  --debug

Error Log

2025-02-11 15:00:54,457 - evalscope - benchmark.py - statistic_benchmark_metric_worker - 183 - INFO - {
  "Time taken for tests (s)": 212.588,
  "Number of concurrency": 4,
  "Total requests": 4,
  "Succeed requests": 4,
  "Failed requests": 0,
  "Throughput(average tokens/s)": 38.535,
  "Average QPS": 0.019,
  "Average latency (s)": 173.249,
  "Average time to first token (s)": 173.249,
  "Average time per output token (s)": 0.02595,
  "Average input tokens per request": 257.5,
  "Average output tokens per request": 2048.0,
  "Average package latency (s)": 173.249,
  "Average package per request": 1.0
}
INFO:     127.0.0.1:54520 - "POST /v1/completions HTTP/1.1" 200 OK
2025-02-11 15:01:45,201 - evalscope - http_client.py - post - 106 - ERROR - [Errno 104] Connection reset by peer
2025-02-11 15:01:45,201 - evalscope - benchmark.py - send_requests_worker - 112 - ERROR - Request: {'prompt': '熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵
熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵', 'model': '/root/autodl-tmp/DeepSeek-R1-Distill-Qwen-7B/', 'max_tokens': 2048, 'min_tokens': 2048, 'seed': 42} failed, state_code: None, data: [Errno 104] Connection reset by peer
2025-02-11 15:01:45,204 - evalscope - http_client.py - on_response_chunk_received - 135 - DEBUG - Request received: <method='POST',  url=URL('http://127.0.0.1:8877/v1/completions'), truncated_chunk='{"model":"","object":"text_completion","created":1739257305,"choices":[{"index":0,"text":",\\n\\nGiven G\\ns A D F E G J R\\n\\n\\\\end{cases} is a \\\\( \\\\boxed{\\\\text{G}} \\\\)\\n\\n**Solution**:\\n\\nLet me consi...finition:** The partition function \\\\( p(n) \\\\) can be defined recursively as:\\n   \\\\[\\n   p(n) =","finish_reason":"stop"}],"usage":{"prompt_tokens":1025,"completion_tokens":2048,"total_tokens":3073}}'>
2025-02-11 15:01:45,206 - evalscope - http_client.py - post - 106 - ERROR - [Errno 104] Connection reset by peer
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
2025-02-11 15:01:45,206 - evalscope - benchmark.py - send_requests_worker - 112 - ERROR - Request: {'prompt': '熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵
熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵熵', 'model': '/root/autodl-tmp/DeepSeek-R1-Distill-Qwen-7B/', 'max_tokens': 2048, 'min_tokens': 2048, 'seed': 42} failed, state_code: None, data: [Errno 104] Connection reset by peer
2025-02-11 15:01:45,207 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1:8877/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
Processing: 5it [04:23, 44.86s/it] 2025-02-11 15:01:45,212 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1:8877/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
2025-02-11 15:01:45,212 - evalscope - http_client.py - on_request_start - 111 - DEBUG - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://127.0.0.1:8877/v1/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench', 'Authorization': 'Bearer EMPTY')>)>
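
As a sanity check on the summary block in the log above, the aggregate figures can be reproduced from three of the logged values. This is a minimal sketch: the formulas are assumptions inferred from the numbers, not taken from the evalscope source.

```python
# Values copied from the summary block in the log above.
elapsed_s = 212.588          # "Time taken for tests (s)"
succeeded = 4                # "Succeed requests"
avg_output_tokens = 2048.0   # "Average output tokens per request"

total_output_tokens = succeeded * avg_output_tokens

# Assumed definitions (inferred from the numbers, not confirmed in evalscope):
throughput = total_output_tokens / elapsed_s       # output tokens per second
qps = succeeded / elapsed_s                        # requests per second
time_per_token = elapsed_s / total_output_tokens   # seconds per output token

print(round(throughput, 3), round(qps, 3), round(time_per_token, 5))
```

With these inputs the derived values round to 38.535 tokens/s, 0.019 QPS, and 0.02595 s per output token, which agree with the log. The logged average time to first token also equals the average latency, and each request arrived as a single package, which is consistent with non-streaming responses.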

Runtime Environment

Operating System:
- Windows
- macOS
- Ubuntu
Python Version:
- 3.11
- 3.10
- 3.9

Additional Information

Utilization of the two GPUs during the transformers inference test:

NVIDIA GeForce RTX 4090 D, 42 %, 42 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 29 %, 27 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 40 %, 40 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 31 %, 30 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 43 %, 44 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 32 %, 30 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 44 %, 45 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 32 %, 30 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 44 %, 45 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 32 %, 30 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 42 %, 43 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 32 %, 30 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 42 %, 43 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 33 %, 31 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 45 %, 45 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 32 %, 30 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 45 %, 46 %, 24564 MiB

Utilization of the two GPUs during the vLLM inference test:

NVIDIA GeForce RTX 4090 D, 90 %, 86 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 90 %, 86 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 90 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 90 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 87 %, 84 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 89 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 89 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 90 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 89 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 90 %, 86 %, 24564 MiB
NVIDIA GeForce RTX 4090 D, 88 %, 86 %, 24564 MiB
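
To compare the two runs numerically, the samples above can be averaged per GPU. The sketch below is illustrative only: it assumes the columns are device name, GPU utilization, memory utilization, and total memory, and that rows alternate between GPU 0 and GPU 1. Neither assumption is stated in the thread, and `mean_gpu_util` is a hypothetical helper. Only the first four rows of each listing are used here.

```python
from statistics import mean

def mean_gpu_util(lines, num_gpus=2):
    """Average the first percentage column per GPU.

    Assumes rows alternate GPU 0, GPU 1, ... (an assumption, see above).
    """
    per_gpu = [[] for _ in range(num_gpus)]
    rows = [line for line in lines if line.strip()]
    for i, line in enumerate(rows):
        fields = [f.strip() for f in line.split(",")]
        util = float(fields[1].rstrip(" %"))  # e.g. "42 %" -> 42.0
        per_gpu[i % num_gpus].append(util)
    return [mean(values) for values in per_gpu]

transformers_sample = [
    "NVIDIA GeForce RTX 4090 D, 42 %, 42 %, 24564 MiB",
    "NVIDIA GeForce RTX 4090 D, 29 %, 27 %, 24564 MiB",
    "NVIDIA GeForce RTX 4090 D, 40 %, 40 %, 24564 MiB",
    "NVIDIA GeForce RTX 4090 D, 31 %, 30 %, 24564 MiB",
]
vllm_sample = [
    "NVIDIA GeForce RTX 4090 D, 90 %, 86 %, 24564 MiB",
    "NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB",
    "NVIDIA GeForce RTX 4090 D, 90 %, 86 %, 24564 MiB",
    "NVIDIA GeForce RTX 4090 D, 88 %, 85 %, 24564 MiB",
]

print("transformers:", mean_gpu_util(transformers_sample))
print("vLLM:", mean_gpu_util(vllm_sample))
```

Passing the full listings instead of the four-row samples gives the averages over the whole capture.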

Yunnglin · 2025-02-11T09:31:17Z

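Which version of evalscope are you using?

You could try running speed_benchmark with parallel set to 1 (speed_benchmark is meant to measure model speed for a single request at a time). For stress testing, you can write the command like this: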

evalscope perf \
    --url "http://127.0.0.1:8000/v1/chat/completions" \
    --parallel 10 \
    --model qwen2.5 \
    --number 100 \
    --api openai \
    --dataset openqa \
    --stream

yucongshub · 2025-02-13T02:27:54Z

@Yunnglin The version I am using:

>>> evalscope.__version__
'0.10.1'

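I ran speed_benchmark concurrently mainly to simulate load with fairly long prompts. So far, testing with vLLM has produced no errors and has been quite stable, so my guess is that the low performance of transformers is the cause.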

Yunnglin · 2025-02-13T06:24:15Z

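Transformers inference is based on torch and has no performance optimizations. If you want to test the effect of long prompts, you can test with custom prompts; see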
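
One way to prepare custom long prompts is to generate them at a few fixed lengths and write them to a file. The sketch below is only an illustration: `make_prompts` and `write_line_per_prompt` are hypothetical helpers, the one-prompt-per-line layout is an assumption, and how the file is passed to evalscope depends on the custom-prompt options of the installed version. Character count is also only a rough proxy for token count.

```python
import tempfile
from pathlib import Path

def make_prompts(lengths, filler="熵"):
    """Return one prompt per requested length, built from a repeated filler character."""
    return [filler * n for n in lengths]

def write_line_per_prompt(prompts, path):
    """Write one prompt per line (assumed layout; check what your evalscope version expects)."""
    Path(path).write_text("\n".join(prompts) + "\n", encoding="utf-8")

prompts = make_prompts([1024, 2048, 4096])
out_file = Path(tempfile.gettempdir()) / "long_prompts.txt"
write_line_per_prompt(prompts, out_file)

print([len(p) for p in prompts], "written to", out_file)
```

Fixed lengths make it easier to see how latency changes as the prompt grows, independent of the concurrency setting.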

Yunnglin · 2025-02-26T10:33:39Z

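Thank you for your feedback! We are closing this issue. If you have any questions, feel free to reopen it. If EvalScope has been helpful to you, please consider giving us a STAR to show your support. Thank you!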

Yunnglin closed this as completed Feb 26, 2025
