Since Intel has so far abandoned ipex-llm and the Arc cards...
vLLM v0.11.1rc2.dev221+g49c00fe30 works with 4x A770.
You can build a Docker image from the vLLM repository sources (docker/Dockerfile.xpu):
https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.xpu
docker build -f docker/Dockerfile.xpu -t vllm-xpu-0110 --shm-size=32g .
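For reference, a minimal sketch of how the resulting image could be launched; this is not a verified command. The /dev/dri passthrough is the usual way to expose Intel GPUs to a container, and the /llm/models mount path is an assumption taken from the model path used in the bench command below:

# Sketch: run the XPU image built above with the four A770s visible.
# Adjust the mount, port, and entrypoint to match your setup.
docker run -it --rm \
--device /dev/dri \
--shm-size=32g \
-v /llm/models:/llm/models \
-p 8000:8000 \
vllm-xpu-0110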
However, I do not know how to configure it properly for 4x A770, and I am sure the performance could be higher (2 req/s -> 10+ req/s).
Llama 3.1 8B Instruct FP8
Sometimes the request processing speed reaches 12 req/s, but the process periodically "hangs" and then speeds up again; I have not figured out the reason yet.
Configuration for 1024 tokens in, 512 tokens out:
--max-model-len "2000"
--max-num-batched-tokens "3000"
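For context, a serve invocation using these flags might look like the sketch below. The issue does not show the actual launch command, so --tensor-parallel-size 4 (splitting the model across the four A770s) is my assumption; the model path matches the bench command that follows:

# Sketch, not the author's actual launch command: serve the FP8
# Llama 3.1 8B checkpoint across all four cards via tensor parallelism.
vllm serve /llm/models/LLM-Research/Meta-Llama-3.1-8B-Instruct \
--served-model-name Meta-Llama-3.1-8B-Instruct \
--tensor-parallel-size 4 \
--max-model-len 2000 \
--max-num-batched-tokens 3000 \
--trust-remote-code \
--port 8000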
Test:
vllm bench serve \
--model /llm/models/LLM-Research/Meta-Llama-3.1-8B-Instruct \
--served-model-name Meta-Llama-3.1-8B-Instruct \
--dataset-name random \
--random-input-len 1024 \
--random-output-len 512 \
--ignore-eos \
--num-prompts 1500 \
--trust-remote-code \
--request-rate inf \
--backend vllm \
--port 8000
Ubuntu 25.10, kernel 6.17.3
My numbers for 4x A770 and 2x Xeon E5-2699 v3:
115 requests
1500 requests
