
Commit 990ebde

tests_gaudi: Update readme for vllm workload
Signed-off-by: vbedida79 <veenadhari.bedida@intel.com>
1 parent: 0d142cb

File tree

1 file changed: +2 −2 lines


tests/gaudi/l2/README.md

Lines changed: 2 additions & 2 deletions
@@ -104,6 +104,7 @@ $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-
 ```
 meta-llama/Llama-3.1-8B model is used in this deployment and the hugging face token is used to access such gated models.
 * For the PV setup with NFS, refer to [documentation](https://docs.openshift.com/container-platform/4.17/storage/persistent_storage/persistent-storage-nfs.html).
+The vLLM workload needs access to the host's shared memory for tensor parallel inference, this is a volume mount in the deployment yaml.
 ```
 $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/vllm_deployment.yaml
 ```
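The sentence added above refers to mounting shared memory into the vLLM container. As a point of reference only, such a mount is commonly declared as a memory-backed emptyDir volume; the sketch below illustrates that pattern and is not an excerpt from vllm_deployment.yaml (the container name and size limit are made up for illustration).

```
# Illustrative Deployment excerpt: mount a RAM-backed emptyDir at /dev/shm so
# tensor-parallel workers can exchange data through shared memory.
spec:
  template:
    spec:
      containers:
        - name: vllm-workload          # assumed container name
          volumeMounts:
            - name: shm
              mountPath: /dev/shm
      volumes:
        - name: shm
          emptyDir:
            medium: Memory             # back the volume with tmpfs
            sizeLimit: 1Gi             # illustrative size
```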
@@ -160,11 +161,10 @@ Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:10<00:03, 3.59s/i
 Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:11<00:00, 2.49s/it]
 Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:11<00:00, 2.93s/it]
 ```
-Run inference requests using the service url.
+Use any pod in the namespace gaudi-validation with curl access, to run the following inference requests with the vllm service url.
 ```
 sh-5.1# curl "http://vllm-workload.gaudi-validation.svc.cluster.local:8000/v1/models"{"object":"list","data":[{"id":"meta-llama/Llama-3.1-8B","object":"model","created":1730317412,"owned_by":"vllm","root":"meta-llama/Llama-3.1-8B","parent":null,"max_model_len":131072,"permission":[{"id":"modelperm-452b2bd990834aa5a9416d083fcc4c9e","object":"model_permission","created":1730317412,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
 ```
-
 ```
 sh-5.1# curl http://vllm-workload.gaudi-validation.svc.cluster.local:8000/v1/completions -H "Content-Type: application/json" -d '{
 "model": "meta-llama/Llama-3.1-8B",
