Commit 3d044db

add llama3.2-vision Pytorch example (#12165)
1 parent e2ef9e9 commit 3d044db

2 files changed: +211 −0 lines changed

@@ -0,0 +1,134 @@
# Llama3.2-Vision

In this directory, you will find examples of how you could use the IPEX-LLM `optimize_model` API to accelerate Llama3.2-Vision models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) as a reference Llama3.2-Vision model.

## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case in which a Llama3.2-Vision model predicts the next N tokens using the `generate()` API, with the IPEX-LLM `optimize_model` API on Intel GPUs.
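
In short, the flow of [generate.py](./generate.py) is sketched below (a condensed, illustrative excerpt of the script added in this commit, not the full program; the model id and the `'xpu'` device string are taken from the script itself):

```python
# Condensed sketch of the generate.py flow (illustrative excerpt, not the full script)
from transformers import MllamaForConditionalGeneration, AutoProcessor
from ipex_llm import optimize_model

model_path = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Load with Hugging Face transformers, then apply IPEX-LLM optimizations;
# the multi-modal projector is excluded from low-bit conversion
model = MllamaForConditionalGeneration.from_pretrained(model_path)
model = optimize_model(model, modules_to_not_convert=["multi_modal_projector"])
model = model.half().eval().to('xpu')

processor = AutoProcessor.from_pretrained(model_path)
```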

### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11
conda activate llm
# the command below will install intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

pip install transformers==4.45.0
```

#### 1.2 Installation on Windows
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.11 libuv
conda activate llm

# the command below will install intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

pip install transformers==4.45.0
```

### 2. Configure OneAPI environment variables for Linux

> [!NOTE]
> Skip this step if you are running on Windows.

This step is required on Linux for oneAPI installed via APT or the offline installer. Skip it for pip-installed oneAPI.

```bash
source /opt/intel/oneapi/setvars.sh
```

### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
<details>

<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```

</details>

<details>

<summary>For Intel Data Center GPU Max Series</summary>

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>

<details>

<summary>For Intel iGPU</summary>

```bash
export SYCL_CACHE_PERSISTENT=1
export BIGDL_LLM_XMX_DISABLED=1
```

</details>

#### 3.2 Configurations for Windows
<details>

<summary>For Intel iGPU</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```

</details>

<details>

<summary>For Intel Arc™ A-Series Graphics</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

</details>

> [!NOTE]
> The first time each model runs on an Intel iGPU, Intel Arc™ A300-Series, or Pro A60, it may take several minutes to compile.
### 4. Running examples

```
python ./generate.py
```

Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the Hugging Face repo id of the Llama3.2-Vision model (e.g. `meta-llama/Llama-3.2-11B-Vision-Instruct`) to be downloaded, or the path to the Hugging Face checkpoint folder. It defaults to `'meta-llama/Llama-3.2-11B-Vision-Instruct'`.
- `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be inferred. It defaults to `'https://hf-mirror.com/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg'`.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with the integrated prompt format for chat). It defaults to `'Describe image in detail'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`.
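
For example, combining the arguments above (the checkpoint path, image path, and prompt below are placeholders; substitute your own), you could run:

```bash
# All values below are placeholders; substitute your own
python ./generate.py \
  --repo-id-or-model-path /path/to/Llama-3.2-11B-Vision-Instruct \
  --image-url-or-path ./my_image.jpg \
  --prompt "What is in this image?" \
  --n-predict 64
```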
#### Sample Output
#### [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)

```log
Inference time: xxxx s
-------------------- Prompt --------------------
Describe image in detail
-------------------- Output --------------------
This image features a charming anthropomorphic rabbit standing on a dirt path, surrounded by a picturesque rural landscape.

The rabbit, with its light brown fur and distinctive large
```

The sample input image is:

<a href="https://hf-mirror.com/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"><img width=400px src="https://hf-mirror.com/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg" ></a>
@@ -0,0 +1,77 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import argparse
import os

import requests
import time
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

from ipex_llm import optimize_model

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Llama3.2-Vision model')
    parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-3.2-11B-Vision-Instruct",
                        help='The huggingface repo id for the Llama3.2-Vision model to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--image-url-or-path', type=str,
                        default='https://hf-mirror.com/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg',
                        help='The URL or path to the image to infer')
    parser.add_argument('--prompt', type=str, default="Describe image in detail",
                        help='Prompt to infer')
    parser.add_argument('--n-predict', type=int, default=32,
                        help='Max tokens to predict')

    args = parser.parse_args()
    model_path = args.repo_id_or_model_path
    image_path = args.image_url_or_path
    prompt = args.prompt

    # Load the model with Hugging Face transformers, then apply IPEX-LLM optimizations;
    # the multi-modal projector is excluded from low-bit conversion
    model = MllamaForConditionalGeneration.from_pretrained(model_path)
    model = optimize_model(model, modules_to_not_convert=["multi_modal_projector"])
    model = model.half().eval()
    model = model.to('xpu')

    processor = AutoProcessor.from_pretrained(model_path)

    # Build a chat-style message containing the image placeholder and the text prompt
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": prompt}
            ]
        }
    ]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)

    # Load the image from a local path if it exists, otherwise download it from the URL
    if os.path.exists(image_path):
        image = Image.open(image_path)
    else:
        image = Image.open(requests.get(image_path, stream=True).raw)

    inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)

    # Run generation several times and print the wall-clock time of each run
    with torch.inference_mode():
        for i in range(3):
            st = time.time()
            output = model.generate(**inputs, do_sample=False, max_new_tokens=args.n_predict)
            et = time.time()
            print(et - st)

    print(processor.decode(output[0]))
