`docs/mddocs/Quickstart/npu_quickstart.md` (+10, -5)
Refer to the following table for verified models:

| LLaMA 3.2 |[meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)| Meteor Lake, Lunar Lake, Arrow Lake |
| DeepSeek-R1 |[deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)| Meteor Lake, Lunar Lake, Arrow Lake |

### Run GGUF model using CLI tool

#### Setup for running llama.cpp

First, create a directory for `llama.cpp`; for instance, use the following command to create a `llama-cpp-npu` directory and enter it.
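A minimal sketch of such a command sequence (the directory name simply follows the example above):

```cmd
:: create a working directory for llama.cpp on the NPU and enter it
mkdir llama-cpp-npu
cd llama-cpp-npu
```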
Then, please run the following command with **administrator privilege in Miniforge Prompt**:

```cmd
init-llama-cpp.bat
```

#### Model Download

Before running, you should download or copy a community GGUF model to your current directory, for instance, `DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf` from [DeepSeek-R1-Distill-Qwen-7B-GGUF](https://huggingface.co/lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF/tree/main).
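For example, one way to fetch that file (a sketch that assumes the `huggingface_hub` CLI is installed in your environment):

```cmd
:: download the Q6_K GGUF file into the current directory
huggingface-cli download lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf --local-dir .
```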

#### Run the quantized model

Please refer to [Runtime Configurations](#runtime-configurations) before running the following command in Miniforge Prompt.
```cmd
llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
```
You can use `llama-cli-npu.exe -h` for more details about the meaning of each parameter.
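For instance, to print the full list of supported parameters:

```cmd
llama-cli-npu.exe -h
```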
### Run GGUF model using llama.cpp C++ API
IPEX-LLM also supports the `llama.cpp` C++ API for running GGUF models on Intel NPU. Refer to [Simple Example](../../../python/llm/example/NPU/llama.cpp/) for detailed usage.
> **Note**:
>
> - **Warmup on first run**: When running certain GGUF models on the NPU for the first time, you might notice delays of up to several minutes before the first token is generated. This delay occurs because of blob compilation.
# (Experimental) Example of running GGUF model using llama.cpp C++ API on NPU
In this directory, you will find a simple C++ example of how to run GGUF models on Intel NPUs using the `llama.cpp` C++ API. See the table below for verified models.
Please refer to [Quickstart](../../../../../docs/mddocs/Quickstart/npu_quickstart.md#experimental-llamacpp-support) for details about verified platforms.
## 0. Prerequisites
For `ipex-llm` NPU support, please refer to [Quickstart](../../../../../docs/mddocs/Quickstart/npu_quickstart.md#install-prerequisites) for details about the required preparations.
## 1. Install & Runtime Configurations
### 1.1 Installation on Windows
We suggest using conda to manage the environment:
```cmd
conda create -n llm python=3.11
conda activate llm

:: for building the example
pip install cmake

:: install ipex-llm with 'npu' option
pip install --pre --upgrade ipex-llm[npu]
```
Please refer to [Quickstart](../../../../../docs/mddocs/Quickstart/npu_quickstart.md#install-prerequisites) for more details about `ipex-llm` installation on Intel NPU.
### 1.2 Runtime Configurations
Please refer to [Quickstart](../../../../../docs/mddocs/Quickstart/npu_quickstart.md#runtime-configurations) for environment variable settings based on your device.
## 2. Build C++ Example `simple`
- You can run the cmake script below in cmd to build `simple` yourself; don't forget to replace `<CONDA_ENV_DIR>` below with your own path.
```cmd
:: under current directory
:: please replace below conda env dir with your own path
set CONDA_ENV_DIR=C:\Users\arda\miniforge3\envs\llm\Lib\site-packages
mkdir build
cd build
cmake ..
cmake --build . --config Release -j
cd Release
```
- You can also directly use our released `simple.exe`, which has the same usage as this example's `simple.cpp`.
## 3. Run `simple`
With the built `simple`, you can run the GGUF model:
```cmd
:: Run simple text completion
simple.exe -m <gguf_model_path> -n 64 -p "Once upon a time,"
```
> **Note**:
>
> **Warmup on first run**: When running certain GGUF models on the NPU for the first time, you might notice delays of up to several minutes before the first token is generated. This delay occurs because of blob compilation.