1. [Install](#install)
1. [CLI Commands](#cli-commands)
    - [Syntax](#syntax)
    - [Chatting](#chatting)
    - [Accuracy](#accuracy)
    - [Benchmarking](#benchmarking)
    - [Memory Usage](#memory-usage)
    - [Serving](#serving)
1. [API Overview](#api)
1. [Code Organization](#code-organization)
1. [Contributing](#contributing)

# Install

You can quickly get started with `lemonade` by installing the `turnkeyml` [PyPI package](#from-pypi) with the appropriate extras for your backend, or you can [install from source](#from-source-code) by cloning and installing this repository.

## From PyPI

To install `lemonade` from PyPI:

1. Create and activate a [miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.

    ```bash
    conda create -n lemon python=3.10
    conda activate lemon
    ```

1. Install lemonade for your backend of choice:

    - [OnnxRuntime GenAI with CPU backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):

        ```bash
        pip install turnkeyml[llm-oga-cpu]
        ```

    - [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):

        > Note: Requires Windows and a DirectML-compatible iGPU.

        ```bash
        pip install turnkeyml[llm-oga-igpu]
        ```

    - OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:

        > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 9 HX375, Ryzen AI 9 HX370, or Ryzen AI 9 365 processor.
        > - Install the [Ryzen AI driver >= 32.0.203.237](https://ryzenai.docs.amd.com/en/latest/inst.html#install-npu-drivers) (you can check your driver version under Device Manager > Neural Processors).
        > - Visit the [AMD Hugging Face page](https://huggingface.co/collections/amd/quark-awq-g128-int4-asym-fp16-onnx-hybrid-13-674b307d2ffa21dd68fa41d5) for supported checkpoints.

        ```bash
        pip install turnkeyml[llm-oga-hybrid]
        lemonade-install --ryzenai hybrid
        ```

    - Hugging Face (PyTorch) LLMs for CPU backend:

        ```bash
        pip install turnkeyml[llm]
        ```

    - llama.cpp: see [instructions](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/llamacpp.md).

1. Use `lemonade -h` to explore the LLM tools, and see the [command](#cli-commands) and [API](#api) examples below.

## From Source Code

To install `lemonade` from source:

1. `cd turnkeyml` (where `turnkeyml` is the repo root of your clone)
1. Follow the same instructions as in the [PyPI installation](#from-pypi), except install from your local clone by replacing `turnkeyml` with `-e .` (an editable install).
    - For example: `pip install -e .[llm-oga-igpu]`

# CLI Commands

## Syntax

The `lemonade` CLI uses a unique command syntax that enables convenient interoperability between models, frameworks, devices, accuracy tests, and deployment options.

Each unit of functionality (e.g., loading a model, running a test, deploying a server, etc.) is called a `Tool`, and a single call to `lemonade` can invoke any number of `Tools`. Each `Tool` will perform its functionality, then pass its state to the next `Tool` in the command.
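Conceptually, the chaining model can be sketched in plain Python like this (hypothetical classes for illustration, not lemonade's actual implementation):

```python
# Each "tool" receives the shared state, does its work, and returns the
# state so the next tool in the command can pick up where it left off.

class LoadTool:
    def run(self, state):
        state["model"] = f"loaded:{state['checkpoint']}"
        return state

class PromptTool:
    def run(self, state):
        state["response"] = f"{state['model']} says: Hello!"
        return state

state = {"checkpoint": "facebook/opt-125m"}
for tool in [LoadTool(), PromptTool()]:
    state = tool.run(state)

print(state["response"])
```

This state-passing design is what lets any model-loading tool feed any downstream tool (prompting, benchmarking, serving) without the tools knowing about each other.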

You can read each command out loud to understand what it is doing. For example, a command like this:

```bash
lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 llm-prompt -p "Hello, my thoughts are"
```

Can be read like this:

> Run `lemonade` on the input (`-i`) checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), onto the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.

The `lemonade -h` command will show you which options and Tools are available, and `lemonade TOOL -h` will tell you more about that specific Tool.
## Chatting

To chat with your LLM, try:

```bash
lemonade -i facebook/opt-125m huggingface-load llm-prompt -p "Hello, my thoughts are"
```

The LLM will run with your provided prompt, and the LLM's response to your prompt will be printed to the screen. You can replace the `"Hello, my thoughts are"` with any prompt you like.
You can also replace the `facebook/opt-125m` with any Hugging Face checkpoint you like, including LLaMA-2, Phi-2, Qwen, Mamba, etc.
You can also set the `--device` argument in `oga-load` and `huggingface-load` to load your LLM on a different device.
Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about those tools.
## Accuracy
To measure the accuracy of an LLM using MMLU, try this:

```bash
lemonade -i facebook/opt-125m huggingface-load accuracy-mmlu
```

## Benchmarking

To measure the time-to-first-token and tokens/second of an LLM, try this:

```bash
lemonade -i facebook/opt-125m huggingface-load huggingface-bench
```

That command will run a few warmup iterations, then a few generation iterations where performance data is collected.
The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` or `lemonade huggingface-bench -h`.
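To make the two metrics concrete, here is the generic arithmetic behind time-to-first-token and tokens/second (the timestamps below are hypothetical, not lemonade's implementation):

```python
# Hypothetical timing data for one generation run, in seconds.
prompt_submitted = 0.00
first_token_emitted = 0.35
generation_finished = 2.35
output_tokens = 100

# Time-to-first-token: latency before the first output token appears
# (dominated by prompt processing / prefill).
ttft = first_token_emitted - prompt_submitted

# Tokens/second: throughput over the remaining decode phase.
tokens_per_second = (output_tokens - 1) / (generation_finished - first_token_emitted)

print(f"TTFT: {ttft:.2f} s, throughput: {tokens_per_second:.1f} tokens/s")
```

Warmup iterations matter because the first run typically pays one-time costs (weight loading, graph compilation) that would otherwise skew these numbers.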
## Memory Usage
The peak memory used by the `lemonade` build is captured in the build output. To capture more granular
memory usage information, use the `--memory` flag.
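As a generic illustration of what peak-memory tracking involves (not how the `--memory` flag is implemented), Python's standard `tracemalloc` module can capture the peak allocation of a workload:

```python
import tracemalloc

# Start tracking Python-level allocations.
tracemalloc.start()

data = [0] * 1_000_000  # stand-in for a real workload
del data

# current: live allocations now; peak: high-water mark since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"peak allocation: {peak / 1e6:.1f} MB")
```

Peak usage is the number that matters for sizing: a model that fits in RAM on average can still fail if a transient spike exceeds it.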

## Serving

You can launch an LLM server using the `serve` tool.

Once the server has launched, you can connect to it from your own application, or interact directly by following the on-screen instructions to open a basic web app.

# API
Lemonade is also available via API.

## LEAP APIs
The lemonade enablement platform (LEAP) API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid). This makes it easy to integrate lemonade LLMs into Python applications.

For example, you can load and prompt a Hugging Face model on CPU like this (assuming the `hf-cpu` recipe name):

```python
from lemonade import leap

# Load the model and tokenizer for the chosen framework/backend recipe.
model, tokenizer = leap.from_pretrained("facebook/opt-125m", recipe="hf-cpu")

input_ids = tokenizer("This is my prompt", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)

print(tokenizer.decode(response[0]))
```

You can learn more about the LEAP APIs [here](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade).

## Low-Level API
The low-level API is useful for designing custom experiments, such as sweeping over specific checkpoints, devices, and/or tools.
Here's a quick example of how to prompt a Hugging Face LLM using the low-level API, which calls the load and prompt tools one by one:
```python
import lemonade.tools.torch_llm as tl
import lemonade.tools.chat as cl
from turnkeyml.state import State

state = State(cache_dir="cache", build_name="test")

state = tl.HuggingfaceLoad().run(state, input="facebook/opt-125m")
state = cl.Prompt().run(state, prompt="hi", max_new_tokens=15)

print("Response:", state.response)
```
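A checkpoint/device sweep can then be structured as a loop that would invoke such tool calls once per combination. Here is a sketch of the looping pattern alone (the checkpoint and device names are placeholders; the tool calls themselves are elided):

```python
from itertools import product

# Placeholder values; substitute real checkpoints/devices for an actual sweep.
checkpoints = ["facebook/opt-125m", "microsoft/Phi-3-mini-4k-instruct"]
devices = ["cpu", "igpu"]

experiments = []
for checkpoint, device in product(checkpoints, devices):
    # In a real sweep, each iteration would build a fresh State, run a load
    # tool for `checkpoint` on `device`, then run a benchmark or prompt tool,
    # and record the resulting metrics.
    experiments.append((checkpoint, device))

print(len(experiments), "experiments planned")
```

Keeping one `State` per combination avoids results from one run leaking into the next.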
# Contributing

Contributions are welcome! If you decide to contribute, please:

- Do so via a pull request.
- Write your code in keeping with the same style as the rest of this repo's code.
- Add a test under `test/lemonade` that provides coverage of your new feature.
The best way to contribute is to add new tools to cover more devices and usage scenarios.

To add a new tool:

1. (Optional) Create a new `.py` file under `src/lemonade/tools` (or use an existing file if your tool fits into a pre-existing family of tools).
1. Define a new class that inherits the `Tool` class from `TurnkeyML`.
1. Register the class by adding it to the list of `tools` near the top of `src/lemonade/cli.py`.
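
The pattern looks roughly like this (the `Tool` base class below is a simplified stand-in for illustration, not TurnkeyML's actual interface):

```python
# Simplified sketch of the tool-definition pattern: subclass a base class,
# implement run(), and register the class in a tools list the CLI consults.

class Tool:
    unique_name = None  # the name users type on the command line

    def run(self, state):
        raise NotImplementedError

class ShoutTool(Tool):
    unique_name = "shout"

    def run(self, state):
        state["text"] = state.get("text", "").upper()
        return state

# Registration: the CLI would look tools up here by unique_name.
tools = [ShoutTool]

state = tools[0]().run({"text": "lemonade"})
print(state["text"])
```

Because every tool shares the same `run(state)` contract, a newly registered tool composes with existing ones in any command sequence.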
You can learn more about contributing on the repository's [contribution guide](https://github.com/onnx/turnkeyml/blob/main/docs/contribute.md).