Commit cf7f1c6

Rev to 6.0.1 (#292)
1 parent 478bf5b commit cf7f1c6

12 files changed

+554
-167
lines changed

.github/workflows/test_lemonade.yml

Lines changed: 0 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -59,8 +59,3 @@ jobs:
5959
# Test high-level APIs
6060
python examples/lemonade/api_basic.py
6161
python examples/lemonade/api_streaming.py
62-
63-
# Test server
64-
python test/lemonade/server.py
65-
66-

.github/workflows/test_server.yml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
# This workflow will install Python dependencies, run tests and lint with a single version of Python
2+
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
3+
4+
name: Test Lemonade Server
5+
6+
on:
7+
push:
8+
branches: ["main"]
9+
pull_request:
10+
branches: ["main"]
11+
12+
permissions:
13+
contents: read
14+
15+
jobs:
16+
make-server-lemonade:
17+
env:
18+
LEMONADE_CI_MODE: "True"
19+
strategy:
20+
matrix:
21+
os: [ubuntu-latest, windows-latest]
22+
runs-on: ${{ matrix.os }}
23+
steps:
24+
- uses: actions/checkout@v3
25+
- name: Set up Miniconda with 64-bit Python
26+
uses: conda-incubator/setup-miniconda@v2
27+
with:
28+
miniconda-version: "latest"
29+
activate-environment: lemon
30+
python-version: "3.10"
31+
run-post: "false"
32+
- name: Install dependencies
33+
shell: bash -el {0}
34+
run: |
35+
python -m pip install --upgrade pip
36+
python -m pip check
37+
pip install -e .[llm]
38+
- name: Run server tests
39+
shell: bash -el {0}
40+
run: |
41+
python test/lemonade/server.py
42+
43+
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
# This is a no-op workflow that has inverse path filtering to test_turnkey.yml
2+
3+
name: Lint and Test TurnkeyML
4+
5+
on:
6+
pull_request:
7+
branches: ["main", "canary", "refresh"]
8+
paths-ignore:
9+
- src/turnkeyml/**
10+
- test/turnkey/**
11+
- examples/turnkey/**
12+
- .github/workflows/test_turnkey.yml
13+
14+
permissions:
15+
contents: read
16+
17+
jobs:
18+
build-turnkey:
19+
strategy:
20+
matrix:
21+
python-version: ["3.8", "3.11"]
22+
os: [ubuntu-latest, windows-latest]
23+
runs-on: ubuntu-latest
24+
steps:
25+
- run: 'echo "No test_turnkey build required because no files that match the paths filters were changed."'
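
The "inverse path filtering" idea in this no-op workflow can be sketched in Python. This is only an illustration under stated assumptions: `test_turnkey.yml` is not part of this commit, so the sketch assumes it uses a `paths` filter with the same four patterns, and it uses `fnmatch` as a rough stand-in for GitHub's glob matching. Under GitHub Actions semantics, a `paths` filter triggers when at least one changed file matches, while a `paths-ignore` filter triggers when at least one changed file does not match.

```python
from fnmatch import fnmatchcase

# Patterns copied from the ignore list in the no-op workflow above.
TURNKEY_PATTERNS = [
    "src/turnkeyml/**",
    "test/turnkey/**",
    "examples/turnkey/**",
    ".github/workflows/test_turnkey.yml",
]


def matches_any(path, patterns):
    """Return True if `path` matches at least one glob pattern.

    fnmatch is an approximation of GitHub's matcher, good enough for a sketch.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def real_workflow_runs(changed_files, patterns=TURNKEY_PATTERNS):
    """Assumed `paths` filter: runs when any changed file matches."""
    return any(matches_any(f, patterns) for f in changed_files)


def noop_workflow_runs(changed_files, patterns=TURNKEY_PATTERNS):
    """Ignore filter: runs when any changed file does NOT match."""
    return any(not matches_any(f, patterns) for f in changed_files)
```

With these rules, a docs-only change triggers only the no-op workflow, a TurnkeyML-only change triggers only the real one, and a mixed change triggers both, so a check with this workflow name is reported in every case.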

docs/lemonade/getting_started.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ To install `lemonade` from PyPI:
1818
1. Create and activate a [miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
1919
```bash
2020
conda create -n lemon python=3.10
21-
cond activate lemon
21+
conda activate lemon
2222
```
2323

2424
3. Install lemonade for your backend of choice:

docs/lemonade/server_spec.md

Lines changed: 70 additions & 61 deletions
@@ -8,6 +8,7 @@ We are also actively investigating and developing [additional endpoints](#additi
88

99
### OpenAI-Compatible Endpoints
1010
- POST `/api/v0/chat/completions` - Chat Completions (messages -> completion)
11+
- POST `/api/v0/completions` - Text Completions (prompt -> completion)
1112
- GET `/api/v0/models` - List available models
1213

1314
### Additional Endpoints
@@ -22,7 +23,6 @@ They focus on enabling client applications by extending existing cloud-focused A
2223
- Unload models to save memory space.
2324

2425
The additional endpoints under development are:
25-
- POST `/api/v0/completions` - Text Completions (prompt -> completion)
2626
- POST `/api/v0/load` - Load a model
2727
- POST `/api/v0/unload` - Unload a model
2828
- POST `/api/v0/params` - Set generation parameters
@@ -46,19 +46,20 @@ lemonade serve
4646

4747
### `POST /api/v0/chat/completions` <sub>![Status](https://img.shields.io/badge/status-partially_available-green)</sub>
4848

49-
Chat Completions API. You provide a list of messages and receive a streamed completion. This API will also load the model if it is not already loaded.
49+
Chat Completions API. You provide a list of messages and receive a completion. This API will also load the model if it is not already loaded.
5050

51-
### Parameters
51+
#### Parameters
5252

5353
| Parameter | Required | Description | Status |
5454
|-----------|----------|-------------|--------|
5555
| `messages` | Yes | Array of messages in the conversation. Each message should have a `role` ("user" or "assistant") and `content` (the message text). | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
5656
| `model` | Yes | The model to use for the completion. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
5757
| `stream` | No | If true, tokens will be sent as they are generated. If false, the response will be sent as a single message once complete. Defaults to false. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
58+
| `stop` | No | Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a string or an array of strings. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
5859
| `logprobs` | No | Include log probabilities of the output tokens. If true, returns the log probability of each output token. Defaults to false. | <sub>![Status](https://img.shields.io/badge/WIP-yellow)</sub> |
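
The `stop` behavior described in the table (up to 4 sequences, a string or an array of strings, with the stop sequence excluded from the returned text) can be illustrated with a small client-side sketch. This is not the server's implementation; `truncate_at_stop` is a hypothetical helper that only mirrors the documented behavior.

```python
def truncate_at_stop(text, stop=None):
    """Cut `text` at the earliest occurrence of any stop sequence.

    `stop` may be None, a single string, or a list of up to 4 strings.
    The stop sequence itself is not included in the result.
    """
    if stop is None:
        return text
    sequences = [stop] if isinstance(stop, str) else list(stop)
    if len(sequences) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    cut = len(text)
    for seq in sequences:
        if not seq:
            continue  # an empty string would match at position 0; ignore it
        pos = text.find(seq)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]
```

For example, `truncate_at_stop("one two END three", ["END", "\n"])` keeps everything before `END`.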
5960

6061

61-
### Example request
62+
#### Example request
6263

6364
```bash
6465
curl -X POST http://localhost:8000/api/v0/chat/completions ^
@@ -68,13 +69,12 @@ curl -X POST http://localhost:8000/api/v0/chat/completions ^
6869
\"messages\": [
6970
{\"role\": \"user\", \"content\": \"What is the population of Paris?\"}
7071
],
71-
\"stream\": true
72+
\"stream\": false
7273
}"
73-
7474
```
7575
*Hint: To try, "Paste as One Line" in Windows `cmd`.*
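
For readers not on Windows `cmd`, the same request can be sketched with the Python standard library. This is a minimal sketch, not an official client: `build_chat_payload` and `post_json` are hypothetical helper names, and the base URL is taken from the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v0"  # address used in the curl example


def build_chat_payload(model, user_text, stream=False, stop=None):
    """Build the JSON body for POST /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }
    if stop is not None:
        payload["stop"] = stop  # a string or a list of strings
    return payload


def post_json(path, payload, base_url=BASE_URL):
    """Send `payload` as JSON and return the decoded JSON response.

    Requires a running server (started with `lemonade serve`).
    """
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `post_json("/chat/completions", build_chat_payload("Llama-3.2-1B-Instruct-Hybrid", "What is the population of Paris?"))` would send the non-streaming request; the model name here is the one used in the text-completions example.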
7676

77-
### Response format
77+
#### Response format
7878

7979
For non-streaming responses:
8080
```json
@@ -89,10 +89,6 @@ For non-streaming responses:
8989
"role": "assistant",
9090
"content": "Paris has a population of approximately 2.2 million people in the city proper."
9191
},
92-
"logprobs": {
93-
"tokens": ["Paris", " has", " a", " population", ...],
94-
"token_logprobs": [-0.12, -0.05, -0.02, -0.15, ...]
95-
},
9692
"finish_reason": "stop"
9793
}]
9894
}
@@ -115,21 +111,66 @@ For streaming responses, the API returns a stream of server-sent events:
115111
}
116112
```
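
A client consuming the streamed response has to decode the server-sent events itself. The sketch below shows one way to do that. Two details are assumptions borrowed from the OpenAI streaming format, because this diff does not show them: that each chunk carries its text under `choices[].delta.content`, and that the stream ends with a `data: [DONE]` line. The function names are hypothetical.

```python
import json


def iter_sse_payloads(lines):
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        yield json.loads(data)


def collect_stream_text(lines):
    """Concatenate the text fragments carried by the streamed chunks."""
    parts = []
    for event in iter_sse_payloads(lines):
        for choice in event.get("choices", []):
            delta = choice.get("delta", {})  # assumed chunk shape
            parts.append(delta.get("content") or "")
    return "".join(parts)
```

`lines` can be any iterable of `str` or `bytes` lines, for example the response object returned by `urllib.request.urlopen`.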
117113

114+
115+
### `POST /api/v0/completions` <sub>![Status](https://img.shields.io/badge/status-partially_available-green)</sub>
116+
117+
Text Completions API. You provide a prompt and receive a completion. This API will also load the model if it is not already loaded.
118+
119+
#### Parameters
120+
121+
| Parameter | Required | Description | Status |
122+
|-----------|----------|-------------|--------|
123+
| `prompt` | Yes | The prompt to use for the completion. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
124+
| `model` | Yes | The model to use for the completion. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
125+
| `stream` | No | If true, tokens will be sent as they are generated. If false, the response will be sent as a single message once complete. Defaults to false. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
126+
| `stop` | No | Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a string or an array of strings. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
127+
| `logprobs` | No | Include log probabilities of the output tokens. If true, returns the log probability of each output token. Defaults to false. | <sub>![Status](https://img.shields.io/badge/WIP-yellow)</sub> |
128+
129+
130+
#### Example request
131+
132+
```bash
133+
curl -X POST http://localhost:8000/api/v0/completions ^
134+
-H "Content-Type: application/json" ^
135+
-d "{
136+
\"model\": \"Llama-3.2-1B-Instruct-Hybrid\",
137+
\"prompt\": \"What is the population of Paris?\",
138+
\"stream\": false
139+
}"
140+
```
141+
142+
#### Response format
143+
144+
The following format is used for both streaming and non-streaming responses:
145+
```json
146+
{
147+
"id": "0",
148+
"object": "text_completion",
149+
"created": <UNIX_TIMESTAMP>,
150+
"model": "Llama-3.2-1B-Instruct-Hybrid",
151+
"choices": [{
152+
"index": 0,
153+
"text": "Paris has a population of approximately 2.2 million people in the city proper.",
154+
"finish_reason": "stop"
155+
}]
156+
}
157+
```
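
Reading the completion out of this response shape could look like the sketch below. It relies only on the fields shown in the sample (`object`, `choices`, `index`, `text`, `finish_reason`); `completion_choices` is a hypothetical helper, not part of lemonade.

```python
def completion_choices(response):
    """Return [(text, finish_reason), ...] ordered by choice index."""
    if response.get("object") != "text_completion":
        raise ValueError(f"unexpected object type: {response.get('object')!r}")
    choices = sorted(response.get("choices", []), key=lambda c: c["index"])
    return [(c["text"], c["finish_reason"]) for c in choices]
```

For the sample response above, this yields a single pair holding the generated sentence and the finish reason `"stop"`.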
158+
118159
### `GET /api/v0/models` <sub>![Status](https://img.shields.io/badge/status-fully_available-green)</sub>
119160

120161
Returns a list of key models available on the server in an OpenAI-compatible format. This list is curated based on what works best for Ryzen AI Hybrid. Additional models can be loaded via the `/api/v0/load` endpoint by specifying the Hugging Face checkpoint.
121162

122-
### Parameters
163+
#### Parameters
123164

124165
This endpoint does not take any parameters.
125166

126-
### Example request
167+
#### Example request
127168

128169
```bash
129170
curl http://localhost:8000/api/v0/models
130171
```
131172

132-
### Response format
173+
#### Response format
133174

134175
```json
135176
{
@@ -153,51 +194,19 @@ curl http://localhost:8000/api/v0/models
153194

154195
## Additional Endpoints
155196

156-
### `POST /api/v0/completions` <sub>![Status](https://img.shields.io/badge/status-partially_available-green)</sub>
157-
158-
Text Completions API. You provide a prompt and receive a streamed completion. This API will also load the model if it is not already loaded.
159-
160-
### Parameters
161-
162-
| Parameter | Required | Description | Status |
163-
|-----------|----------|-------------|--------|
164-
| `prompt` | Yes | The prompt to use for the completion. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
165-
| `model` | Yes | The model to use for the completion. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
166-
| All other params of `/api/v0/load` | No | Detailed loading options as defined in the `/api/v0/load` endpoint. | <sub>![Status](https://img.shields.io/badge/WIP-yellow)</sub> |
167-
| All other params of `/api/v0/params` | No | Detailed generation options as defined in the `/api/v0/params` endpoint. | <sub>![Status](https://img.shields.io/badge/WIP-yellow)</sub> |
168-
169-
### Example request
170-
171-
```bash
172-
curl -X POST http://localhost:8000/api/v0/completions ^
173-
-H "Content-Type: application/json" ^
174-
-d "{
175-
\"model\": \"<CHECKPOINT>\",
176-
\"prompt\": \"the meaning of life is\"
177-
}"
178-
```
179-
180-
### Response format
181-
182-
```json
183-
{
184-
"text": " to find your purpose, and once you have",
185-
}
186-
```
187-
188197
### `GET /api/v0/load` <sub>![Status](https://img.shields.io/badge/status-fully_available-green)</sub>
189198

190199
Explicitly load a model. This is useful to ensure that the model is loaded before you make a request.
191200

192-
### Parameters
201+
#### Parameters
193202

194203
| Parameter | Required | Description |
195204
|-----------|----------|-------------|
196205
| `model` | Yes | HuggingFace checkpoint to load. |
197206
| `device` | No | Device to load the model on. Defaults to `hybrid`. |
198207
| `cache_dir` | No | Parent directory where models are stored. Defaults to `~/.cache/lemonade`. |
199208

200-
### Example request
209+
#### Example request
201210

202211
```bash
203212
curl http://localhost:8000/api/v0/load \
@@ -208,7 +217,7 @@ curl http://localhost:8000/api/v0/load \
208217
}'
209218
```
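
The request body for the load endpoint can be assembled from the parameter table above. In this sketch, `build_load_payload` is a hypothetical helper that omits `device` and `cache_dir` unless they are given, on the assumption that the server then applies its documented defaults (`hybrid` and `~/.cache/lemonade`).

```python
import json


def build_load_payload(model, device=None, cache_dir=None):
    """Build the JSON body for the load endpoint.

    Only `model` is required; the optional keys are left out when unset
    so that the server-side defaults apply.
    """
    if not model:
        raise ValueError("`model` is required")
    payload = {"model": model}
    if device is not None:
        payload["device"] = device
    if cache_dir is not None:
        payload["cache_dir"] = cache_dir
    return payload


# "<CHECKPOINT>" is a placeholder for a Hugging Face checkpoint name.
body = json.dumps(build_load_payload("<CHECKPOINT>", device="hybrid"))
```

The resulting `body` string is what would be passed to curl's `-d` option.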
210219

211-
### Response format
220+
#### Response format
212221

213222
```json
214223
{
@@ -223,17 +232,17 @@ In case of an error, the status will be `error` and the message will contain the
223232

224233
Explicitly unload a model. This is useful to free up memory and disk space while still leaving the server running (which takes minimal resources but a few seconds to start).
225234

226-
### Parameters
235+
#### Parameters
227236

228237
This endpoint does not take any parameters.
229238

230-
### Example request
239+
#### Example request
231240

232241
```bash
233242
curl http://localhost:8000/api/v0/unload
234243
```
235244

236-
### Response format
245+
#### Response format
237246

238247
```json
239248
{
@@ -246,7 +255,7 @@ In case of an error, the status will be `error` and the message will contain the
246255
### `POST /api/v0/params` <sub>![Status](https://img.shields.io/badge/status-in_development-yellow)</sub>
247256
Set the generation parameters for text completion. These parameters will persist across requests until changed.
248257

249-
### Parameters
258+
#### Parameters
250259

251260
| Parameter | Required | Description |
252261
|-----------|----------|-------------|
@@ -257,7 +266,7 @@ Set the generation parameters for text completion. These parameters will persist
257266
| `max_length` | No | The maximum length of the generated text in tokens. Defaults to 2048. |
258267
| `do_sample` | No | Whether to use sampling (true) or greedy decoding (false). Defaults to true. |
259268

260-
### Example request
269+
#### Example request
261270

262271
```bash
263272
curl http://localhost:8000/api/v0/params \
@@ -269,7 +278,7 @@ curl http://localhost:8000/api/v0/params \
269278
}'
270279
```
271280

272-
### Response format
281+
#### Response format
273282

274283
```json
275284
{
@@ -291,17 +300,17 @@ In case of an error, the status will be `error` and the message will contain the
291300

292301
Check the health of the server. This endpoint will also return the currently loaded model.
293302

294-
### Parameters
303+
#### Parameters
295304

296305
This endpoint does not take any parameters.
297306

298-
### Example request
307+
#### Example request
299308

300309
```bash
301310
curl http://localhost:8000/api/v0/health
302311
```
303312

304-
### Response format
313+
#### Response format
305314

306315
```json
307316
{
@@ -313,17 +322,17 @@ curl http://localhost:8000/api/v0/health
313322

314323
Performance statistics from the last request.
315324

316-
### Parameters
325+
#### Parameters
317326

318327
This endpoint does not take any parameters.
319328

320-
### Example request
329+
#### Example request
321330

322331
```bash
323332
curl http://localhost:8000/api/v0/stats
324333
```
325334

326-
### Response format
335+
#### Response format
327336

328337
```json
329338
{
