Chat Completions API. You provide a list of messages and receive a completion. This API will also load the model if it is not already loaded.

#### Parameters

| Parameter | Required | Description | Status |
|-----------|----------|-------------|--------|
|`messages`| Yes | Array of messages in the conversation. Each message should have a `role` ("user" or "assistant") and `content` (the message text). | <sub></sub> |
|`model`| Yes | The model to use for the completion. | <sub></sub> |
|`stream`| No | If true, tokens will be sent as they are generated. If false, the response will be sent as a single message once complete. Defaults to false. | <sub></sub> |
|`stop`| No | Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a string or an array of strings. | <sub></sub> |
|`logprobs`| No | Include log probabilities of the output tokens. If true, returns the log probability of each output token. Defaults to false. | <sub></sub> |
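As a sketch of how these parameters combine into a request body (the model name is taken from this document's examples; the stop sequences here are purely hypothetical), the JSON payload can be built programmatically:

```python
import json

# Request body for POST /api/v0/chat/completions, using the
# parameters documented in the table above. `stop` may be a single
# string or an array of up to 4 strings.
payload = {
    "model": "Llama-3.2-1B-Instruct-Hybrid",  # model name from this doc's examples
    "messages": [
        {"role": "user", "content": "What is the population of Paris?"}
    ],
    "stream": False,          # return one complete message instead of a token stream
    "stop": ["\n\n", "###"],  # hypothetical stop sequences, for illustration only
}

body = json.dumps(payload)
```

Any HTTP client can then POST `body` to the endpoint with a `Content-Type: application/json` header.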

#### Example request

```bash
curl -X POST http://localhost:8000/api/v0/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{
    \"model\": \"Llama-3.2-1B-Instruct-Hybrid\",
    \"messages\": [
      {\"role\": \"user\", \"content\": \"What is the population of Paris?\"}
    ],
    \"stream\": false
  }"
```

*Hint: To try, "Paste as One Line" in Windows `cmd`.*
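The caret (`^`) line continuations above are specific to Windows `cmd`. As a shell-agnostic alternative, the same request can be sketched with only the Python standard library (assuming, as in the curl example, that the server is listening on `localhost:8000`):

```python
import json
import urllib.request

# Same request as the curl example: POST a JSON body to the
# chat/completions endpoint of a locally running server.
payload = {
    "model": "Llama-3.2-1B-Instruct-Hybrid",
    "messages": [
        {"role": "user", "content": "What is the population of Paris?"}
    ],
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:8000/api/v0/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Sending it with `urllib.request.urlopen(req)` requires the server to actually be running; the response body is the JSON document described below.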

#### Response format

For non-streaming responses:

```json
{
  "id": "0",
  "object": "chat.completion",
  "created": <UNIX_TIMESTAMP>,
  "model": "Llama-3.2-1B-Instruct-Hybrid",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Paris has a population of approximately 2.2 million people in the city proper."
    }
  }]
}
```
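A client typically only needs the assistant's reply out of this document. A minimal extraction sketch, assuming the standard OpenAI-compatible envelope where each choice carries a `message` with `role` and `content` (the envelope shape is an assumption here; the content string comes from the example above):

```python
import json

# Illustrative non-streaming response body; only the fields needed
# for extraction are shown.
raw = """
{
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Paris has a population of approximately 2.2 million people in the city proper."
    }
  }]
}
"""

# The assistant's reply lives at choices[0].message.content.
reply = json.loads(raw)["choices"][0]["message"]["content"]
```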
Text Completions API. You provide a prompt and receive a completion. This API will also load the model if it is not already loaded.

#### Parameters

| Parameter | Required | Description | Status |
|-----------|----------|-------------|--------|
|`prompt`| Yes | The prompt to use for the completion. | <sub></sub> |
|`model`| Yes | The model to use for the completion. | <sub></sub> |
|`stream`| No | If true, tokens will be sent as they are generated. If false, the response will be sent as a single message once complete. Defaults to false. | <sub></sub> |
|`stop`| No | Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a string or an array of strings. | <sub></sub> |
|`logprobs`| No | Include log probabilities of the output tokens. If true, returns the log probability of each output token. Defaults to false. | <sub></sub> |

#### Example request

```bash
curl -X POST http://localhost:8000/api/v0/completions ^
  -H "Content-Type: application/json" ^
  -d "{
    \"model\": \"Llama-3.2-1B-Instruct-Hybrid\",
    \"prompt\": \"What is the population of Paris?\",
    \"stream\": false
  }"
```

#### Response format

The following format is used for both streaming and non-streaming responses:

```json
{
  "id": "0",
  "object": "text_completion",
  "created": <UNIX_TIMESTAMP>,
  "model": "Llama-3.2-1B-Instruct-Hybrid",
  "choices": [{
    "index": 0,
    "text": "Paris has a population of approximately 2.2 million people in the city proper."
  }]
}
```
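Because streaming responses use this same shape, a streaming client can reconstruct the full completion by concatenating `choices[0].text` across the received chunks. A sketch of that accumulation step (the chunk payloads below are illustrative, and how the server frames events on the wire is not covered here):

```python
import json

# Two illustrative streamed chunks, each a text_completion document
# whose choices[0].text holds one piece of the output.
chunks = [
    '{"object": "text_completion", "choices": [{"index": 0, "text": "Paris has a population "}]}',
    '{"object": "text_completion", "choices": [{"index": 0, "text": "of approximately 2.2 million people."}]}',
]

# Concatenating the text fields in arrival order rebuilds the completion.
completion = "".join(json.loads(c)["choices"][0]["text"] for c in chunks)
```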
Returns a list of key models available on the server in an OpenAI-compatible format. This list is curated based on what works best for Ryzen AI Hybrid. Additional models can be loaded via the `/api/v0/load` endpoint by specifying the Hugging Face checkpoint.
Explicitly unload a model. This is useful to free up memory and disk space while still leaving the server running (which takes minimal resources but a few seconds to start).

#### Parameters

This endpoint does not take any parameters.

#### Example request

```bash
curl http://localhost:8000/api/v0/unload
```

#### Response format

```json
{
  "status": "success"
}
```