@@ -24,25 +24,24 @@ source .venv/bin/activate

## Install stable shark-ai packages

- <!-- TODO: Add `sharktank` to `shark-ai` meta package -->
+ First install a torch version that fulfills your needs:

```bash
- pip install shark-ai[apps] sharktank
+ # Fast installation of torch with just CPU support.
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```

- ### Nightly packages
+ For other options, see https://pytorch.org/get-started/locally/ .

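+ As a quick, optional sanity check (a minimal sketch; nothing in this step is shark-ai specific), confirm that torch imports inside the virtual environment:
+
+ ```bash
+ python -c "import torch; print(torch.__version__)"
+ ```
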
- To install nightly packages:
-
- <!-- TODO: Add `sharktank` to `shark-ai` meta package -->
+ Next install shark-ai:

```bash
- pip install shark-ai[apps] sharktank \
-   --pre --find-links https://github.com/nod-ai/shark-ai/releases/expanded_assets/dev-wheels
+ pip install shark-ai[apps]
```

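+ To confirm what was installed (the package names below are an assumption based on the packages referenced in this guide), you can list them:
+
+ ```bash
+ pip list | grep -E "shark-ai|shortfin|sharktank"
+ ```
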
- See also the
- [instructions here](https://github.com/nod-ai/shark-ai/blob/main/docs/nightly_releases.md).
+ > [!TIP]
+ > To switch from the stable release channel to the nightly release channel,
+ > see [`nightly_releases.md`](../../../nightly_releases.md).

### Define a directory for export files

@@ -192,25 +191,41 @@ cat shortfin_llm_server.log
[2024-10-24 15:40:27.444] [info] [server.py:214] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

- ## Verify server
+ ## Test the server

- We can now verify our LLM server by sending a simple request:
+ We can now test our LLM server.

- ### Open python shell
+ First, let's confirm that it is running:

```bash
- python
+ curl -i http://localhost:8000/health
+
+ # HTTP/1.1 200 OK
+ # date: Thu, 19 Dec 2024 19:40:43 GMT
+ # server: uvicorn
+ # content-length: 0
```

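+ If the connection is refused, the server may still be starting up; as suggested earlier in this guide, you can re-check its log:
+
+ ```bash
+ cat shortfin_llm_server.log
+ ```
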
- ### Send request
+ Next, let's send a generation request:

- ```python
- import requests
+ ```bash
+ curl http://localhost:8000/generate \
+   -H "Content-Type: application/json" \
+   -d '{
+     "text": "Name the capital of the United States.",
+     "sampling_params": {"max_completion_tokens": 50}
+   }'
+ ```
+
+ ### Send requests from Python

+ You can also send HTTP requests from Python like so:
+
+ ```python
import os
+ import requests

port = 8000  # Change if running on a different port
-
generate_url = f"http://localhost:{port}/generate"

def generation_request():
@@ -225,16 +240,16 @@ def generation_request():
generation_request()
```

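+ For reference, here is a self-contained sketch of such a request, mirroring the payload from the `curl` example above (the exact body of the `generation_request()` helper in this guide may differ):
+
+ ```python
+ import requests
+
+ generate_url = "http://localhost:8000/generate"
+
+ payload = {
+     "text": "Name the capital of the United States.",
+     "sampling_params": {"max_completion_tokens": 50},
+ }
+
+ # Send the request and print the raw response body.
+ response = requests.post(generate_url, json=payload)
+ response.raise_for_status()
+ print(response.text)
+ ```
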
- After you receive the request, you can exit the python shell:
+ ## Cleanup
+
+ When done, you can stop the shortfin_llm_server by killing the process:

```bash
- quit()
+ kill -9 $shortfin_process
```

- ## Cleanup
-
- When done, you can kill the shortfin_llm_server by killing the process:
+ If you want to find the process again:

```bash
- kill -9 $shortfin_process
+ ps -f | grep shortfin
```
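
+ Alternatively (assuming `pgrep` is available on your system), you can print just the matching process IDs:
+
+ ```bash
+ pgrep -f shortfin
+ ```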