Commit 71dcb8f

author
sd109
committed
First draft of README
1 parent 4f69c25 commit 71dcb8f

1 file changed

+48
-1
lines changed

README.md

Lines changed: 48 additions & 1 deletion
@@ -1,3 +1,50 @@
11
# Azimuth LLM
22

3-
A Helm chart for deploying LLMs on Azimuth.
3+
This repository contains a Helm chart for deploying Large Language Models (LLMs) on Kubernetes. It is developed primarily for use as a pre-packaged application within [Azimuth](https://www.stackhpc.com/azimuth-introduction.html) but is structured such that it can, in principle, be deployed on any Kubernetes cluster with at least one GPU node.
4+
5+
# Deployment
6+
7+
## Azimuth
8+
9+
This app ~~is~~ will soon be provided as part of a standard Azimuth deployment, so no specific steps are required to use it other than access to an up-to-date Azimuth deployment.
10+
11+
## Manual Installation
12+
13+
To set up the Helm repository and manually install this chart on an arbitrary Kubernetes cluster, run
14+
15+
```
helm repo add <chosen-repo-name> https://stackhpc.github.io/azimuth-llm/
helm repo update
helm install <installation-name> <chosen-repo-name>/azimuth-llm --version <version>
```
20+
21+
where `<version>` is the full published chart version for the specified commit (e.g. `0.1.0-dev.0.main.125`). The latest published version can be found on the [gh-pages branch](https://github.com/stackhpc/azimuth-llm/tree/gh-pages).
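As an alternative to browsing the branch, the published versions can usually be listed from the command line once the repository has been added. The following is a minimal sketch, assuming the repository was added under the hypothetical local name `azimuth-llm`; substitute whatever was used for `<chosen-repo-name>`. The `--devel` flag is included because version strings such as `0.1.0-dev.0.main.125` are pre-releases, which `helm search repo` hides by default.

```shell
# Hypothetical local repo name; replace with the name given to `helm repo add`.
REPO_NAME="azimuth-llm"
CHART="${REPO_NAME}/azimuth-llm"

# Only call helm if it is installed, so the snippet is safe to paste anywhere.
if command -v helm >/dev/null 2>&1; then
  # --devel    include pre-release versions (e.g. 0.1.0-dev.0.main.125)
  # --versions list every published version, not only the newest
  helm search repo "${CHART}" --devel --versions
fi
```

The value in the `CHART VERSION` column of the output is what should be passed to `--version`.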
22+
23+
### Customisation
24+
25+
The `chart/values.yaml` file documents the available customisation options. To access the LLM from outside the Kubernetes cluster, the API and/or UI service types may be changed to `LoadBalancer` as follows:
26+
```
api:
  service:
    type: LoadBalancer
ui:
  service:
    type: LoadBalancer
```
34+
35+
***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. In the Azimuth deployment case, authentication is provided via the standard Azimuth identity provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
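With the warning above in mind, one way to apply these overrides is to save them to a values file and pass it to Helm. This is a sketch only: the file name `llm-values.yaml`, the release name, the repo name and the version shown are all assumptions chosen for illustration, and should be replaced with the values used during installation.

```shell
# All four values below are illustrative placeholders, not chart defaults.
VALUES_FILE="llm-values.yaml"
RELEASE_NAME="my-llm"
REPO_NAME="azimuth-llm"
CHART_VERSION="0.1.0-dev.0.main.125"

# Write the service type overrides from the Customisation section to a file.
cat > "${VALUES_FILE}" <<'EOF'
api:
  service:
    type: LoadBalancer
ui:
  service:
    type: LoadBalancer
EOF

# Install the chart, or upgrade an existing release, with the overrides applied.
if command -v helm >/dev/null 2>&1; then
  helm upgrade --install "${RELEASE_NAME}" "${REPO_NAME}/azimuth-llm" \
    --version "${CHART_VERSION}" \
    --values "${VALUES_FILE}"
fi
```

Once the release is up, `kubectl get services` should show an external IP for each `LoadBalancer` service, provided the cluster has a load balancer implementation available.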
36+
37+
38+
## Tested Models
39+
40+
The following is a non-exhaustive list of models which have been tested with this app:
41+
- [Llama 2 7B chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
42+
- [AWQ Quantized Llama 2 70B](https://huggingface.co/TheBloke/Llama-2-70B-Chat-AWQ)
43+
- [Magicoder 6.7B](https://huggingface.co/ise-uiuc/Magicoder-S-DS-6.7B)
44+
45+
Due to the combination of [components](#components) used in this app, some Hugging Face models may not work as expected (usually due to the way in which LangChain formats the prompt messages). If errors occur when using a new model, the logs for the API pod will usually provide more information on the failed requests.
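The API pod logs can be inspected with `kubectl`. The sketch below assumes the chart was installed into the `default` namespace, and leaves the pod name to be filled in from the pod listing, since the exact names the chart generates are not documented here.

```shell
# Assumed namespace; change it if the chart was installed elsewhere.
NAMESPACE="default"
# Set this to the API pod name shown by the listing below.
API_POD="${API_POD:-}"

if command -v kubectl >/dev/null 2>&1; then
  # List the pods to identify the one serving the API.
  kubectl get pods --namespace "${NAMESPACE}"

  # Show the most recent log lines for the API pod once its name is known.
  if [ -n "${API_POD}" ]; then
    kubectl logs "${API_POD}" --namespace "${NAMESPACE}" --tail=100
  fi
fi
```

For example, running the snippet once prints the pod list; exporting `API_POD` with the relevant name and running it again prints that pod's last 100 log lines.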
46+
47+
48+
## Components
49+
50+
*TO-DO*
