# Azimuth LLM
This repository contains a set of Helm charts for deploying Large Language Models (LLMs) on Kubernetes. It is developed primarily for use as a set of pre-packaged applications within [Azimuth](https://www.stackhpc.com/azimuth-introduction.html) but is structured such that the charts can, in principle, be deployed on any Kubernetes cluster with at least 1 GPU node.
## Azimuth App
This primary LLM chat app is provided as part of a standard Azimuth deployment, so no specific steps are required to use it other than access to an up-to-date Azimuth deployment.
## Manual Deployment
Alternatively, to set up the Helm repository and manually install the LLM chat interface chart on an existing Kubernetes cluster, run
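The install commands themselves are not reproduced in this excerpt. A minimal sketch of the usual Helm workflow might look like the following, where the repository URL, repository alias, and release name are assumptions for illustration:

```
# Add the project's Helm repository (URL assumed to be the gh-pages site)
helm repo add azimuth-llm https://stackhpc.github.io/azimuth-llm/
helm repo update

# Install the chat interface chart under an illustrative release name
helm install my-chat azimuth-llm/azimuth-chat

# Or pin a specific published build:
# helm install my-chat azimuth-llm/azimuth-chat --version <version>
```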
This will install the latest stable [release](https://github.com/stackhpc/azimuth-llm/releases) of the application. To install a build from a specific commit instead, pass a `--version` flag, where `version` is the full name of the published version for the specified commit (e.g. `0.1.0-dev.0.main.125`). To see the latest published version, see [this page](https://github.com/stackhpc/azimuth-llm/tree/gh-pages).
## Chart Structure
Under the charts directory, there is a base [azimuth-llm](./charts/azimuth-llm) Helm chart which uses vLLM to deploy models from Hugging Face. The [azimuth-chat](charts/azimuth-chat) and [azimuth-image-analysis](charts/azimuth-image-analysis) charts are wrappers which add different Gradio web interfaces for interacting with the deployed LLM.
### Customisation
The `charts/azimuth-llm/values.yaml` file documents the various customisation options which are available. In order to access the LLM from outside the Kubernetes cluster, the API and/or UI service types may be changed to
```
api:
  service:
    type: LoadBalancer # e.g. LoadBalancer or NodePort
ui:
  service:
    type: LoadBalancer
```
Both the web-based interface and the backend OpenAI-compatible vLLM API server can also optionally be exposed using [Kubernetes Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). See the `ingress` section in `values.yaml` for the available config options.
When deploying the chat or image-analysis wrapper charts, all configuration options must be nested under the `azimuth-llm` key ([example](https://github.com/stackhpc/azimuth-llm/blob/main/charts/azimuth-chat/values.yaml#L1)) due to the way that Helm passes values between [parent charts and sub-charts](https://helm.sh/docs/chart_template_guide/subcharts_and_globals/#overriding-values-from-a-parent-chart).
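For instance, a values file overriding the API service type for the chat wrapper chart might look like the following sketch, where the nested option names are illustrative and should be checked against the base chart's `values.yaml`:

```
# values.yaml for an azimuth-chat deployment:
# all base-chart options sit under the sub-chart's key
azimuth-llm:
  api:
    service:
      type: LoadBalancer
```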
## Tested Models
The application uses [vLLM](https://docs.vllm.ai/en/latest/index.html) for model serving, so any of the vLLM [supported models](https://docs.vllm.ai/en/latest/models/supported_models.html) should work. Since vLLM pulls the model files directly from [Hugging Face](https://huggingface.co/models), some other models are also likely to be compatible, but mileage may vary between models and model architectures. If a model is incompatible with vLLM then the API pod will likely enter a `CrashLoopBackOff` state and any relevant error information will be found in the API pod logs. These logs can be viewed with
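a `kubectl logs` command along the following lines, where the deployment name and namespace are placeholders for your installation:

```
# Inspect the API pod logs for vLLM startup errors
kubectl logs deployment/<api-deployment-name> -n <installation-namespace>
```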
If you suspect that a given error is not caused by the upstream vLLM version but is instead a problem with this Helm chart, please [open an issue](https://github.com/stackhpc/azimuth-llm/issues).