# Azimuth LLM

This repository contains a Helm chart for deploying Large Language Models (LLMs) on Kubernetes. It is developed primarily for use as a pre-packaged application within [Azimuth](https://www.stackhpc.com/azimuth-introduction.html) but is structured such that it can, in principle, be deployed on any Kubernetes cluster with at least 1 GPU node.
# Deployment

## Azimuth

This app will soon be provided as part of a standard Azimuth deployment, so no specific steps are required to use it other than access to an up-to-date Azimuth deployment.

## Manual Installation

To set up the Helm repository and manually install this chart on an arbitrary Kubernetes cluster, run

```
helm repo add <chosen-repo-name> https://stackhpc.github.io/azimuth-llm/
helm repo update
helm install <installation-name> <chosen-repo-name>/azimuth-llm --version <version>
```
where `<version>` is the full published version for the specified commit (e.g. `0.1.0-dev.0.main.125`). To see the latest published version, see [here](https://github.com/stackhpc/azimuth-llm/tree/gh-pages).
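For example, once the repository has been added, the published chart versions can also be listed with Helm (the repo alias `azimuth-llm` below is simply the name chosen in the previous step):

```
# List all published chart versions, including development (pre-release) builds
helm search repo azimuth-llm/azimuth-llm --versions --devel
```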
### Customisation

The `chart/values.yaml` file documents the various customisation options which are available. In order to access the LLM from outside the Kubernetes cluster, the API and/or UI service types may be changed to

```
api:
  service:
    type: LoadBalancer
ui:
  service:
    type: LoadBalancer
```
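As a rough sketch, these overrides can be placed in a local values file and passed to Helm at install time (the file and release names below are illustrative):

```
# overrides.yml contains the api/ui service type overrides shown above
helm upgrade --install my-llm <chosen-repo-name>/azimuth-llm --version <version> -f overrides.yml
```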
***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. In the Azimuth deployment case, authentication is provided via the standard Azimuth identity provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
## Tested Models

The following is a non-exhaustive list of models which have been tested with this app:
- [Llama 2 7B chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- [AWQ Quantized Llama 2 70B](https://huggingface.co/TheBloke/Llama-2-70B-Chat-AWQ)
- [Magicoder 6.7B](https://huggingface.co/ise-uiuc/Magicoder-S-DS-6.7B)
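
For illustration only, deploying one of these models generally amounts to pointing the chart at the corresponding Hugging Face repository via a values override. The exact key is defined in `chart/values.yaml`, so the field name used below is an assumption rather than the chart's confirmed schema:

```
# Hypothetical override - check chart/values.yaml for the actual field name
helm upgrade --install my-llm <chosen-repo-name>/azimuth-llm --version <version> \
  --set huggingface.model=ise-uiuc/Magicoder-S-DS-6.7B
```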
Due to the combination of [components](#components) used in this app, some Hugging Face models may not work as expected (usually due to the way in which LangChain formats the prompt messages). If errors occur when using a new model, the logs for the API pod will usually provide more information on failed requests.
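For example, those logs can be inspected with kubectl (the namespace and pod name below are placeholders; use `kubectl get pods` to find the actual API pod):

```
# Find the API pod, then follow its recent logs
kubectl get pods -n <namespace>
kubectl logs -n <namespace> <api-pod-name> --tail=100 -f
```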
## Components

*TO-DO*