
Commit d8dd1f6

Merge pull request #3 from stackhpc/feat/langchain

Re-write frontend using LangChain + other QoL improvements

2 parents 1569398 + 71a0666

File tree: 14 files changed (+291 -112 lines changed)


.gitignore

Lines changed: 6 additions & 2 deletions
@@ -1,11 +1,15 @@
 **/kubeconfig.y[a]ml
 *kubeconfig*.y[a]ml
-venv/
 .vscode/
 __pycache__/
 **/*.secret
 
 # Ignore local dev helpers
 test-values.y[a]ml
 chart/web-app/settings.yml
-gradio-client-test.py
+gradio-client-test.py
+venv*/
+
+# Helm chart stuff
+chart/Chart.lock
+chart/charts

chart/Chart.yaml

Lines changed: 6 additions & 1 deletion
@@ -26,4 +26,9 @@ appVersion: "1.16.0"
 icon: https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.svg
 
 annotations:
-  azimuth.stackhpc.com/label: HuggingFace LLM
+  azimuth.stackhpc.com/label: HuggingFace LLM
+
+dependencies:
+  - name: reloader
+    version: 1.0.63
+    repository: https://stakater.github.io/stakater-charts

chart/azimuth-ui.schema.yaml

Lines changed: 20 additions & 1 deletion
@@ -1,4 +1,23 @@
 controls:
   /huggingface/model:
     type: TextControl
-    placeholder: tiiuae/falcon-7b-instruct
+  /huggingface/token:
+    type: TextControl
+    secret: true
+  /ui/appSettings/model_instruction:
+    type: TextControl
+  /ui/appSettings/llm_max_tokens:
+    type: NumberControl
+  /ui/appSettings/llm_temperature:
+    type: NumberControl
+  /ui/appSettings/llm_top_p:
+    type: NumberControl
+  /ui/appSettings/llm_frequency_penalty:
+    type: NumberControl
+  /ui/appSettings/llm_presence_penalty:
+    type: NumberControl
+  # Use mirror to mimic yaml anchor in base Helm chart
+  /ui/appSettings/model_name:
+    type: MirrorControl
+    path: /huggingface/model
+    visuallyHidden: true

chart/templates/NOTES.txt

Lines changed: 12 additions & 1 deletion
@@ -1 +1,12 @@
-The LLM app allows users to deploy machine learning models using [vLLM](https://docs.vllm.ai/en/latest/) as a model serving backend and [gradio](https://github.com/gradio-app/gradio) as a web interface.
+The LLM chatbot app allows users to deploy machine learning models from [Huggingface](https://huggingface.co/models) and interact with them through a simple web interface.
+
+Note: The target Kubernetes cluster must have a GPU worker node group configured, otherwise the app will remain in an 'Installing' state until a GPU node becomes available for scheduling.
+
+On deployment of a new model, the app must first download the model's weights from Huggingface.
+This can take a significant amount of time depending on model choice and network speeds.
+Download progress can be monitored by inspecting the logs for the LLM API pod(s) via the Kubernetes Dashboard for the target cluster.
+
+The app uses [vLLM](https://docs.vllm.ai/en/latest/) as a model serving backend and [gradio](https://github.com/gradio-app/gradio) + [LangChain](https://python.langchain.com/docs/get_started/introduction) to provide the web interface.
+The official list of Huggingface models supported by vLLM can be found [here](https://docs.vllm.ai/en/latest/models/supported_models.html), though some of these may not be compatible with the LangChain prompt format.
+See [this documentation](https://github.com/stackhpc/azimuth-llm/) for a non-exhaustive list of language models against which the app has been tested.
+

chart/templates/api/deployment.yml

Lines changed: 3 additions & 8 deletions
@@ -25,11 +25,7 @@ spec:
       volumeMounts:
         - name: data
           mountPath: /root/.cache/huggingface
-      command:
-        - python3
       args:
-        - -m
-        - vllm.entrypoints.api_server
         - --model
         - {{ .Values.huggingface.model }}
       {{- if .Values.api.extraArgs -}}
@@ -47,15 +43,14 @@ spec:
       {{- fail "Either secretName or token value must be set for Llama and other gated models" }}
       {{- end }}
       readinessProbe:
-        tcpSocket:
+        httpGet:
           port: 8000
-        initialDelaySeconds: 15
-        periodSeconds: 10
+          path: /health
+        periodSeconds: 60
       resources:
         limits:
           nvidia.com/gpu: {{ .Values.api.gpus | int }}
     volumes:
-      # TODO: Make this configurable (e.g. hostPath or PV)
       - name: data
         {{- .Values.api.cacheVolume | toYaml | nindent 10 }}
   # Suggested in vLLM docs
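
Two things to note in this diff: dropping the explicit command means the container presumably falls back to the vllm/vllm-openai image's default entrypoint (the OpenAI-compatible server rather than the plain vllm.entrypoints.api_server), and the readiness probe now checks vLLM's /health endpoint instead of merely testing that the TCP port accepts connections. A minimal sketch of the equivalent check, assuming the API pod has been forwarded to localhost:8000 (e.g. via kubectl port-forward):

    # Hedged sketch of what the httpGet readinessProbe now does.
    import requests

    resp = requests.get("http://localhost:8000/health", timeout=5)
    # 200 means the server is actually serving; while model weights are still
    # downloading or loading, this fails and the pod is kept out of rotation.
    print(resp.status_code)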

chart/templates/api/zenith-reservation.yml

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ apiVersion: zenith.stackhpc.com/v1alpha1
 kind: Reservation
 metadata:
   name: {{ .Release.Name }}-api
+  labels:
+    {{- include "azimuth-llm.labels" . | nindent 4 }}
   annotations:
     azimuth.stackhpc.com/service-label: {{ quote .Values.api.service.zenith.label }}
     azimuth.stackhpc.com/service-icon-url: {{ .Values.api.service.zenith.iconUrl }}

chart/templates/ui/deployment.yml

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ metadata:
   name: {{ .Release.Name }}-ui
   labels:
     {{- include "azimuth-llm.labels" . | nindent 4 }}
+  annotations:
+    # Make sure UI is reloaded when app settings are updated
+    reloader.stakater.com/auto: "true"
 spec:
   replicas: 1
   selector:

chart/values.schema.json

Lines changed: 48 additions & 2 deletions
@@ -8,8 +8,7 @@
         "model": {
           "type": "string",
           "title": "Model",
-          "description": "The HuggingFace model to deploy.",
-          "default": "tiiuae/falcon-7b-instruct"
+          "description": "The HuggingFace model to deploy (Hint: For a simple, lightweight demo try ise-uiuc/Magicoder-S-DS-6.7B)"
         },
         "token": {
           "type": "string",
@@ -19,6 +18,53 @@
         }
       },
       "required": ["model"]
+    },
+    "ui": {
+      "type": "object",
+      "properties": {
+        "appSettings": {
+          "type": "object",
+          "properties": {
+            "model_name": {
+              "type": "string",
+              "title": "Model Name",
+              "description": "Model name supplied to OpenAI client in frontend web app. Should match huggingface.model above."
+            },
+            "model_instruction": {
+              "type": "string",
+              "title": "Model Instruction",
+              "description": "The initial model prompt (i.e. the hidden instructions) to use when generating responses."
+            },
+            "llm_max_tokens": {
+              "type": "number",
+              "title": "LLM Max Tokens",
+              "description": "The maximum number of new [tokens](https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens) to generate for each LLM response."
+            },
+            "llm_temperature": {
+              "type": "number",
+              "title": "LLM Temperature",
+              "description": "The '[temperature](https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature)' value to use when generating LLM responses."
+            },
+            "llm_top_p": {
+              "type": "number",
+              "title": "LLM Top P",
+              "description": "The [top p](https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_p) value to use when generating LLM responses."
+            },
+            "llm_presence_penalty": {
+              "type": "number",
+              "title": "LLM Presence Penalty",
+              "description": "The [presence penalty](https://platform.openai.com/docs/api-reference/chat/create#chat-create-presence_penalty) to use when generating LLM responses."
+            },
+            "llm_frequency_penalty": {
+              "type": "number",
+              "title": "LLM Frequency Penalty",
+              "description": "The [frequency penalty](https://platform.openai.com/docs/api-reference/chat/create#chat-create-frequency_penalty) to use when generating LLM responses."
+            }
+
+          },
+          "required": ["model_name"]
+        }
+      }
     }
   },
   "required": ["huggingface"]

chart/values.yaml

Lines changed: 12 additions & 12 deletions
@@ -4,12 +4,8 @@
 
 huggingface:
   # The name of the HuggingFace model to use
-  model: tiiuae/falcon-7b-instruct
-  # Other (partially tested) options:
-  # (some of which may not fit on a single GPU and will take a long time to download)
-  # - meta-llama/Llama-2-7b-chat-hf # Requires licence token
-  # - tiiuae/falcon-40b # Weights ~160GB disk size
-  # - bigscience/bloom # Weights were trending towards ~360GB disk size
+  # Use a yaml anchor to avoid duplication elsewhere
+  model: &model-name ise-uiuc/Magicoder-S-DS-6.7B
 
   # For private/gated huggingface models (e.g. Meta's Llama models)
   # you must provide your own huggingface token, for details see:
@@ -30,7 +26,7 @@ api:
   # Container image config
   image:
     repository: vllm/vllm-openai
-    version: v0.2.4
+    version: v0.2.7
   # Service config
   service:
     name: llm-backend
@@ -39,7 +35,7 @@ api:
       enabled: false
       skipAuth: false
       label: Inference API
-      iconUrl:
+      iconUrl: https://raw.githubusercontent.com/vllm-project/vllm/v0.2.7/docs/source/assets/logos/vllm-logo-only-light.png
       description: |
         The raw inference API endpoints for the deployed LLM.
   # Config for huggingface model cache volume
@@ -70,24 +66,28 @@ ui:
   # The values to be written to settings.yml for parsing as frontend app setting
   # (see example_app.py and config.py for example using pydantic-settings to configure app)
   appSettings:
-    prompt_template: ""
+    model_name: *model-name
+    model_instruction: "You are a helpful AI assistant. Please respond appropriately."
   # Container image config
   image:
     repository: ghcr.io/stackhpc/azimuth-llm-ui-base
-    version: de4324c
+    version: "984c499"
   # Service config
   service:
     name: web-app
     type: ClusterIP
     zenith:
       enabled: true
       skipAuth: false
-      label: Web Interface
+      label: Chat Interface
       iconUrl: https://raw.githubusercontent.com/gradio-app/gradio/5524e590577769b0444a5332b8d444aafb0c5c12/js/app/public/static/img/logo.svg
       description: |
         A web-based user interface for interacting with the deployed LLM.
   # The update strategy to use for the deployment
   updateStrategy:
     rollingUpdate:
       maxSurge: 25%
-      maxUnavailable: 25%
+      maxUnavailable: 25%
+
+reloader:
+  watchGlobally: false
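
The &model-name anchor means the model only needs to be set in one place: when Helm parses values.yaml, the *model-name alias under ui.appSettings resolves to the same string (the MirrorControl in azimuth-ui.schema.yaml exists to mimic this for UI-driven overrides, which do not pass through the YAML parser). A quick illustrative check with PyYAML:

    import yaml

    # The anchor and alias resolve to the same scalar when the values are parsed.
    values = yaml.safe_load("""
    huggingface:
      model: &model-name ise-uiuc/Magicoder-S-DS-6.7B
    ui:
      appSettings:
        model_name: *model-name
    """)
    assert values["ui"]["appSettings"]["model_name"] == values["huggingface"]["model"]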

chart/web-app/api_startup_check.py

Lines changed: 0 additions & 19 deletions
This file was deleted.
