
Commit 597b887

Merge pull request #14 from jasonacox/v0.15.15
Docker Compose Quickstart
2 parents 82b040b + 057a52c commit 597b887

8 files changed: +197 additions, -23 deletions

RELEASE.md

Lines changed: 7 additions & 0 deletions
@@ -1,5 +1,12 @@
 # Releases
 
+## 0.15.15 - Docker Compose
+
+* Quick Start using Docker Compose for the Chatbot.
+* Chatbot - Bug Fix: Remove token limit on response. The `MAXTOKENS` setting is used to prune content sent to the LLM. If not set, no pruning will happen.
+* Chatbot - Added additional LiteLLM support with the environment settings `LITELLM_PROXY` and `LITELLM_KEY`. If set, these override the OpenAI API settings to use LiteLLM and remove the `EXTRA_BODY` defaults that conflict with LiteLLM.
+* LiteLLM - Added a Docker Compose setup to start LiteLLM, PostgreSQL, and Chatbot.
+
 ## 0.15.14 - Multi-model Support
 
 * Chatbot - Add `/model` command to list available models and dynamically set models during the session.
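
For context, the two Chatbot changes above can be combined when launching the container; a hedged illustration (not one of this commit's files; the image name and flag values are borrowed from the README and compose.yaml changes below):

```bash
# Point the Chatbot at a LiteLLM proxy and opt back in to RAG content pruning.
# With MAXTOKENS unset, no pruning happens at all.
docker run -d -p 5000:5000 \
  -e LITELLM_PROXY="http://localhost:4000/v1" \
  -e LITELLM_KEY="sk-mykey" \
  -e MAXTOKENS=16384 \
  jasonacox/chatbot
```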

chatbot/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -10,12 +10,12 @@ FROM python:3.10-slim
 
 # Setting build related env vars
 ENV PORT=5000
-ENV OPENAI_API_KEY="DEFAULT_API_KEY"
+ENV OPENAI_API_KEY="no-key"
 ENV OPENAI_API_BASE="http://localhost:8000/v1"
 ENV AGENT_NAME="Jarvis"
 ENV MY_MODEL="models/7B/gguf-model.bin"
 ENV DEBUG="false"
-ENV WEAVIATE_HOST="localhsot"
+ENV WEAVIATE_HOST="localhost"
 ENV WEAVIATE_PORT="8080"
 ENV WEAVIATE_GRPC_HOST="localhost"
 ENV WEAVIATE_GRPC_PORT="50051"

chatbot/README.md

Lines changed: 23 additions & 3 deletions
@@ -12,7 +12,26 @@ Below are steps to get the Chatbot and Document Manager running.
 
 The Chatbot can be launched as a Docker container or via command line.
 
-### Docker
+### Method 1: Docker Compose
+
+A quickstart method is located in the [litellm](./litellm/) folder. This setup launches the Chatbot, LiteLLM, and PostgreSQL. It works on Mac and Linux (or WSL) systems.
+
+```bash
+cd litellm
+
+# Edit compose.yaml and config.yaml for your setup.
+nano compose.yaml
+nano config.yaml
+
+# Launch
+docker compose up -d
+```
+
+The containers will download and launch. The database will be set up in the `./db` folder.
+- The Chatbot will be available at http://localhost:5000
+- The LiteLLM usage dashboard will be available at http://localhost:4000/ui
+
+### Method 2: Docker
 
 ```bash
 # Create placeholder prompts.json
@@ -85,7 +104,8 @@ docker run \
 -d \
 -p 5000:5000 \
 -e PORT=5000 \
--e OPENAI_API_BASE="http://localhost:4000/v1" \
+-e LITELLM_PROXY="http://localhost:4000/v1" \
+-e LITELLM_KEY="sk-mykey" \
 -e LLM_MODEL="local-pixtral" \
 -e TZ="America/Los_Angeles" \
 -v $PWD/.tinyllm:/app/.tinyllm \
@@ -98,7 +118,7 @@ The Chatbot will try to use the specified model (`LLM_MODEL`) but if it is not a
 
 View the chatbot at http://localhost:5000
 
-#### Command Line Option
+### Method 3: Command Line
 
 ```bash
 # Install required packages

chatbot/litellm/README.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# TinyLLM Chatbot with LiteLLM + PostgreSQL
+
+This folder contains a docker-compose file that will start a TinyLLM Chatbot with a LiteLLM proxy and a PostgreSQL database. The Chatbot will connect to the LiteLLM proxy to access the models. The LiteLLM proxy will connect to the PostgreSQL database to store usage data.
+
+## Instructions
+
+1. Edit the config.yaml file to add your models and settings.
+2. Edit the compose.yaml file to adjust the environment variables in the services as needed.
+3. Run `docker compose up -d` to start the services.
+
+The containers will download and launch. The database will be set up in the `./db` folder.
+
+- The Chatbot will be available at http://localhost:5000
+- The LiteLLM proxy will be available at http://localhost:4000/ui
+- The PostgreSQL pgAdmin interface will be available at http://localhost:5050
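
A quick way to sanity-check the stack after `docker compose up -d`; a hedged sketch, not part of this commit, assuming the sample ports and the `sk-3-laws-safe` master key from compose.yaml are unchanged:

```bash
# Confirm the postgres, pgadmin, litellm-proxy, and chatbot containers are running
docker compose ps

# Watch the Chatbot connect to the LiteLLM proxy
docker compose logs -f chatbot

# List the models the proxy is serving via its OpenAI-compatible API
curl -H "Authorization: Bearer sk-3-laws-safe" http://localhost:4000/v1/models
```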

chatbot/litellm/compose.yaml

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+# TinyLLM Chatbot with LiteLLM + PostgreSQL
+#
+# This docker-compose file will start a TinyLLM Chatbot with a LiteLLM proxy
+# and a PostgreSQL database. The Chatbot will connect to the LiteLLM proxy
+# to access the models. The LiteLLM proxy will connect to the PostgreSQL
+# database to store usage data.
+#
+# Instructions:
+# 1. Edit the config.yaml file to add your models and settings.
+# 2. Edit the environment variables in the services section below as needed.
+# 3. Run `docker-compose up -d` to start the services.
+#
+# The Chatbot will be available at http://localhost:5000
+# The LiteLLM proxy will be available at http://localhost:4000/ui
+# The PostgreSQL pgAdmin interface will be available at http://localhost:5050
+#
+# https://github.com/jasonacox/TinyLLM
+
+services:
+  # PostgreSQL database setup - No changes needed
+  postgres:
+    container_name: container-pg
+    image: postgres
+    hostname: localhost
+    ports:
+      - "5432:5432"
+    environment:
+      POSTGRES_USER: litellm
+      POSTGRES_PASSWORD: 3-laws-safe
+      POSTGRES_DB: litellm
+    volumes:
+      - ./db:/var/lib/postgresql/data
+    restart: unless-stopped
+
+  # pgAdmin interface for PostgreSQL - Edit login credentials as needed
+  pgadmin:
+    container_name: container-pgadmin
+    image: dpage/pgadmin4
+    depends_on:
+      - postgres
+    ports:
+      - "5050:80"
+    environment:
+      PGADMIN_DEFAULT_EMAIL: [email protected]
+      PGADMIN_DEFAULT_PASSWORD: 3-laws-safe
+    restart: unless-stopped
+
+  # LiteLLM proxy service - Edit KEYs and LOCAL settings as needed
+  litellm-proxy:
+    image: ghcr.io/berriai/litellm:main-latest
+    container_name: litellm-proxy
+    ports:
+      - "4000:4000"
+    environment:
+      - CUSTOM_AWS_ACCESS_KEY_ID=YourAWSAccessKeyID
+      - CUSTOM_AWS_SECRET_ACCESS_KEY=YourAWSSecretAccessKey
+      - CUSTOM_AWS_REGION_NAME=us-east-1
+      - OPENAI_API_KEY=YourOpenAIAPIKey
+      - LITELLM_MASTER_KEY=sk-3-laws-safe
+      - MASTER_KEY=sk-3-laws-safe
+      - LOCAL_LLM_URL=http://localhost:8000/v1
+      - LOCAL_LLM_KEY=sk-3-laws-safe
+      - DATABASE_URL=postgresql://litellm:3-laws-safe@container-pg:5432/litellm
+    volumes:
+      - ./config.yaml:/app/config.yaml
+    command: --config /app/config.yaml
+    restart: unless-stopped
+
+  # Chatbot service - No changes needed
+  chatbot:
+    image: jasonacox/chatbot
+    container_name: chatbot
+    ports:
+      - "5000:5000"
+    environment:
+      - PORT=5000
+      - LITELLM_PROXY=http://litellm-proxy:4000/v1
+      - LITELLM_KEY=sk-3-laws-safe
+      - LLM_MODEL=local-pixtral
+      - TZ=America/Los_Angeles
+    volumes:
+      - ./.tinyllm:/app/.tinyllm
+    restart: unless-stopped
+
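LiteLLM writes its usage records to the `litellm` PostgreSQL database defined above. A hedged sketch (not part of the commit) for peeking at that data once the stack is running, assuming the `container-pg` name and sample credentials are unchanged:

```bash
# Open psql inside the database container and list the tables LiteLLM has created
docker exec -it container-pg psql -U litellm -d litellm -c '\dt'
```

The pgAdmin interface at http://localhost:5050 provides the same view in a browser.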

chatbot/litellm/config.yaml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# LiteLLM Model Definitions
+#
+# This config.yaml file defines the models and settings for the LiteLLM proxy.
+# See https://docs.litellm.ai/docs/providers for examples.
+#
+# https://github.com/jasonacox/TinyLLM
+
+model_list:
+
+  # Local OpenAI Compatible API - e.g. vLLM
+  - model_name: local-pixtral
+    litellm_params:
+      model: openai/mistralai/Pixtral-12B-2409
+      api_base: os.environ/LOCAL_LLM_URL
+      api_key: os.environ/LOCAL_LLM_KEY
+
+  # AWS Bedrock Model Examples
+  - model_name: aws-titan
+    litellm_params:
+      model: bedrock/amazon.titan-text-premier-v1:0
+      aws_access_key_id: os.environ/CUSTOM_AWS_ACCESS_KEY_ID
+      aws_secret_access_key: os.environ/CUSTOM_AWS_SECRET_ACCESS_KEY
+      aws_region_name: os.environ/CUSTOM_AWS_REGION_NAME
+  - model_name: aws-mixtral
+    litellm_params:
+      model: bedrock/mistral.mixtral-8x7b-instruct-v0:1
+      aws_access_key_id: os.environ/CUSTOM_AWS_ACCESS_KEY_ID
+      aws_secret_access_key: os.environ/CUSTOM_AWS_SECRET_ACCESS_KEY
+      aws_region_name: os.environ/CUSTOM_AWS_REGION_NAME
+
+  # OpenAI Model Example - GPT-3.5 Turbo
+  - model_name: gpt-3.5-turbo
+    litellm_params:
+      model: openai/gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY
+
+  # Ollama Model Example
+  - model_name: ollama-llama3.1
+    litellm_params:
+      model: ollama_chat/llama3.1
+      api_base: http://ollama:11434
+
+# General Settings for LiteLLM - no changes needed
+general_settings:
+  master_key: sk-3-laws-safe
+  database_url: "postgresql://litellm:3-laws-safe@container-pg:5432/litellm"
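
Each `model_name` above becomes an alias on the proxy's OpenAI-compatible endpoint. An illustrative, hedged example (not part of the commit): a request like the following should be routed to the local vLLM backend, assuming the default port and the sample `sk-3-laws-safe` master key:

```bash
# Chat completion through LiteLLM using the "local-pixtral" alias
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-3-laws-safe" \
  -d '{"model": "local-pixtral", "messages": [{"role": "user", "content": "Hello"}]}'
```

Swapping the model value for `aws-titan`, `gpt-3.5-turbo`, or `ollama-llama3.1` sends the same request through the corresponding provider.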

chatbot/server.py

Lines changed: 19 additions & 17 deletions
@@ -120,13 +120,15 @@ def debug(text):
     logger.debug(text)
 
 # Configuration Settings
-API_KEY = os.environ.get("OPENAI_API_KEY", "open_api_key")        # Required, use bogus string for Llama.cpp
+API_KEY = os.environ.get("OPENAI_API_KEY", "Asimov-3-Laws")       # Required, use bogus string for local LLMs
 API_BASE = os.environ.get("OPENAI_API_BASE", "http://localhost:8000/v1") # Required, use https://api.openai.com for OpenAI
+LITELLM_PROXY = os.environ.get("LITELLM_PROXY", None)             # Optional - LITELLM Proxy URL
+LITELLM_KEY = os.environ.get("LITELLM_KEY", "")                   # Optional - LITELLM Secret Key - Begins with sk-
 AGENTNAME = os.environ.get("AGENT_NAME", "")                      # Set the name of your bot
 MYMODEL = os.environ.get("LLM_MODEL", "models/7B/gguf-model.bin") # Pick model to use e.g. gpt-3.5-turbo for OpenAI
 DEBUG = os.environ.get("DEBUG", "false").lower() == "true"        # Set to True to enable debug mode
 MAXCLIENTS = int(os.environ.get("MAXCLIENTS", 1000))              # Maximum number of concurrent clients
-MAXTOKENS = int(os.environ.get("MAXTOKENS", 16*1024))             # Maximum number of tokens to send to LLM
+MAXTOKENS = int(os.environ.get("MAXTOKENS", 0))                   # Maximum number of tokens to send to LLM for RAG
 TEMPERATURE = float(os.environ.get("TEMPERATURE", 0.0))           # LLM temperature
 PORT = int(os.environ.get("PORT", 5000))                          # Port to listen on
 PROMPT_FILE = os.environ.get("PROMPT_FILE", f".tinyllm/prompts.json") # File to store system prompts
@@ -147,12 +149,18 @@ def debug(text):
         log("EXTRA_BODY is not valid JSON")
         EXTRA_BODY = {}
 else:
-    if API_BASE.startswith("https://api.openai.com"):
+    if API_BASE.startswith("https://api.openai.com") or LITELLM_PROXY:
         EXTRA_BODY = {}
     else:
         # Extra stop tokens are needed for some non-OpenAI LLMs
         EXTRA_BODY = {"stop_token_ids":[128001, 128009]}
 
+# LiteLLM Proxy
+if LITELLM_PROXY:
+    log(f"Using LiteLLM Proxy at {LITELLM_PROXY}")
+    API_BASE = LITELLM_PROXY
+    API_KEY = LITELLM_KEY
+
 # RAG Configuration Settings
 WEAVIATE_HOST = os.environ.get("WEAVIATE_HOST", "")               # Empty = no Weaviate support
 WEAVIATE_GRPC_HOST = os.environ.get("WEAVIATE_GRPC_HOST", WEAVIATE_HOST) # Empty = no Weaviate gRPC support
@@ -262,10 +270,9 @@ def test_model():
             log("LLM: Switching to an available model: %s" % model_list[0])
             MYMODEL = model_list[0]
         # Test LLM
-        log(f"LLM: Using and testing model {MYMODEL}")
+        log(f"LLM: Using model: {MYMODEL}")
         llm.chat.completions.create(
             model=MYMODEL,
-            max_tokens=MAXTOKENS,
             stream=False,
             temperature=TEMPERATURE,
             messages=[{"role": "user", "content": "Hello"}],
@@ -278,10 +285,6 @@ def test_model():
     except Exception as erro:
         log("OpenAI API Error: %s" % erro)
         log(f"Unable to connect to OpenAI API at {API_BASE} using model {MYMODEL}.")
-        if "maximum context length" in str(erro):
-            if MAXTOKENS > 1024:
-                MAXTOKENS = int(MAXTOKENS / 2)
-                log(f"LLM: Maximum context length exceeded reducing MAXTOKENS to {MAXTOKENS}.")
         return False
 
 # Fetch list of LLM models
@@ -308,9 +311,9 @@ def get_models():
 if WEAVIATE_HOST != "":
     try:
         rag_documents.connect()
-        log(f"Connected to Weaviate at {WEAVIATE_HOST}")
+        log(f"RAG: Connected to Weaviate at {WEAVIATE_HOST}")
     except Exception as err:
-        log(f"Unable to connect to Weaviate at {WEAVIATE_HOST} - {str(err)}")
+        log(f"RAG: Unable to connect to Weaviate at {WEAVIATE_HOST} - {str(err)}")
         WEAVIATE_HOST = ""
         log("RAG support disabled.")
 
@@ -335,7 +338,7 @@ def query_index(query, library, num_results=RESULTS):
         if ans['content'] == previous_content:
             continue
         new_content = ans['content']
-        if len(new_content) > MAXTOKENS:
+        if MAXTOKENS and len(new_content) > MAXTOKENS:
             debug("RAG: Content size exceeded maximum size using chunk.")
             # Cut the middle and insert the chunk in the middle
             new_content = ans['content'][:MAXTOKENS//4] + "..." + (ans.get('chunk') or " ") + "..." + ans['content'][-MAXTOKENS//4:]
@@ -475,7 +478,6 @@ async def ask(prompt, sid=None):
         llm_stream = openai.OpenAI(api_key=API_KEY, base_url=API_BASE)
         response = llm_stream.chat.completions.create(
             model=client[sid]["model"],
-            max_tokens=MAXTOKENS,
             stream=True, # Send response chunks as LLM computes next tokens
             temperature=TEMPERATURE,
             messages=client[sid]["context"],
@@ -529,7 +531,6 @@ async def ask_llm(query, format="", model=MYMODEL):
     llm = openai.AsyncOpenAI(api_key=API_KEY, base_url=API_BASE)
     response = await llm.chat.completions.create(
         model=model,
-        max_tokens=MAXTOKENS,
         stream=False,
         temperature=TEMPERATURE,
         messages=content,
@@ -548,7 +549,6 @@ async def ask_context(messages, model=MYMODEL):
     llm = openai.AsyncOpenAI(api_key=API_KEY, base_url=API_BASE)
     response = await llm.chat.completions.create(
         model=model,
-        max_tokens=MAXTOKENS,
         stream=False,
         temperature=TEMPERATURE,
         messages=messages,
@@ -718,13 +718,15 @@ async def home(format: str = None):
         "LLM Main User Queries": stats["ask"],
         "LLM Helper Queries": stats["ask_llm"],
         "LLM CoT Context Queries": stats["ask_context"],
+        "OpenAI API URL (OPENAI_API_URL)": API_BASE if not LITELLM_PROXY else "Disabled",
         "OpenAI API Key (OPENAI_API_KEY)": "************" if API_KEY != "" else "Not Set",
-        "OpenAI API URL (OPENAI_API_URL)": API_BASE,
+        "LiteLLM Proxy (LITELLM_PROXY)": LITELLM_PROXY or "Disabled",
+        "LiteLLM Secret Key (LITELLM_KEY)": "************" if LITELLM_KEY != "" else "Not Set",
         "Agent Name (AGENT_NAME)": AGENTNAME,
         "LLM Model (LLM_MODEL)": MYMODEL,
         "Debug Mode (DEBUG)": DEBUG,
         "Current Clients (MAXCLIENTS)": f"{len(client)} of {MAXCLIENTS}",
-        "LLM Max tokens Limit (MAXTOKENS)": MAXTOKENS,
+        "LLM Max Tokens to Send (MAXTOKENS)": MAXTOKENS,
         "LLM Temperature (TEMPERATURE)": TEMPERATURE,
         "Server Port (PORT)": PORT,
         "Saved Prompts (PROMPT_FILE)": PROMPT_FILE,

chatbot/version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-VERSION = "v0.15.14"
+VERSION = "v0.15.15"
