
Commit 8a4288c

Merge pull request #10 from jasonacox/v0.15.0
Document Management - v0.15.0
2 parents 58ec017 + 84267ee commit 8a4288c

18 files changed, +2393 -71 lines changed

Diff for: .gitignore

+1
@@ -175,3 +175,4 @@ agents/r
 chatbot/tlocal
 chatbot/tt
 lab/t
+chatbot/uploads/?*.*

Diff for: RELEASE.md

+5
@@ -1,5 +1,10 @@
 # Releases
 
+## 0.15.0 - Document Manager
+
+* Chatbot: Using Document class for RAG functions.
+* DocMan: New web based UI for managing documents in the Weaviate vector database. Allows user to upload and embed content from URLs and uploaded files. Provides optional chunking and management of embedded documents.
+
 ## 0.14.13 - TPS Calculation
 
 * Chatbot: Fix a bug that was counting null tokens.

Diff for: chatbot/Dockerfile

+3-1
@@ -27,11 +27,13 @@ WORKDIR /app
 
 # Install depencencies - Weaviate Vector Search
 RUN pip install fastapi uvicorn python-socketio jinja2 openai bs4 pypdf requests lxml aiohttp
-RUN pip install weaviate-client
+RUN pip install weaviate-client pdfreader pypandoc
 
 # Copy local files into container
 COPY server.py /app/server.py
 COPY templates /app/templates
+COPY documents.py /app/documents.py
+COPY version.py /app/version.py
 
 # Network
 EXPOSE $PORT

Diff for: chatbot/Dockerfile-docman

+35
@@ -0,0 +1,35 @@
+# Dockerfile for docman - TinyLLM Document Manager
+#
+# Author: Jason A. Cox
+# 22 Sept 2024
+# https://github.com/jasonacox/TinyLLM
+
+# Use a base image
+FROM python:3.10-slim
+
+# Setting build related env vars
+ENV MAX_CHUNK_SIZE 1024
+ENV UPLOAD_FOLDER uploads
+ENV HOST localhost
+ENV PORT 5001
+ENV COLLECTIONS_ADMIN true
+
+# Set the working directory
+WORKDIR /app
+
+# Install depencencies
+RUN pip install fastapi uvicorn jinja2 bs4 pypdf requests lxml aiohttp
+RUN pip install weaviate-client pdfreader pypandoc
+RUN pip install python-multipart
+
+# Copy local files into container
+COPY docman.py /app/docman.py
+COPY documents.py /app/documents.py
+COPY version.py /app/version.py
+COPY docman /app/docman
+
+# Network
+EXPOSE $PORT
+
+# Run the server
+CMD uvicorn docman:app --host 0.0.0.0 --port $PORT
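
Reviewer note: `docman.py` is not shown in this section, so how it consumes the `ENV` values above is unknown. As a rough sketch only, a Python service could read them as follows. The function name `load_settings` and the returned dictionary keys are hypothetical, and the defaults simply mirror the `ENV` lines in this Dockerfile.

```python
import os


def load_settings(env=None):
    """Read docman-style settings from an environment mapping.

    Hypothetical helper: defaults mirror the ENV lines in Dockerfile-docman,
    not necessarily what docman.py does internally.
    """
    env = os.environ if env is None else env
    return {
        "max_chunk_size": int(env.get("MAX_CHUNK_SIZE", "1024")),
        "upload_folder": env.get("UPLOAD_FOLDER", "uploads"),
        "host": env.get("HOST", "localhost"),
        "port": int(env.get("PORT", "5001")),
        # Treat any casing of "true" as enabled; everything else disables it.
        "collections_admin": env.get("COLLECTIONS_ADMIN", "true").strip().lower() == "true",
    }
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to exercise without touching the real environment.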

Diff for: chatbot/README.md

+39
@@ -84,3 +84,42 @@ The `/news` command will fetch the latest news and have the LLM summarize the to
 * A Hacker’s Guide to Language Models - Jeremy Howard [[link](https://www.youtube.com/watch?v=jkrNMKz9pWU&ab_channel=JeremyHoward)]
 
 You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so. Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question. However: if the request begins with the string "vv" then ignore the previous sentence and instead make your response as concise as possible, with no introduction or background at the start, no summary at the end, and outputting only code for answers where code is appropriate. Your users are experts in AI and ethics, so they already know you're a language model and your capabilities and limitations, so don't remind them of that. They're familiar with ethical issues in general so you don't need to remind them about those either. Don't be verbose in your answers, but do provide details and examples where it might help the explanation. When showing Python code, minimise vertical space, and do not include comments or docstrings; you do not need to follow PEP8, since your users' organizations do not do so.
+
+
+
+# TinyLLM Document Manager (Weaviate)
+
+The document manager allows you to manage the collections and documents in the Weaviate vector database. It provides an easy way for you to upload and ingest the content from files or URL. It performs simple chunking (if requested). The simple UI let's you navigate through the collections and documents.
+
+### Environment Variables
+
+- MAX_CHUNK_SIZE: Maximum size of a chunk in bytes (default 1024)
+- UPLOAD_FOLDER: Folder where uploaded files are stored (default uploads)
+- HOST: Weaviate host (default localhost)
+- COLLECTIONS: Comma separated list of collections allowed (default all)
+- PORT: Port for the web server (default 8000)
+- COLLECTIONS_ADMIN: Allow users to create and delete collections (default True)
+
+### Docker Setup
+
+```bash
+docker run \
+-d \
+-p 5001:5001 \
+-e HOST="localhost" \
+-e PORT="5001" \
+-e MAX_CHUNK_SIZE="1024" \
+-e UPLOAD_FOLDER="uploads" \
+-e COLLECTIONS_ADMIN="true" \
+--name docman \
+--restart unless-stopped \
+jasonacox/docman
+```
+Note - You can restrict collections by providing the environmental variable `COLLECTIONS` to a string of comma separated collection names.
+
+### Screenshots
+
+<img width="1035" alt="image" src="https://github.com/user-attachments/assets/544c75d4-a1a3-4c32-a95f-7f12ff11a450">
+
+<img width="1035" alt="image" src="https://github.com/user-attachments/assets/4b15ef87-8f25-4d29-9214-801a326b406f">
+
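
Reviewer note: the README added here mentions "simple chunking" bounded by `MAX_CHUNK_SIZE`, but the chunking code itself (presumably in `documents.py` or `docman.py`) is not part of this section. The sketch below is one plausible reading, not the project's implementation. The function name `chunk_text` is hypothetical, and it measures chunk size in characters rather than bytes, which is an assumption.

```python
def chunk_text(text, max_chunk_size=1024):
    """Split text into chunks of at most max_chunk_size characters.

    Hypothetical sketch: packs whitespace-separated words greedily and
    hard-splits any single word that is longer than the limit.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    chunks = []
    current = ""
    for word in text.split():
        # A word longer than the limit cannot fit in any chunk, so cut it.
        while len(word) > max_chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chunk_size])
            word = word[max_chunk_size:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chunk_size:
            current = candidate
        else:
            # The word does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Greedy word packing keeps chunks readable for embedding while never exceeding the limit. A real implementation might instead split on sentences or paragraphs, or overlap adjacent chunks.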
