A starter application that demonstrates a data collector architecture for retrieval-augmented generation.

This codebase is written in Python and uses Flask and Jinja2 templates with the OpenAI API. It stores data in PostgreSQL and uses pgvector to write and query embeddings. A GitHub Action runs the tests.
The AI Starter consists of three free-running processes communicating with one Postgres database.
- The data collector is a background process that collects data from one or more sources.
- The data analyzer is another background process that processes collected data.
- The web application collects a query from the user and displays a result to the user.
```mermaid
flowchart LR
    embeddings([OpenAI embeddings])
    user((User))
    app["Web App"]
    db[("PostgreSQL (+pgvector)")]
    llm([OpenAI completion])

    user -- query --> app
    app -- create embedding --> embeddings
    app -- search embeddings --> db
    app -- retrieve documents --> db
    app -- fetch text completion --> llm

    classDef node font-weight:bold,color:white,stroke:black,stroke-width:2px;
    classDef app fill:#3185FC;
    classDef db fill:#B744B8;
    classDef external fill:#FA9F42;
    classDef user fill:#ED6A5A;
    class app,collector,analyzer app;
    class db db;
    class docs,embeddings,llm external;
    class user user;
```
```mermaid
flowchart LR
    embeddings([OpenAI embeddings])
    docs(["RSS feeds"])
    db[("PostgreSQL (+pgvector)")]
    collector["Data Collector"]
    analyzer["Data Analyzer"]

    collector -- fetch documents --> docs
    collector -- save documents --> db
    analyzer -- retrieve documents --> db
    analyzer -- create embeddings --> embeddings
    analyzer -- "save embeddings (with reference)" --> db

    classDef node font-weight:bold,color:white,stroke:black,stroke-width:2px;
    classDef app fill:#3185FC;
    classDef db fill:#B744B8;
    classDef external fill:#FA9F42;
    classDef user fill:#ED6A5A;
    class app,collector,analyzer app;
    class db db;
    class docs,embeddings external;
    class user user;
```
The data collector fetches documents from RSS feeds and stores the document text in the database. It also splits documents into chunks of fewer than 6000 tokens to ensure embedding and text completion calls stay below their token limits. The data analyzer sends document chunks to the OpenAI Embeddings API and uses pgvector to store the embeddings in PostgreSQL.
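A rough sketch of the analyzer's chunk-and-embed step looks like the following. This is not the repository's actual code: the `embeddings` table schema, the `text-embedding-3-small` model, and the function names are illustrative assumptions.

```python
import psycopg
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
encoding = tiktoken.get_encoding("cl100k_base")

def chunk_text(text: str, max_tokens: int = 6000) -> list[str]:
    """Split a document into chunks of at most max_tokens tokens."""
    tokens = encoding.encode(text)
    return [
        encoding.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

def analyze_document(conn: psycopg.Connection, doc_id: int, text: str) -> None:
    """Embed each chunk of a document and store it alongside a document reference."""
    for chunk in chunk_text(text):
        response = client.embeddings.create(
            model="text-embedding-3-small",  # assumed model
            input=chunk,
        )
        embedding = response.data[0].embedding
        conn.execute(
            # hypothetical schema: embeddings(document_id, content, embedding vector)
            "insert into embeddings (document_id, content, embedding) "
            "values (%s, %s, %s::vector)",
            (doc_id, chunk, str(embedding)),
        )
    conn.commit()
```

The sketch passes the embedding as a string literal cast to `vector`; real code could instead register pgvector's psycopg adapter (`pgvector.psycopg.register_vector`) and pass the list directly.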
The web application collects the user's query and creates an embedding with the OpenAI Embeddings API. It then searches PostgreSQL for similar embeddings (using pgvector) and provides the corresponding chunks of text as context for a query to the OpenAI Chat Completions API.
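A corresponding sketch of the retrieval side, reusing the `client` from the sketch above; the SQL, schema, and `gpt-4o-mini` model are again assumptions. pgvector's `<=>` operator orders rows by cosine distance.

```python
def answer_query(conn: psycopg.Connection, query: str) -> str:
    # Embed the user's query with the same model the analyzer used.
    embedding = client.embeddings.create(
        model="text-embedding-3-small",  # assumed model
        input=query,
    ).data[0].embedding

    # Find the nearest chunk by cosine distance (pgvector's <=> operator).
    row = conn.execute(
        "select content from embeddings order by embedding <=> %s::vector limit 1",
        (str(embedding),),
    ).fetchone()
    context = row[0] if row else ""

    # Ask the chat model to answer using the retrieved chunk as context.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return completion.choices[0].message.content
```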
To set up and run the application locally:

- Install uv, PostgreSQL 17, and pgvector.

  ```shell
  brew install uv postgresql@17 pgvector
  brew services run postgresql@17
  ```
- Set up environment variables.

  ```shell
  cp .env.example .env
  source .env
  ```
- Set up the database, then run migrations against both the application and test databases.

  ```shell
  psql postgres < databases/create_databases.sql
  uv run alembic upgrade head
  DATABASE_URL="postgresql://localhost:5432/ai_starter_test?user=ai_starter&password=ai_starter" uv run alembic upgrade head
  ```
- Run the tests.

  ```shell
  uv run -m unittest
  ```
- Run the collector and the analyzer to populate the database, then run the app and navigate to localhost:5001.

  ```shell
  uv run -m starter.collect
  uv run -m starter.analyze
  uv run -m starter
  ```
- Build the container.

  ```shell
  uv pip compile pyproject.toml -o requirements.txt
  docker build -t flask-ai-starter .
  ```
- Run with Docker.

  ```shell
  docker run --env-file .env.docker flask-ai-starter ./collect.sh
  docker run --env-file .env.docker flask-ai-starter ./analyze.sh
  docker run -p 8081:8081 --env-file .env.docker flask-ai-starter
  ```