A Python project demonstrating ChromaDB integration for vector similarity search and document storage, with support for loading and querying Markdown files.

Features:

- Load and index Markdown documents from a directory
- Support for both persistent and in-memory database modes
- Multiple client setup options (local, server-based, ephemeral)
- Document search with metadata support
- Automatic file opening for search results
- Development mode support
Requirements:

- Python 3.10 or higher
- Poetry for dependency management
Installation:

- Clone this repository.
- Install dependencies using Poetry:

```bash
poetry install
```
Project structure:

```
.
├── src/
│   ├── __init__.py
│   ├── main.py           # Main application entry point
│   ├── chroma_setups.py  # ChromaDB client configuration
│   ├── data_loader.py    # Document loading utilities
│   ├── utils.py          # General utility functions
│   └── config.py         # Application configuration
├── input_data/           # Default directory for Markdown files
├── poetry.lock
└── pyproject.toml
```
Run the example application:

```bash
poetry run python src/main.py
```
Set the `DEV_MODE` environment variable to enable development features:

```bash
DEV_MODE=true poetry run python src/main.py
```
The project supports three types of ChromaDB clients:
- Persistent Client: Default mode, stores data on disk
- Ephemeral Client: In-memory storage, useful for testing
- Server Client: Connects to a remote ChromaDB server
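A sketch of how a factory such as the one in `chroma_setups.py` might choose among these clients. `PersistentClient`, `EphemeralClient` and `HttpClient` are the client constructors exposed by recent `chromadb` releases (0.4 and later); the mode strings, the helpers `client_spec` and `make_client`, and the `./chroma_db` directory are hypothetical:

```python
from urllib.parse import urlparse

DEFAULT_SERVER_URL = "http://localhost:8000"
DEFAULT_PERSIST_DIR = "./chroma_db"  # hypothetical on-disk location


def client_spec(mode: str = "persistent",
                persist_dir: str = DEFAULT_PERSIST_DIR,
                server_url: str = DEFAULT_SERVER_URL) -> tuple[str, dict]:
    """Map a mode name to a chromadb client class name and its keyword args."""
    if mode == "persistent":
        return "PersistentClient", {"path": persist_dir}
    if mode == "ephemeral":
        return "EphemeralClient", {}
    if mode == "server":
        url = urlparse(server_url)
        return "HttpClient", {"host": url.hostname or "localhost",
                              "port": url.port or 8000}
    raise ValueError(f"unknown client mode: {mode!r}")


def make_client(mode: str = "persistent", **kwargs):
    """Build the client; chromadb is imported lazily so client_spec works
    even where the package is not installed."""
    import chromadb  # requires the chromadb package

    name, client_kwargs = client_spec(mode, **kwargs)
    return getattr(chromadb, name)(**client_kwargs)
```

For example, `make_client("ephemeral")` gives an in-memory client for tests, while `make_client("server")` connects to the default server URL.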
Configuration defaults:

- Default input directory: `input_data/`
- ChromaDB server URL: `http://localhost:8000` (for server mode)
- File types supported: Markdown (`.md`)