Sample agentic AI projects & reference implementations — Retrieval-Augmented Generation (RAG), agents, evaluation pipelines, and more.
# Download the QA data in txt format under some path, e.g. data/fantastic_charge,
# and save it in a file named <filename>.txt.
Disclaimer: The sample data in this repo is generated automatically by an LLM for a fictional product and is meant to have no relation to any existing product. Any resemblance is purely coincidental.
# Install Ollama and pull the models
ollama pull llama3:8b-instruct-q4_0
ollama pull bge-m3:latest
# Start Opik as shown in the next section
uv sync --dev
# Ingest the txt Q&A data into Chroma DB
uv run src/data/rag_ingest.py ./data/products /tmp/ch_db
# Run the Chat Streamlit app
uv run python -m streamlit run ./src/chatbot/app.py
# or
source .venv/bin/activate
python -m streamlit run ./src/chatbot/app.py
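For intuition, the ingestion step typically chunks each txt file before embedding it into Chroma. The sketch below is a hedged illustration only: `chunk_text`, `CHUNK_SIZE`, and `OVERLAP` are hypothetical names and values, not the actual `src/data/rag_ingest.py` API.

```python
# Illustrative chunking logic for document ingestion (assumed, not the
# repo's real implementation). Splits text into overlapping character
# windows so context is not lost at chunk boundaries.
CHUNK_SIZE = 500   # max characters per chunk (assumed setting)
OVERLAP = 50       # characters shared between consecutive chunks (assumed)

def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP) -> list[str]:
    """Split a document into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Each chunk would then be embedded (e.g. with the bge-m3 model pulled above) and stored in the Chroma collection.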
Follow the instructions in https://www.comet.com/docs/opik/self-host/local_deployment. The chatbot app is configured to bypass the Opik URL input and work with Opik running locally.
Access the app at: http://localhost:8501
Under deployment/ you can find several Docker images for running the LLMs used in this project. You can also use the vLLM setup for a production on-premise deployment.
To build the images:
docker build -t skonto/ollama:qa -f Dockerfile.ollama .
docker build --no-cache --progress=plain --secret "id=guard,src=$HOME/.guardrailsrc" . -t skonto/qa
To run the images:
docker run --gpus all -p8080:11434 skonto/ollama:qa
docker run -it --gpus all -e OLLAMA_HOST=localhost:8080 --net=host skonto/qa
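The two `docker run` commands above could also be expressed as a Compose file. This is a hypothetical sketch only (no compose file ships with the repo), and the GPU reservation syntax assumes a recent Docker Compose:

```yaml
# Hypothetical docker-compose sketch mirroring the docker run commands above.
services:
  ollama:
    image: skonto/ollama:qa
    ports:
      - "8080:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  qa:
    image: skonto/qa
    network_mode: host
    environment:
      OLLAMA_HOST: localhost:8080
    depends_on:
      - ollama
```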
The system consists of:
- Document ingestion and embedding
- RAG pipeline using Ollama
- Chatbot (CLI/HTTP interface)
- Evaluation using RAGAS and custom metrics
The final goal is a customizable architecture, sketched in the following rough diagram of the RAG pipeline:
RAG (Retrieval-Augmented Generation) Pipeline
==============================================
┌─────────────────┐          ┌──────────────────┐
│ User Metadata   │          │    User Query    │
│ (context,       │          │                  │
│  preferences,   │          │                  │
│  permissions)   │          │                  │
└────────┬────────┘          └─────────┬────────┘
         │                             │
         │                   ┌────────▼────────┐
         │                   │ Query Sanitizer │
         │                   │  & Guardrails   │
         │                   │ (input filter)  │
         │                   └────────┬────────┘
         │                            │
         │                            ▼
         │                ┌───────────────────────┐
         └───────────────►│       RETRIEVER       │
                          │                       │
                          ├───────────────────────┤
                          │ Search Strategy:      │
                          │ • Hybrid Search       │
                          │ • BM25 (keyword)      │
                          │ • Semantic (vector)   │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │    Retrieved Docs     │
                          │   (initial results)   │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │       RE-RANKER       │
                          │  (relevance scoring   │
                          │   & result ordering)  │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │      Top-K Docs       │
                          │    (best matches)     │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │  Context Enhancement  │
                          │  (combine query +     │
                          │   retrieved docs +    │
                          │   metadata context)   │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │          LLM          │
                          │                       │
                          │ Config:               │
                          │ • Temperature: 0.7    │
                          │ • Max tokens: 2048    │
                          │ • System prompt       │
                          │ • Model: GPT-4        │
                          │                       │
                          │ Output Format:        │
                          │ ┌─────────┬─────────┐ │
                          │ │Streaming│  Chat   │ │
                          │ │ Format  │ Format  │ │
                          │ └─────────┴─────────┘ │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │  Sampling Techniques  │
                          │                       │
                          │ • Top-p (nucleus)     │
                          │ • Top-k filtering     │
                          │ • Temperature scaling │
                          │ • Repetition penalty  │
                          │ • Beam search         │
                          │ • Greedy decoding     │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │   Output Guardrails   │
                          │  (safety filter,      │
                          │   content validation, │
                          │   bias detection)     │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │    Final Response     │
                          │      (to user)        │
                          └───────────────────────┘

Data Flow Legend:
─────────────────
│  = Processing step/component
▼  = Data flow direction
── = Component boundary
├─ = Internal component section
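The retriever and re-ranker stages of the diagram can be sketched in a few lines. This is a toy, self-contained illustration: `bm25_like` is a plain term-overlap stand-in for real BM25, and the blend weight `alpha` is an assumed parameter, not a value taken from this repo.

```python
# Toy sketch of the hybrid-search + re-rank stages of the RAG pipeline.
# The real retriever uses Chroma with bge-m3 embeddings; these functions
# only illustrate the scoring flow.
from collections import Counter
import math

def bm25_like(query: str, doc: str) -> float:
    """Toy keyword score: term-frequency overlap (stand-in for BM25)."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return float(sum(min(q[t], d[t]) for t in q))

def cosine(a: list[float], b: list[float]) -> float:
    """Semantic similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(kw: float, sem: float, alpha: float = 0.5) -> float:
    """Blend keyword and semantic scores (alpha is an assumed weight)."""
    return alpha * kw + (1 - alpha) * sem

def rerank_top_k(scored_docs: list[tuple[str, float]], k: int = 3) -> list[str]:
    """Order docs by score and keep the Top-K, as in the RE-RANKER box."""
    return [d for d, _ in sorted(scored_docs, key=lambda p: p[1], reverse=True)[:k]]
```

In practice the re-ranker would be a cross-encoder or LLM scorer rather than a plain sort, but the data flow is the same.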
and then move to an Agentic RAG pipeline as follows:
Agentic RAG Pipeline with Memory, Routing, Tools & Planning
============================================================
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ User Metadata   │   │   User Query    │   │ Conversation    │
│ (context,       │   │                 │   │   Memory        │
│  preferences,   │   │                 │   │ (short & long   │
│  permissions)   │   │                 │   │  term history)  │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         │            ┌────────▼────────┐            │
         │            │ Query Sanitizer │            │
         │            │  & Guardrails   │            │
         │            │ (input filter)  │            │
         │            └────────┬────────┘            │
         │                     │                     │
         │                     ▼                     │
         │   ┌───────────────────────────────────┐   │
         │   │              PLANNER              │◄──┤
         │   │                                   │
         │   │ • Task decomposition              │
         │   │ • Goal identification             │
         │   │ • Multi-step strategy             │
         │   │ • Resource allocation             │
         │   └─────────────────┬─────────────────┘
         │                     │
         │                     ▼
         │ ┌─────────────────────────────────────────────────┐
         │ │                   LLM ROUTER                    │
         │ │                                                 │
         │ │ Route Decision:                                 │
         │ │   ┌─────────┬─────────┬───────────┐             │
         │ │   │Retrieval│  Tool   │  Direct   │             │
         │ │   │  Path   │ Calling │ Response  │             │
         │ │   └─────────┴─────────┴───────────┘             │
         │ └─────────┬───────────────────────┬─────────────┬─┘
         │           │                       │             │
         │           ▼                       ▼             │
         │ ┌───────────────────┐ ┌───────────────────────┐ │
         │ │     RETRIEVER     │ │     TOOL CALLING      │ │
         │ │                   │ │                       │ │
         └►├───────────────────┤ │ Available Tools:      │ │
           │ Search Strategy:  │ │ • Web search          │ │
           │ • Hybrid Search   │ │ • Calculator          │ │
           │ • BM25 (keyword)  │ │ • Code interpreter    │ │
           │ • Semantic        │ │ • Database queries    │ │
           │ • Memory-guided   │ │ • API calls           │ │
           └─────────┬─────────┘ │ • File operations     │ │
                     │           └───────────┬───────────┘ │
                     ▼                       │             │
           ┌───────────────────┐             │             │
           │  Retrieved Docs   │             │             │
           │ (initial results  │             │             │
           │  + memory refs)   │             │             │
           └─────────┬─────────┘             │             │
                     │                       │             │
                     ▼                       │             │
           ┌───────────────────┐             │             │
           │     RE-RANKER     │             │             │
           │ (relevance +      │             │             │
           │  memory context)  │             │             │
           └─────────┬─────────┘             │             │
                     │                       │             │
                     ▼                       │             │
           ┌───────────────────┐             │             │
           │    Top-K Docs     │             │             │
           │  (best matches)   │             │             │
           └─────────┬─────────┘             │             │
                     │                       │             │
                     └─────────┬─────────────┴─────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │   Context Enhancement   │
                  │                         │
                  │ Combine:                │
                  │ • Query + retrieved docs│
                  │ • Conversation memory   │
                  │ • Tool results          │
                  │ • User metadata         │
                  │ • Planning context      │
                  └────────────┬────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │           LLM           │
                  │                         │
                  │ Config:                 │
                  │ • Temperature: 0.7      │
                  │ • Max tokens: 4096      │
                  │ • System prompt         │
                  │ • Model: GPT-4          │
                  │ • Function calling      │
                  │                         │
                  │ Output Format:          │
                  │ ┌─────────┬───────────┐ │
                  │ │Streaming│   Chat    │ │
                  │ │ Format  │  Format   │ │
                  │ └─────────┴───────────┘ │
                  └────────────┬────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │   Sampling Techniques   │
                  │                         │
                  │ • Top-p (nucleus)       │
                  │ • Top-k filtering       │
                  │ • Temperature scaling   │
                  │ • Repetition penalty    │
                  │ • Beam search           │
                  │ • Greedy decoding       │
                  └────────────┬────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │    Output Guardrails    │
                  │                         │
                  │ Streaming Mode:         │
                  │ • Token-level filter    │
                  │ • Sliding window        │
                  │ • Circuit breaker       │
                  │ • Real-time scoring     │
                  │                         │
                  │ Batch Mode:             │
                  │ • Full content scan     │
                  │ • Complete validation   │
                  │ • Comprehensive bias    │
                  │   detection             │
                  └────────────┬────────────┘
                               │
                               ▼
              ┌─────────────────────────────────┐
              │          MEMORY UPDATE          │──► (feeds back into
              │                                 │     Conversation Memory)
              │ • Store conversation turn       │
              │ • Update user preferences       │
              │ • Cache retrieved documents     │
              │ • Log tool usage patterns       │
              │ • Update planning context       │
              └────────────────┬────────────────┘
                               │
                               ▼
              ┌─────────────────────────────────┐
              │         Final Response          │
              │                                 │
              │ ┌─────────┬───────────────────┐ │
              │ │Streaming│  Batch Response   │ │
              │ │Delivery │  (with memory     │ │
              │ │         │  updates complete)│ │
              │ └─────────┴───────────────────┘ │
              └─────────────────────────────────┘

Data Flow Legend:
─────────────────
│  = Processing step/component
▼  = Data flow direction
── = Component boundary
├─ = Internal component section
◄─ = Feedback/memory loop
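The router's route decision can be illustrated with a minimal stand-in. A real agentic router would ask the LLM to choose; this hypothetical rule-based sketch only shows the three paths from the diagram (retrieval, tool calling, direct response), and the trigger phrases are assumptions.

```python
# Hedged sketch of the LLM ROUTER's route decision. In a real pipeline
# the LLM itself picks the route (e.g. via function calling); these
# keyword triggers are purely illustrative.
TOOL_HINTS = ("calculate", "search the web", "run code")  # assumed triggers

def route(query: str, has_docs: bool = True) -> str:
    """Return one of: 'retrieval', 'tool_calling', 'direct_response'."""
    q = query.lower()
    if any(h in q for h in TOOL_HINTS):
        return "tool_calling"
    if has_docs and ("?" in query or q.startswith(("what", "how", "why"))):
        return "retrieval"
    return "direct_response"
```

The retrieval path then feeds the retriever/re-ranker chain, while tool results and direct responses skip straight to context enhancement.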
- src/data/: Document ingestion & indexing
- src/rag/: RAG pipelines using Ollama
- src/chatbot/: CLI/HTTP chatbot interfaces
- src/test/: Unit + integration tests and benchmarks
- Makefile, uv: commands for lint, test, format, eval
Evaluate generated answers using RAGAS:
uv run pytest -m integration
Metrics include:
- LLMContextRecall
- FactualCorrectness
- BleuScore
- ResponseRelevancy
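To build intuition for what a context-recall-style metric measures, here is a toy token-overlap version. The actual RAGAS metrics listed above are LLM-judged; this sketch, with its hypothetical `toy_context_recall` name, only illustrates the idea.

```python
# Toy stand-in for a context-recall-style metric: the fraction of
# reference-answer tokens that appear in the retrieved context.
# Real RAGAS metrics use an LLM judge, not token overlap.
def toy_context_recall(reference: str, context: str) -> float:
    ref = set(reference.lower().split())
    ctx = set(context.lower().split())
    return len(ref & ctx) / len(ref) if ref else 0.0
```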
Contributions welcome!
- Fork and clone
- Run uv sync --dev
- Lint: make lint
- Type-check: make type-check
- Test: pytest
This repository is licensed under the Apache 2.0 License — see LICENSE for full details.
