# High-performance Retrieval-Augmented Generation (RAG) on Redis, Qdrant, or PostgreSQL (pgvector)

FastAPI • Redis / Qdrant / PostgreSQL • Async • Embedding-agnostic
- Features
- Tech Stack
- Requirements
- Installation
- Configuration & Connection Options
- Usage
- Architecture
- License
## Features

- High performance: vector search powered by Redis HNSW, Qdrant, or PostgreSQL with pgvector.
- Simple API: endpoints for index creation, insertion, querying, and optional re-ranking.
- Embedding-agnostic: works with any embedding model (OpenAI, Llama 3, HuggingFace, etc.).
- Interactive setup wizard: `aquiles-rag configs` walks you through full configuration for Redis, Qdrant, or PostgreSQL.
- Sync & async clients: `AquilesRAG` (requests) and `AsyncAquilesRAG` (httpx) with `embedding_model` and `metadata` support.
- Extensible: designed to integrate into ML pipelines, microservices, or serverless deployments; supports an optional re-ranker stage for improved result ordering.
## Tech Stack

- Python 3.9+
- FastAPI
- Redis, Qdrant or PostgreSQL + pgvector as vector store
- NumPy
- Pydantic
- Jinja2
- Click (CLI)
- Requests (sync client)
- HTTPX (async client)
- Platformdirs (config management)
## Requirements

- Redis (standalone or cluster), or Qdrant (HTTP / gRPC), or PostgreSQL with the `pgvector` extension
- Python 3.9+
- pip
Optional: run Redis locally with Docker:

```bash
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest
```
## Installation

From PyPI:

```bash
pip install aquiles-rag
```

From source:

```bash
git clone https://github.com/Aquiles-ai/Aquiles-RAG.git
cd Aquiles-RAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# optional development install
pip install -e .
```

## Configuration & Connection Options

Configuration is persisted at:

```
~/.local/share/aquiles/aquiles_config.json
```
The previous manual per-flag configuration flow has been replaced by an interactive wizard. Run:

```bash
aquiles-rag configs
```

The wizard prompts for everything required for Redis, Qdrant, or PostgreSQL (host, ports, TLS/gRPC options, API keys, admin user). At the end it writes `aquiles_config.json` to the standard location.
The wizard also includes optional re-ranker configuration (enable/disable, execution provider, model name, concurrency, preload), so you can activate a re-ranking stage that scores `(query, doc)` pairs after the vector store returns candidates.
If you prefer automation, generate the same JSON schema the wizard writes and place it at `~/.local/share/aquiles/aquiles_config.json` before starting the server (or use the deploy pattern described below).
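As a rough illustration of that automation path, a script can write the config file directly. The keys shown below are assumptions for illustration only; run `aquiles-rag configs` once and inspect the generated file to learn the real schema before relying on this.

```python
import json
import pathlib

# Hypothetical config keys -- the authoritative schema is whatever the
# `aquiles-rag configs` wizard writes; inspect a wizard-generated file first.
config = {
    "vector_store": "redis",   # assumed key
    "host": "127.0.0.1",       # assumed key
    "port": 6379,              # assumed key
}

# Standard location the server reads from (per the docs above).
path = pathlib.Path.home() / ".local" / "share" / "aquiles" / "aquiles_config.json"
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(config, indent=2))
print(f"wrote {path}")
```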
Aquiles-RAG supports multiple Redis modes:

- Local cluster:

```python
RedisCluster(host=host, port=port, decode_responses=True)
```

- Standalone local:

```python
redis.Redis(host=host, port=port, decode_responses=True)
```

- Remote with TLS/SSL:

```python
redis.Redis(host=host, port=port, username=username or None,
            password=password or None, ssl=True, decode_responses=True,
            ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile,
            ssl_ca_certs=ssl_ca_certs)
```

- Remote without TLS/SSL:

```python
redis.Redis(host=host, port=port, username=username or None,
            password=password or None, decode_responses=True)
```

If you select PostgreSQL, the wizard prompts for connection and pool settings for your Postgres instance. Note: Aquiles-RAG does not run DB migrations automatically. If you use Postgres, you must prepare the `pgvector` and `pgcrypto` extensions, tables, and indexes yourself.
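Since that preparation is manual, a DDL sketch along these lines may be a useful starting point. The table name, column names, and dimension (`documents`, `embedding`, 768) are assumptions, not a schema Aquiles-RAG mandates; match them to your own ingestion pipeline and pgvector version.

```sql
-- Hypothetical schema sketch; names and dimensions are illustrative.
CREATE EXTENSION IF NOT EXISTS vector;     -- pgvector
CREATE EXTENSION IF NOT EXISTS pgcrypto;   -- provides gen_random_uuid()

CREATE TABLE IF NOT EXISTS documents (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name_chunk text NOT NULL,
    raw_text   text NOT NULL,
    embedding  vector(768) NOT NULL
);

-- HNSW index for cosine distance (available in pgvector >= 0.5)
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops);
```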
## Usage

- Interactive setup wizard (recommended):

```bash
aquiles-rag configs
```

- Serve the API:

```bash
aquiles-rag serve --host "0.0.0.0" --port 5500
```

- Deploy with a bootstrap script (pattern: `deploy_*.py` with a `run()` that calls `gen_configs_file()`):

```bash
# Redis example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_redis.py

# Qdrant example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_qdrant.py

# PostgreSQL example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_postgres.py
```

The `deploy` command imports the given Python file, executes its `run()` to generate the config (writing `aquiles_config.json`), then starts the FastAPI server.
- Create Index

```bash
curl -X POST http://localhost:5500/create/index \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
        "indexname": "documents",
        "embeddings_dim": 768,
        "dtype": "FLOAT32",
        "delete_the_index_if_it_exists": false
      }'
```

- Insert Chunk (ingest)

```bash
curl -X POST http://localhost:5500/rag/create \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
        "index": "documents",
        "name_chunk": "doc1_part1",
        "dtype": "FLOAT32",
        "chunk_size": 1024,
        "raw_text": "Text of the chunk...",
        "embeddings": [0.12, 0.34, 0.56, ...]
      }'
```

- Query Top-K

```bash
curl -X POST http://localhost:5500/rag/query-rag \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
        "index": "documents",
        "embeddings": [0.78, 0.90, ...],
        "dtype": "FLOAT32",
        "top_k": 5,
        "cosine_distance_threshold": 0.6
      }'
```

The API supports an optional re-ranking stage (configurable on the server). When enabled, the typical flow is: vector search → candidate filtering/metadata match → re-ranker scores `(query, doc)` pairs to improve ordering. (See the configuration wizard to enable/disable the re-ranker and set its options.)
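For intuition about the `cosine_distance_threshold` parameter: cosine distance is 1 minus cosine similarity, so identical directions score 0 and orthogonal vectors score 1 (the exact inclusive/exclusive semantics of the server-side filter are configuration-dependent). A small NumPy sketch:

```python
import numpy as np

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity of the two vectors.
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical vectors -> distance 0; orthogonal vectors -> distance 1.
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

A threshold of 0.6 therefore keeps only candidates whose direction is reasonably close to the query vector.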
Sync client:

```python
from aquiles.client import AquilesRAG

client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# Create an index (returns server text)
resp_text = client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")

# Insert chunks using your embedding function
def get_embedding(text):
    return embedding_model.encode(text)

responses = client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="doc1",
    raw_text=full_text,
    embedding_model="text-embedding-v1"  # optional metadata sent with each chunk
)

# Query the index (returns parsed JSON)
results = client.query("documents", query_embedding, top_k=5)
print(results)
```

Async client:

```python
import asyncio
from aquiles.client import AsyncAquilesRAG

client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

async def main():
    await client.create_index("documents_async")

    responses = await client.send_rag(
        embedding_func=async_embedding_func,  # supports sync or async callables
        index="documents_async",
        name_chunk="doc_async",
        raw_text=full_text
    )

    results = await client.query("documents_async", query_embedding)
    print(results)

asyncio.run(main())
```

Notes:

- Both clients accept an optional `embedding_model` parameter, forwarded as metadata. This is helpful when storing/querying embeddings produced by different models.
- `send_rag` chunks text using `chunk_text_by_words()` (default ≈600 words / ≈1024 tokens) and uploads each chunk (concurrently in the async client).
- If the re-ranker is enabled on the server, the client can call the re-rank endpoint after receiving RAG results to re-score/re-order candidates.
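To illustrate what that chunking step does, here is a simplified stand-in for word-window chunking. This is not the library's actual `chunk_text_by_words()` implementation, just a sketch of the idea:

```python
def chunk_by_words(text, max_words=600):
    # Simplified stand-in for the client's chunking step: split on
    # whitespace and group words into fixed-size windows.
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# 1500 words -> chunks of 600, 600, and 300 words.
chunks = chunk_by_words("word " * 1500)
print(len(chunks))  # 3
```

Each resulting chunk would then be embedded and uploaded individually (concurrently in the async client).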
Open the web UI (protected) at `http://localhost:5500/ui`. Use it to:

- Run the Setup Wizard link (if available) or inspect live configs
- Test `/create/index`, `/rag/create`, and `/rag/query-rag`
- Access the protected Swagger UI & ReDoc after logging in
## Architecture

- Clients (HTTP/HTTPS, Python SDK, or UI Playground) make asynchronous HTTP requests.
- FastAPI server: orchestration and business logic; validates requests and translates them into vector store operations.
- Vector store: Redis (HASH + HNSW/COSINE search), Qdrant (collections + vector search), or PostgreSQL with `pgvector` and `pgcrypto` (manual DB preparation required).
- Optional re-ranker: when enabled, a re-ranking component scores `(query, doc)` pairs to improve final ordering.
- Metrics (`/status/ram`): Redis offers `INFO memory` and `memory_stats()`. Qdrant does not expose these Redis-specific metrics, so the endpoint returns a short message explaining this. PostgreSQL exposes different metrics again; check your Postgres monitoring tooling for memory and indexing statistics.
- Dtype handling: the server validates `dtype` for Redis and converts embeddings to the requested NumPy dtype. Qdrant accepts float arrays directly, so `dtype` is informational/compatibility metadata there. For PostgreSQL + pgvector, ensure the stored vector dimension and any normalization required for cosine or inner-product search are handled by your ingestion pipeline.
- gRPC: Qdrant can be used over HTTP or gRPC (`prefer_grpc=true` in the config). Ensure your environment allows gRPC traffic, inbound and outbound, as needed.
- PostgreSQL: Aquiles-RAG does not run automatic migrations for Postgres. Create the `pgvector` extension, tables, and indexes manually (or via your own migration tool) before using Postgres as a vector store.
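As an illustration of the Redis-path dtype handling described above, serializing an embedding to raw bytes with NumPy might look like the sketch below. This is not the server's actual code, and the dtype-string mapping is an assumption:

```python
import numpy as np

# Map the API's dtype strings to NumPy dtypes (mapping is an assumption).
DTYPES = {"FLOAT16": np.float16, "FLOAT32": np.float32, "FLOAT64": np.float64}

def embedding_to_bytes(embedding, dtype="FLOAT32"):
    # Convert the vector to the requested dtype and serialize it to raw
    # bytes, e.g. for storage in a Redis HASH field searched via HNSW.
    return np.asarray(embedding, dtype=DTYPES[dtype]).tobytes()

blob = embedding_to_bytes([0.12, 0.34, 0.56])
print(len(blob))  # 12: three float32 values, 4 bytes each
```

This is also why the index's `dtype` and `embeddings_dim` must match what your ingestion pipeline sends: the raw byte layout is fixed at index creation.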
See the `test/` directory for automated tests:

- client tests for the Python SDK
- API tests for endpoint behavior
- `test_deploy.py` for deployment / bootstrap validation

If you add Postgres to CI, prepare the database (create the `pgvector` extension and required tables/indexes) in your test fixtures, since there are no automatic migrations.

