Mindoc is a privacy-focused, fully offline AI assistant that lets you search and chat with your PDFs and PPTs locally. No cloud, no API keys: everything runs 100% on your device using efficient Small Language Models (SLMs).
- No OpenAI
- No cloud dependencies
- No data leaves your device
- All models stored locally
- Upload multiple PDFs and PPTs
- Fully private, local processing
- Fast and accurate extraction
Powered by LaMini-Flan-T5 (248M), optimized for CPU inference.
Semantic Vector Search + Cross-Encoder Reranking
- Vector Model: all-MiniLM-L6-v2
- Reranker: ms-marco-MiniLM-L12-v2
- Quick Mode: Fast, short answers (Top-2 docs)
- Deep Research: Multi-doc reasoning using Map-Reduce
- Evidence-based answers
- Click on a citation → open PDF → auto-scroll to exact page
- Loader: PyMuPDFLoader
- Chunking: RecursiveCharacterTextSplitter
- Chunk size: 1000 chars
- Overlap: 200 chars
- Embeddings: SentenceTransformer (384-dim)
- Storage: ChromaDB (Local persistence)
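The chunking settings above (1000-character chunks with 200 characters of overlap) can be illustrated with a plain sliding window. This is a simplified stand-in, not `RecursiveCharacterTextSplitter` itself, which also tries to break on paragraph and sentence boundaries; `chunk_text` is a hypothetical helper name used only for this sketch.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # with the defaults, a new window starts every 800 chars
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.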
- Retrieve top-10 chunks with vector search
- Re-rank with cross-encoder, keep best 3
- Feed context → LaMini LLM → generate answer
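The two-stage retrieval described above (a wide vector search, then a narrower re-ranking pass) might look roughly like the following. It is a dependency-free sketch: the embeddings are passed in as plain lists, and `word_overlap` is a toy scoring function standing in for the cross-encoder. All function names here are assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(
    query: str,
    query_vec: Sequence[float],
    chunks: Sequence[tuple[str, Sequence[float]]],  # (text, embedding) pairs
    rerank_score: Callable[[str, str], float],      # stand-in for the cross-encoder
    top_k: int = 10,
    keep: int = 3,
) -> list[str]:
    # Stage 1: cheap vector search over every chunk, keep the top_k candidates.
    candidates = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)[:top_k]
    # Stage 2: score each (query, text) pair jointly and keep the best few.
    reranked = sorted(candidates, key=lambda c: rerank_score(query, c[0]), reverse=True)
    return [text for text, _ in reranked[:keep]]


def word_overlap(query: str, text: str) -> float:
    """Toy reranker: fraction of query words that also appear in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0
```

The split exists because the second stage is more expensive per pair than a vector comparison, so it is only run on the short candidate list rather than on the whole index.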
┌────────────┐
│   Files    │
│ PDF / PPTX │
└──────┬─────┘
       │
       ▼
┌────────────────┐
│ Document Loader│
└───────┬────────┘
        │Chunks
        ▼
┌────────────────────────┐
│Embeddings (Local Model)│
└───────────┬────────────┘
            │Vectors
            ▼
┌──────────────────┐
│   Vector Store   │
│  FAISS / Chroma  │
└─────────┬────────┘
          │
          ▼
┌───────────────────┐
│   RAG Pipeline    │
│ (Retrieve + LLM)  │
└─────────┬─────────┘
          │
          ▼
┌─────────────┐
│   FastAPI   │
│   /query    │
└─────────────┘
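The pipeline ends at a FastAPI `/query` endpoint. As a rough illustration of calling it from a script, the sketch below builds a JSON request with the standard library. The request schema is not documented here, so the HTTP method (POST) and the `question` and `mode` field names are assumptions; the base URL is uvicorn's default host and port.

```python
import json
import urllib.request


def build_query_request(
    question: str,
    mode: str = "quick",
    base_url: str = "http://127.0.0.1:8000",  # uvicorn's default host and port
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /query endpoint."""
    # "question" and "mode" are assumed field names; check the backend's schema.
    body = json.dumps({"question": question, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the backend is running, the request can be sent with `urllib.request.urlopen(build_query_request("..."))` and the response body decoded with `json.loads`.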
- Python 3.10+ (3.12 recommended)
- Node.js & npm
cd backend
# Virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download models (run once)
python download_model.py
python download_reranker.py
# Start backend server
uvicorn app.main:app --reload
cd frontend
npm install
npm run dev
- Drag & drop PDFs
- Supports batch uploads
- Wait for the "✓ Indexed" confirmation
Quick Mode:
- Fast
- Lightweight
- Best for direct questions
Deep Research:
- Reads many chunks
- Map-Reduce summarization
- Great for reports & summaries
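The Map-Reduce summarization used by Deep Research can be sketched as two passes over the retrieved chunks. This is a generic outline of the pattern, not Mindoc's actual prompts: `map_reduce_answer` and the prompt wording are hypothetical, and `llm` is any function that turns a prompt into text, such as a wrapper around the local model.

```python
from typing import Callable, Sequence


def map_reduce_answer(
    question: str,
    chunks: Sequence[str],
    llm: Callable[[str], str],  # any prompt -> text function, e.g. a local model
) -> str:
    """Answer a question over many chunks with a map step and a reduce step."""
    # Map: ask the model for a short, question-focused note on each chunk alone,
    # so no single prompt has to hold every document at once.
    notes = [
        llm(f"Question: {question}\nContext: {chunk}\nRelevant facts:")
        for chunk in chunks
    ]
    # Reduce: combine the per-chunk notes into one final answer.
    combined = "\n".join(f"- {note}" for note in notes)
    return llm(f"Question: {question}\nNotes:\n{combined}\nFinal answer:")
```

For N chunks this makes N + 1 model calls, which is why this mode is slower than Quick Mode but copes better with small context windows.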
Citations:
- Each answer includes clickable citations
- Clicking one opens the full PDF and auto-scrolls to the cited page
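One common way to make a citation open a PDF at a given page is the `#page=N` URL fragment that most browser PDF viewers understand. How Mindoc wires this up is not described here, so the helper below is only a sketch: `citation_link` and the `/files` route are hypothetical, and the page index is assumed to be 0-based, as in PyMuPDF page numbering.

```python
from urllib.parse import quote


def citation_link(source: str, page_index: int, base_url: str = "/files") -> str:
    """Build a viewer URL that opens `source` at the cited page.

    `page_index` is assumed to be 0-based, while the `#page=` fragment
    understood by common PDF viewers is 1-based, hence the + 1.
    """
    if page_index < 0:
        raise ValueError("page_index must be non-negative")
    # quote() percent-encodes spaces and other unsafe characters in the file name.
    return f"{base_url}/{quote(source)}#page={page_index + 1}"
```

Storing the source file name and page index alongside each chunk at indexing time is what makes this lookup possible when the answer is rendered.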
mindoc/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── rag/
│   │   ├── services/
│   │   └── main.py
│   ├── data/
│   │   ├── chroma/
│   │   ├── models/
│   │   └── uploads/
│   ├── download_model.py
│   ├── download_reranker.py
│   └── requirements.txt
│
└── frontend/
    ├── src/
    │   ├── App.jsx
    │   ├── App.css
    │   └── main.jsx
    └── package.json
Install SQLite shim:
pip install pysqlite3-binary
Long chunks cause a crash. Fixed by enabling:
truncation=True
Usually caused by a missing reranker model. Run again:
python download_reranker.py
- OCR for scanned documents
- Model switching (LaMini / Phi-2 / Qwen 0.5B)
- Persistent conversation history
- Voice mode (offline ASR)