🐘 MiniPostgres Documentation

A deep-dive into every component of MiniPostgres — a Java 17+ relational database engine built from scratch.

Architecture Overview

SQL Query ("SELECT * FROM students WHERE id = 1")
         │
         ▼
   ┌─────────────┐
   │   REPL.java  │  ← Parses SQL text into commands
   └──────┬──────┘
          ▼
   ┌──────────────────────┐
   │  SimpleQueryEngine   │  ← Picks the fastest execution strategy
   └──────┬───────┬───────┘
          │       │
     ┌────▼──┐ ┌──▼───────┐
     │SeqScan│ │IndexScan │  ← O(n) brute force vs O(log n) B+ Tree
     └───┬───┘ └──┬───┬───┘
         │        │   │
         ▼        │   ▼
   ┌──────────┐   │ ┌────────┐
   │ HeapFile  │◀──┘ │ BTree  │  ← Table storage + Index
   └─────┬────┘      └───┬───┘
         │                │
         ▼                ▼
   ┌────────────┐    (in-memory)
   │ BufferPool │  ← Page cache with LRU eviction
   └─────┬──────┘
         ▼
   ┌─────────────┐
   │ DiskManager │  ← Raw file I/O
   └─────┬───────┘
         ▼
   ┌──────────┐
   │ .dat file│  ← 4KB pages on disk
   └──────────┘

Documentation Index

Layer 1: Storage Engine

Doc	File	Purpose
Schema	`Schema.java`	Table structure definition (columns, types)
Tuple	`Tuple.java`	Row serialization/deserialization
Page	`Page.java`	4KB slotted page architecture
HeapFile	`HeapFile.java`	Multi-page table storage

Layer 2: Buffer Pool & Disk

Doc	File	Purpose
DiskManager	`DiskManager.java`	Raw page I/O to disk
BufferPool	`BufferPool.java`	In-memory page cache with LRU

Layer 3: B+ Tree Index

Doc	File	Purpose
RecordId	`RecordId.java`	Pointer to a tuple on disk
BTreeNode	`BTreeNode.java`	Internal and leaf node structure
BTree	`BTree.java`	Full B+ Tree with insert/search/range/delete

Layer 4: Catalog

Doc	File	Purpose
Catalog	`Catalog.java`	Table/schema/index registry
IndexManager	`IndexManager.java`	Index lifecycle management

Layer 5: Query Execution

Doc	File	Purpose
Predicate	`Predicate.java`	Query filter conditions
SeqScan	`SeqScan.java`	Full table scan O(n)
IndexScan	`IndexScan.java`	B+ Tree lookup O(log n)
SimpleQueryEngine	`SimpleQueryEngine.java`	Strategy routing

Layer 6: CLI

Doc	File	Purpose
REPL	`REPL.java`	SQL command-line interface
Main	`Main.java`	Entry point and bootstrap

Key Design Decisions

Why Slotted Pages?

PostgreSQL uses slotted pages internally. Tuples grow backward from the end of the page while the slot directory grows forward from the header. This allows efficient deletes (just mark a slot as deleted) without physically shifting tuple data.

Why B+ Tree (not B-Tree)?

In a classic B-tree, data can live in internal AND leaf nodes. But when you want range scans (e.g., WHERE age BETWEEN 20 AND 30), you'd have to do a complex in-order traversal. B+ trees keep ALL data in leaf nodes and link leaves together — so range scans simply walk the leaf chain. PostgreSQL uses B+ trees for exactly this reason.

Why LRU Buffer Pool?

Disks are ~100,000× slower than RAM. By caching recently-accessed pages in memory and only flushing to disk when necessary, we avoid redundant I/O. LRU (Least Recently Used) evicts the page that hasn't been accessed for the longest time — a simple but effective policy used by real databases.

Why Strategy-Based Query Execution?

Not every query benefits from an index. A SELECT * returns all rows anyway — an index adds overhead. The query engine checks: "is there an index on the column in the WHERE clause?" If yes, use IndexScan. If no, use SeqScan. This is a simplified version of what PostgreSQL's query planner does.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
docs		docs
src		src
target		target
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🐘 MiniPostgres Documentation

Architecture Overview

Documentation Index

Layer 1: Storage Engine

Layer 2: Buffer Pool & Disk

Layer 3: B+ Tree Index

Layer 4: Catalog

Layer 5: Query Execution

Layer 6: CLI

Key Design Decisions

Why Slotted Pages?

Why B+ Tree (not B-Tree)?

Why LRU Buffer Pool?

Why Strategy-Based Query Execution?

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🐘 MiniPostgres Documentation

Architecture Overview

Documentation Index

Layer 1: Storage Engine

Layer 2: Buffer Pool & Disk

Layer 3: B+ Tree Index

Layer 4: Catalog

Layer 5: Query Execution

Layer 6: CLI

Key Design Decisions

Why Slotted Pages?

Why B+ Tree (not B-Tree)?

Why LRU Buffer Pool?

Why Strategy-Based Query Execution?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages