Skip to content

kishansinghifs1/DB-Engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

🐘 MiniPostgres Documentation

A deep-dive into every component of MiniPostgres — a Java 17+ relational database engine built from scratch.

Architecture Overview

SQL Query ("SELECT * FROM students WHERE id = 1")
         │
         ▼
   ┌─────────────┐
   │   REPL.java  │  ← Parses SQL text into commands
   └──────┬──────┘
          ▼
   ┌──────────────────────┐
   │  SimpleQueryEngine   │  ← Picks the fastest execution strategy
   └──────┬───────┬───────┘
          │       │
     ┌────▼──┐ ┌──▼───────┐
     │SeqScan│ │IndexScan │  ← O(n) brute force vs O(log n) B+ Tree
     └───┬───┘ └──┬───┬───┘
         │        │   │
         ▼        │   ▼
   ┌──────────┐   │ ┌────────┐
   │ HeapFile  │◀──┘ │ BTree  │  ← Table storage + Index
   └─────┬────┘      └───┬───┘
         │                │
         ▼                ▼
   ┌────────────┐    (in-memory)
   │ BufferPool │  ← Page cache with LRU eviction
   └─────┬──────┘
         ▼
   ┌─────────────┐
   │ DiskManager │  ← Raw file I/O
   └─────┬───────┘
         ▼
   ┌──────────┐
   │ .dat file│  ← 4KB pages on disk
   └──────────┘

Documentation Index

Layer 1: Storage Engine

Doc File Purpose
Schema Schema.java Table structure definition (columns, types)
Tuple Tuple.java Row serialization/deserialization
Page Page.java 4KB slotted page architecture
HeapFile HeapFile.java Multi-page table storage

Layer 2: Buffer Pool & Disk

Doc File Purpose
DiskManager DiskManager.java Raw page I/O to disk
BufferPool BufferPool.java In-memory page cache with LRU

Layer 3: B+ Tree Index

Doc File Purpose
RecordId RecordId.java Pointer to a tuple on disk
BTreeNode BTreeNode.java Internal and leaf node structure
BTree BTree.java Full B+ Tree with insert/search/range/delete

Layer 4: Catalog

Doc File Purpose
Catalog Catalog.java Table/schema/index registry
IndexManager IndexManager.java Index lifecycle management

Layer 5: Query Execution

Doc File Purpose
Predicate Predicate.java Query filter conditions
SeqScan SeqScan.java Full table scan O(n)
IndexScan IndexScan.java B+ Tree lookup O(log n)
SimpleQueryEngine SimpleQueryEngine.java Strategy routing

Layer 6: CLI

Doc File Purpose
REPL REPL.java SQL command-line interface
Main Main.java Entry point and bootstrap

Key Design Decisions

Why Slotted Pages?

PostgreSQL uses slotted pages internally. Tuples grow backward from the end of the page while the slot directory grows forward from the header. This allows efficient deletes (just mark a slot as deleted) without physically shifting tuple data.

Why B+ Tree (not B-Tree)?

In a classic B-tree, data can live in internal AND leaf nodes. But when you want range scans (e.g., WHERE age BETWEEN 20 AND 30), you'd have to do a complex in-order traversal. B+ trees keep ALL data in leaf nodes and link leaves together — so range scans simply walk the leaf chain. PostgreSQL uses B+ trees for exactly this reason.

Why LRU Buffer Pool?

Disks are ~100,000× slower than RAM. By caching recently-accessed pages in memory and only flushing to disk when necessary, we avoid redundant I/O. LRU (Least Recently Used) evicts the page that hasn't been accessed for the longest time — a simple but effective policy used by real databases.

Why Strategy-Based Query Execution?

Not every query benefits from an index. A SELECT * returns all rows anyway — an index adds overhead. The query engine checks: "is there an index on the column in the WHERE clause?" If yes, use IndexScan. If no, use SeqScan. This is a simplified version of what PostgreSQL's query planner does.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages