Skip to content

High-performance ACID relational database in C++ with MVCC, B+ tree indexes, and SQL query optimization

License

Notifications You must be signed in to change notification settings

ShreeChaturvedi/entropy

Repository files navigation

Entropy Database Engine Entropy Database Engine


ci release license c++

Entropy is a high-performance relational database engine built from scratch in modern C++20. It showcases core database internals: B+ tree storage, MVCC, ACID transactions, write-ahead logging, and a cost-based query optimizer.

Highlights

  • End-to-end SQL engine: parser, binder, optimizer, execution.
  • Storage engine with B+ trees, hash indexes, and buffer pool caching.
  • MVCC + WAL for snapshot isolation and crash recovery.
  • Unit and integration tests wired to CTest + GoogleTest.
  • Benchmarks with optional SQLite comparison and reproducible scripts.
  • Cross-platform CMake build with Linux/macOS/Windows CI.

Architecture

Entropy architecture diagram Entropy architecture diagram

Architecture Details

Entropy is organized as focused C++ libraries that mirror the logical database pipeline. The system is split into layers with explicit boundaries.

SQL Frontend

  • Lexer and parser build an AST for SELECT, INSERT, UPDATE, DELETE, and CREATE TABLE.
  • Binder resolves names and types using catalog metadata and column indexes.

Planning and Optimization

  • Statistics track row counts and simple selectivity estimates.
  • Cost model compares index scans and sequential scans for predicates.
  • Index selector chooses point lookup and range scan paths when an index exists.

Execution

  • Iterator operators for seq scan, index scan, hash join, nested loop join, sort, aggregate, filter, and limit.
  • Projection and DML executors materialize rows and apply insert, update, and delete.

Transactions and Recovery

  • MVCC version chains with timestamp visibility checks.
  • Lock manager enforces isolation with deadlock detection.
  • WAL records are flushed on commit and replayed on recovery.

Storage and Buffering

  • Slotted pages back a table heap for variable length tuples.
  • B+ tree and extendible hash indexes support range and point access.
  • Buffer pool uses an LRU replacer with pin and dirty tracking.
  • Disk manager handles page and WAL IO.

Performance Snapshot (vs SQLite)

Run file: docs/benchmarks/runs/bench-20251226-214051.json

  • Machine: Apple M2 (arm64), macOS 15.5
  • Compiler: Apple clang 17.0.0 (clang-1700.0.13.5)
  • Build: Release (-O3) SQLite baselines are collected when ENTROPY_BENCH_COMPARE_SQLITE=ON.

Per-iteration ns/op (ratio = Entropy / SQLite, lower is better):

Case Rows Entropy (ns/op) SQLite (ns/op) Ratio
Insert batch (txn) 1k 940,970 1,048,274 0.90x
Insert batch (txn) 10k 9,693,951 6,992,261 1.39x
Point select 1k 46,041 23,020 2.00x
Point select 10k 459,723 179,783 2.56x

Rows are the batch size for inserts and the table cardinality for point selects.

Benchmark comparison chart Benchmark comparison chart

Chart units are microseconds per op. The insert and point select panels use different y axis ranges.

Full results: docs/benchmarks/bench_summary.csv

Benchmarks

./scripts/bench/run.sh

Windows:

powershell -ExecutionPolicy Bypass -File scripts/bench/run.ps1

The script writes a timestamped JSON run file in docs/benchmarks/runs/ and updates docs/benchmarks/bench_summary.csv.

SQLite comparison is ON by default for the script. If SQLite3 headers are missing, the script fails with an explicit error. To run without SQLite, set ENTROPY_BENCH_COMPARE_SQLITE=OFF. Manual steps and methodology are in docs/benchmarks.md.

Quick Start

#include <entropy/entropy.hpp>
#include <iostream>

int main() {
    entropy::Database db("mydb.entropy");
    db.execute("CREATE TABLE users (id INT, name VARCHAR(100))");
    db.execute("INSERT INTO users VALUES (1, 'Alice')");

    auto result = db.execute("SELECT * FROM users");
    for (const auto &row : result) {
        std::cout << row["name"].as_string() << "\n";
    }
}

Build and Test

Prerequisites

  • C++20 compiler (GCC 10+, Clang 12+, MSVC 19.29+)
  • CMake 3.20+
  • Git

Configure + Build

Dependencies (spdlog, GoogleTest, Google Benchmark) are fetched with CMake FetchContent on first configure (requires network access). Optional comparisons use system SQLite3, while compression uses LZ4 when enabled.

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release

Run Tests

ctest --test-dir build --output-on-failure -C Release

CMake Presets

cmake --preset dev
cmake --build --preset dev
ctest --preset dev

Build Options

Option Default Description
ENTROPY_BUILD_TESTS Build unit + integration tests
ENTROPY_BUILD_BENCHMARKS Build benchmarks
ENTROPY_BENCH_COMPARE_SQLITE Build SQLite comparison benchmarks
ENTROPY_BUILD_EXAMPLES Build example programs
ENTROPY_ENABLE_LZ4 Enable page compression (LZ4)

LZ4 compression tests are only built when ENTROPY_ENABLE_LZ4=ON.

CI/CD and Releases

  • CI builds and tests on Linux/macOS/Windows via GitHub Actions.
  • Releases are created automatically on v* tags with OS-specific binaries.

Documentation

  • DESIGN.md for architecture notes and component details
  • docs/benchmarks.md for benchmark methodology and reporting

License

MIT. See LICENSE.

About

High-performance ACID relational database in C++ with MVCC, B+ tree indexes, and SQL query optimization

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages