jhhuh/ghc-fastboot

ghc-fastboot

Zero-cost serialization for GHC Haskell. Freeze any closure graph — constructors, thunks, functions, partial applications — into a snapshot and thaw it back in microseconds.

What it does

main = do
    let config  = frozen ".cache/config"  loadConfig
        index   = frozen ".cache/index"   buildIndex
        grammar = frozen ".cache/grammar" parseGrammar
    runApp config index grammar

First run: evaluates each expression, walks the closure graph, serializes to disk. Every subsequent run: mmap + mremap the data directly into the GHC heap. No parsing, no deserialization, no allocation. The closures are just there.

Performance

1M-entry Vector Text benchmark (91MB closure data):

| Approach | Startup time | Speedup |
|---|---|---|
| Build from scratch | 618ms | 1x |
| Thaw from file (mmap) | 2.6ms | 238x |
| Thaw from ELF section (mremap) | 2.6ms | 238x |

Thaw breakdown (ELF section path):

| Phase | Time |
|---|---|
| mmap anonymous (bdescr region) | 4 µs |
| mremap 91MB (page table ops) | 8 µs |
| Relocation fixup | 0 µs |
| bdescr init (91 megablocks) | 884 µs |
| Thaw total | ~900 µs |
| RTS overhead (hs_init + hs_exit) | ~1600 µs |

Data pages are demand-paged: only pages you actually access incur a fault (~40ns each). A CLI tool that touches 100 entries out of 1M pays for 100 entries, not 1M.
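As a rough worked example of the pay-per-access claim, the arithmetic can be written out in C. This is only an estimate: it assumes 4 KiB pages, reads "91MB" as 91 MiB, takes the ~40ns fault figure above at face value, and treats each touched entry as landing on its own page. The function names are illustrative and not part of ghc-fastboot.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this estimate (not taken from the project's code):
 * 4 KiB pages and roughly 40 ns per page fault. */
#define PAGE_SIZE_BYTES 4096u
#define FAULT_COST_NS   40u

/* Number of pages needed to hold `bytes` bytes, rounded up. */
uint64_t pages_for(uint64_t bytes) {
    assert(bytes <= UINT64_MAX - (PAGE_SIZE_BYTES - 1)); /* avoid overflow */
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Estimated fault time when `touched_pages` distinct pages are accessed. */
uint64_t fault_cost_ns(uint64_t touched_pages) {
    return touched_pages * FAULT_COST_NS;
}
```

Under these assumptions, touching 100 pages costs about 4 µs of fault time, while touching all 23,296 pages of the 91 MiB image would cost a little under 1 ms.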

How it differs from GHC Compact Regions

| Feature | Compact Regions | ghc-fastboot |
|---|---|---|
| Thunks | No (NF only) | Yes |
| Functions / PAPs | No | Yes |
| Closures over free vars | No | Yes |
| Cross-binary migration | No | Yes (symbol table + dlsym) |
| Layout | Compacted, contiguous | Compacted, contiguous |
| GC integration | Built-in | Manual bdescr + StablePtr |
| Requires GHC patch | No | No |

The closure walker handles all GHC closure types via info table traversal — the same mechanism the GC uses.

Architecture

Freeze (once, at build time or first run)

  1. Walk the closure graph from the root using info table layouts
  2. Copy each closure back-to-back into a contiguous buffer (compacting)
  3. Rewrite internal pointers as absolute addresses at a fixed VA
  4. Record relocation entries for code/static pointers with symbol names
  5. Write snapshot: header + relocation table + symbol table + page-aligned closure data
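Steps 1 to 3 can be illustrated with a toy model. The sketch below does not use GHC's real closure layout or info tables: `Node`, `Freezer`, and `freeze_node` are hypothetical stand-ins, with a fixed two-pointer object in place of info-table-driven traversal. It shows only the compacting copy and the rewriting of pointers to absolute addresses at a chosen target VA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy heap object (hypothetical, NOT a GHC closure): a tag word standing
 * in for the info pointer, plus up to two pointer fields. */
typedef struct Node {
    uint64_t     tag;
    struct Node *kids[2];   /* NULL means "no pointer" */
} Node;

#define MAX_NODES    64
#define FROZEN_BYTES 24     /* each frozen object is three 64-bit words */

typedef struct {
    uint8_t    *buf;        /* contiguous output buffer */
    size_t      cap;        /* buffer capacity in bytes */
    size_t      used;       /* bytes written so far */
    uint64_t    target_va;  /* address the buffer will be mapped at on thaw */
    size_t      count;      /* number of objects already copied */
    const Node *seen[MAX_NODES];      /* source objects visited so far */
    uint64_t    seen_off[MAX_NODES];  /* their offsets in buf */
} Freezer;

/* Copy `n` and everything reachable from it into f->buf, back to back.
 * Returns the absolute address the copy will have once mapped at target_va. */
uint64_t freeze_node(Freezer *f, const Node *n) {
    /* Already copied? Reusing the offset preserves sharing and ends cycles. */
    for (size_t i = 0; i < f->count; i++)
        if (f->seen[i] == n)
            return f->target_va + f->seen_off[i];

    assert(f->count < MAX_NODES && f->used + FROZEN_BYTES <= f->cap);

    /* Reserve space before recursing, so the root lands at offset 0. */
    uint64_t off = f->used;
    f->seen[f->count] = n;
    f->seen_off[f->count] = off;
    f->count++;
    f->used += FROZEN_BYTES;

    /* Emit the copy with pointer fields rewritten as absolute addresses. */
    uint64_t words[3];
    words[0] = n->tag;
    for (int k = 0; k < 2; k++)
        words[1 + k] = n->kids[k] ? freeze_node(f, n->kids[k]) : 0;
    memcpy(f->buf + off, words, sizeof words);
    return f->target_va + off;
}
```

A real walker would use a hash table rather than a linear scan for the visited set, and would read each object's size and pointer fields from its info table rather than assuming a fixed shape.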

Thaw (every run, ~16µs for mmap+mremap)

Three paths, tried in order:

  1. ELF section (thaw_from_section): snapshot embedded in binary via objcopy --update-section. ELF loader maps it at startup. mremap(MREMAP_FIXED) moves page table entries to the target VA — zero page faults, zero I/O.

  2. EmbedFooter (thaw_from_fd via /proc/self/exe): snapshot appended to binary with a footer. mmap MAP_PRIVATE from the executable file — CoW, lazy page-in.

  3. Standalone file (thaw_from_fd): mmap MAP_PRIVATE from a .snap file.

All paths:

  • Map at fixed VA inside GHC's 1TB mblock_address_space (900GB offset)
  • Zero internal relocations (same binary + fixed VA → all pointers already correct)
  • Initialize bdescrs for GC integration (BF_LARGE | BF_PINNED in oldest_gen)
  • Create a StablePtr as GC root
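The page-table move at the heart of the ELF section path can be sketched in isolation. This Linux-only example uses an anonymous mapping as a stand-in for the ELF-loaded section and a reserved range as a stand-in for the slot in GHC's address space. The helper names `reserve_range` and `move_pages_to` are hypothetical. Only `mmap`, `mremap`, and `sysconf` are real calls.

```c
#define _GNU_SOURCE            /* needed for mremap and MREMAP_FIXED */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve an address range without committing memory. In this sketch it
 * plays the role of the fixed target VA. Returns NULL on failure. */
void *reserve_range(size_t len) {
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Move the pages backing [src, src+len) to `target` by rewriting page
 * table entries; no data is copied. Whatever was mapped at `target` is
 * replaced. Returns `target` on success, NULL on failure. */
void *move_pages_to(void *src, size_t len, void *target) {
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    assert((uintptr_t)src % page == 0 && (uintptr_t)target % page == 0);
    void *p = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, target);
    return p == MAP_FAILED ? NULL : p;
}
```

After a successful `move_pages_to`, the old address range is unmapped and the same physical pages are visible at `target`, which is why absolute pointers baked in at freeze time remain valid without any fixup.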

Snapshot Format (v2)

┌─────────────────────────────────┐  offset 0
│ SnapshotHeaderV2 (128 bytes)    │
├─────────────────────────────────┤  reloc_table_offset
│ RelocationEntry[] (8 bytes ea)  │
├─────────────────────────────────┤  symbol_table_offset
│ SymbolEntry[] (variable)        │
├─────────────────────────────────┤  fingerprint_table_offset
│ FingerprintEntry[] (40 bytes ea)│
├─────────────────────────────────┤  closure_data_offset (PAGE-ALIGNED)
│ Closure data (packed closures)  │
│  root closure at offset 0       │
│  all pointers absolute          │
└─────────────────────────────────┘

Key fields:

  • target_va: fixed virtual address for zero-relocation thaw
  • snap_text_base: .text base at freeze time (for code pointer delta)
  • binary_hash: SHA-256 of .text + .rodata (same hash → skip symbol resolution)
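To make the layout concrete, here is one possible way a 128-byte header carrying these fields could be declared in C. The actual definition lives in cbits/fastboot.h and may differ: the field order, the `magic`, `version`, `flags`, and `reserved` fields, and the `header_layout_ok` check are all assumptions made for illustration. Only the offset and key field names come from the diagram and list above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-byte header layout (illustrative, not the real struct). */
typedef struct {
    uint8_t  magic[8];                  /* assumed: file identification */
    uint32_t version;                   /* assumed: 2 for this format */
    uint32_t flags;                     /* assumed: reserved for options */
    uint64_t target_va;                 /* fixed VA for zero-relocation thaw */
    uint64_t snap_text_base;            /* .text base at freeze time */
    uint64_t reloc_table_offset;
    uint64_t symbol_table_offset;
    uint64_t fingerprint_table_offset;
    uint64_t closure_data_offset;       /* must be page-aligned */
    uint8_t  binary_hash[32];           /* SHA-256 of .text + .rodata */
    uint8_t  reserved[32];              /* assumed: padding to 128 bytes */
} SnapshotHeaderSketch;

static_assert(sizeof(SnapshotHeaderSketch) == 128, "header must be 128 bytes");

/* Sanity check: sections appear in the order shown in the diagram, and
 * both the closure data offset and the target VA are page-aligned.
 * Returns 1 if the layout is plausible, 0 otherwise. */
int header_layout_ok(const SnapshotHeaderSketch *h, uint64_t page_size) {
    return h->reloc_table_offset >= sizeof *h
        && h->reloc_table_offset <= h->symbol_table_offset
        && h->symbol_table_offset <= h->fingerprint_table_offset
        && h->fingerprint_table_offset <= h->closure_data_offset
        && h->closure_data_offset % page_size == 0
        && h->target_va % page_size == 0;
}
```

Page alignment of the closure data matters because the thaw paths map that region directly with `mmap` or `mremap`, both of which operate on whole pages.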

Relocation Strategy

| Scenario | Internal ptrs | Code ptrs | Cost |
|---|---|---|---|
| Same binary, fixed VA | skip | skip | 0 |
| Same binary, different VA | += va_delta | skip | O(n) |
| Different binary | += va_delta | dlsym per symbol | O(n + symbols) |
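The decision in this table, and the `+= va_delta` fixup for internal pointers, might look roughly like the following. How ghc-fastboot actually locates internal pointer slots is not described here, so the `ptr_slots` index list is a hypothetical simplification, as are the names `RelocPlan`, `choose_plan`, and `apply_va_delta`. The dlsym-based code pointer path is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    RELOC_NONE,                  /* same binary, fixed VA: nothing to do */
    RELOC_VA_DELTA,              /* same binary, different VA */
    RELOC_VA_DELTA_AND_SYMBOLS   /* different binary: also resolve symbols */
} RelocPlan;

/* Pick a strategy from the table. `same_binary` would come from comparing
 * binary hashes; `snap_va` is the VA recorded at freeze time. */
RelocPlan choose_plan(int same_binary, uint64_t snap_va, uint64_t mapped_va) {
    if (!same_binary)
        return RELOC_VA_DELTA_AND_SYMBOLS;
    return snap_va == mapped_va ? RELOC_NONE : RELOC_VA_DELTA;
}

/* Add va_delta to every internal pointer. `ptr_slots` lists the word
 * indices that hold internal pointers (hypothetical representation).
 * Unsigned wraparound makes a "negative" delta work as expected. */
void apply_va_delta(uint64_t *words, const uint32_t *ptr_slots,
                    size_t n_slots, uint64_t va_delta) {
    assert(words != NULL && (ptr_slots != NULL || n_slots == 0));
    for (size_t i = 0; i < n_slots; i++)
        words[ptr_slots[i]] += va_delta;
}
```

Here `va_delta` would be computed as `mapped_va - snap_va`, and the loop visits each pointer slot once, matching the O(n) cost in the table.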

Building

# Default build (library + benchmarks)
nix build

# Embedded binary (three-phase: build → snapshot → objcopy)
nix build .#bench-embedded

# Development
nix develop
cabal build

The embedded pipeline:

  1. nix build — compile ghc-fastboot with BFD linker, placeholder fastboot ELF section
  2. snapshot derivation — run binary once to freeze, cached
  3. bench-embedded derivation — objcopy --update-section fastboot=snapshot.snap

The linker script (cbits/fastboot.ld) places the fastboot section after .bss in its own PT_LOAD segment, ensuring objcopy expansion doesn't overlap .bss.

Project Structure

ghc-fastboot/
├── lib/FastBoot.hs              # Haskell API: frozen, unsafeFrozen
├── cbits/
│   ├── fastboot.h               # Snapshot format structs, constants
│   ├── freeze.c                 # Closure graph walk + serialize
│   ├── thaw.c                   # mmap/mremap restore + GC setup
│   ├── closure_walk.h           # Info table-driven closure walker
│   ├── relocate.c               # ASLR relocation (v1)
│   ├── embed.c                  # Binary embedding
│   ├── embed_section.S          # ELF section placeholder
│   └── fastboot.ld              # Linker script (section after .bss)
├── bench/
│   ├── mini-fzf-frozen/         # 1M-entry Vector Text benchmark
│   ├── mini-fzf/                # Unfrozen baseline
│   └── minimal/                 # RTS overhead baseline
├── docs/
│   ├── design.md                # Design document
│   └── plans/
│       └── snapshot-format-v2-spec.md
└── flake.nix                    # Nix build with three-phase pipeline

Haskell API

-- Freeze/thaw without binary compatibility checks.
-- First run: evaluates action, serializes result to path.
-- Subsequent runs: thaws from snapshot (or ELF section if embedded).
frozen :: FilePath -> IO a -> a

-- Same as frozen (no binary hash verification yet).
unsafeFrozen :: FilePath -> IO a -> a

frozen uses unsafePerformIO. The snapshot path is the fallback — if the binary has an embedded snapshot (ELF section), it's used first with zero file I/O.

Documentation

  • Design Document — motivation, prior art (V8, Emacs pdump, GraalVM, CRaC, BEAM), GHC runtime internals, why process-level snapshots were abandoned, measurement pitfalls, and future directions
  • Snapshot Format v2 Spec — binary format specification with relocation strategies and Merkle fingerprint design
  • GHC Compact Regions and Runtime Limitations — analysis of GHC's closed storage manager and why a public external heap API is needed

Future Directions

Shared Memory Daemon

A daemon managing frozen images in shared memory. Multiple Haskell runtimes mmap the same snapshots at agreed-upon fixed VAs. Physical pages shared across processes via the kernel page cache. Linear Haskell types for safe lifecycle management.

Composable Images

Frozen images can reference other frozen images via cross-image pointers. Each image gets a VA slot in the 1TB range. Like dynamic linking but for data structures — a data linker/loader.

Distributed HPC

The packed snapshot format is network-ready. Send closure graphs over RDMA, thaw on remote nodes. With the same binary on all nodes, zero relocation. Enables work stealing with native GHC closures instead of serialized bytes.

Cross-Runtime Thunk Evaluation

Multiple runtimes share a frozen image (MAP_PRIVATE, CoW). Each evaluates different thunks. Results shared via IND closures pointing into shared regions. Fixed VA makes cross-runtime pointers work without coordination.

Not Yet Implemented

  • Merkle fingerprint computation
  • Binary hash verification
  • fastboot-migrate CLI tool
  • Lazy bdescr init (userfaultfd)
  • withFreeze scoped API
