Zero-cost serialization for GHC Haskell. Freeze any closure graph — constructors, thunks, functions, partial applications — into a snapshot and thaw it back in microseconds.
```haskell
main = do
  let config  = frozen ".cache/config" loadConfig
      index   = frozen ".cache/index" buildIndex
      grammar = frozen ".cache/grammar" parseGrammar
  runApp config index grammar
```

First run: evaluates each expression, walks the closure graph, serializes to disk.
Every subsequent run: mmap + mremap the data directly into the GHC heap. No parsing, no deserialization, no allocation. The closures are just there.
1M-entry Vector Text benchmark (91MB closure data):
| Approach | Startup time | Speedup |
|---|---|---|
| Build from scratch | 618ms | 1x |
| Thaw from file (mmap) | 2.6ms | 238x |
| Thaw from ELF section (mremap) | 2.6ms | 238x |
Thaw breakdown (ELF section path):
| Phase | Time |
|---|---|
| mmap anonymous (bdescr region) | 4 µs |
| mremap 91MB (page table ops) | 8 µs |
| Relocation fixup | 0 µs |
| bdescr init (91 megablocks) | 884 µs |
| Thaw total | ~900 µs |
| RTS overhead (hs_init + hs_exit) | ~1600 µs |
Data pages are demand-paged: only pages you actually access incur a fault (~40ns each). A CLI tool that touches 100 entries out of 1M pays for 100 entries, not 1M.
| | Compact Regions | ghc-fastboot |
|---|---|---|
| Thunks | No (NF only) | Yes |
| Functions / PAPs | No | Yes |
| Closures over free vars | No | Yes |
| Cross-binary migration | No | Yes (symbol table + dlsym) |
| Layout | Compacted, contiguous | Compacted, contiguous |
| GC integration | Built-in | Manual bdescr + StablePtr |
| Requires GHC patch | No | No |
The closure walker handles all GHC closure types via info table traversal — the same mechanism the GC uses.
- Walk the closure graph from the root using info table layouts
- Copy each closure back-to-back into a contiguous buffer (compacting)
- Rewrite internal pointers as absolute addresses at a fixed VA
- Record relocation entries for code/static pointers with symbol names
- Write snapshot: header + relocation table + symbol table + page-aligned closure data
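The walk-and-compact steps above can be sketched in C on a toy graph. This is a minimal illustration, not the code in `cbits/freeze.c`: the `Node` type, the `Freezer` struct, `FROZEN_SIZE`, and `freeze_node` are all hypothetical names, real closures are sized and traversed via their info tables, and relocation entries for code/static pointers are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a heap closure: one payload word, two pointer fields.
   (Hypothetical; real GHC closures follow their info table layout.) */
typedef struct Node {
    uint64_t     payload;
    struct Node *ptr[2];
} Node;

#define MAX_NODES   64
#define FROZEN_SIZE (3 * sizeof(uint64_t))  /* payload + two 64-bit pointers */

typedef struct {
    const Node *seen[MAX_NODES];    /* original address of each copied node */
    size_t      offset[MAX_NODES];  /* its byte offset in the packed buffer */
    size_t      count;
    uint8_t    *buf;                /* packed output, MAX_NODES * FROZEN_SIZE */
    size_t      used;
    uint64_t    target_va;          /* address the buffer will be mapped at */
} Freezer;

/* Copy n and everything reachable from it into f->buf, back to back.
   Returns the absolute address n will have once buf is mapped at target_va. */
static uint64_t freeze_node(Freezer *f, const Node *n) {
    if (n == NULL) return 0;
    for (size_t i = 0; i < f->count; i++)       /* already copied? (handles cycles) */
        if (f->seen[i] == n) return f->target_va + f->offset[i];

    assert(f->count < MAX_NODES);
    size_t off = f->used;                       /* reserve the slot before recursing */
    f->seen[f->count]   = n;
    f->offset[f->count] = off;
    f->count++;
    f->used += FROZEN_SIZE;

    uint64_t words[3] = { n->payload, 0, 0 };
    for (int k = 0; k < 2; k++)                 /* rewrite pointers as absolute VAs */
        words[k + 1] = freeze_node(f, n->ptr[k]);

    memcpy(f->buf + off, words, sizeof words);
    return f->target_va + off;
}
```

Because the root is reserved first, it lands at offset 0 of the buffer, matching the snapshot layout. A production walker would more likely use an explicit worklist and a hash table instead of recursion and a linear scan.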
Three paths, tried in order:
- ELF section (`thaw_from_section`): snapshot embedded in the binary via `objcopy --update-section`. The ELF loader maps it at startup. `mremap(MREMAP_FIXED)` moves page table entries to the target VA: zero page faults, zero I/O.
- EmbedFooter (`thaw_from_fd` via `/proc/self/exe`): snapshot appended to the binary with a footer. `mmap MAP_PRIVATE` from the executable file: CoW, lazy page-in.
- Standalone file (`thaw_from_fd`): `mmap MAP_PRIVATE` from a `.snap` file.
All paths:
- Map at a fixed VA inside GHC's 1TB `mblock_address_space` (900GB offset)
- Zero internal relocations (same binary + fixed VA → all pointers already correct)
- Initialize bdescrs for GC integration (`BF_LARGE | BF_PINNED` in `oldest_gen`)
- Create a `StablePtr` as GC root
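To make the copy-on-write behaviour of the file-backed paths concrete, here is a small C sketch of a private mapping. It is an assumption-laden illustration, not the project's `thaw_from_fd`: the function name `map_snapshot_private` is hypothetical, it maps the whole file rather than only the closure data, and it lets the kernel choose the address instead of requesting the fixed VA, so it is safe to run anywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an entire file as a private, writable, copy-on-write region.
   Returns NULL on failure; on success stores the mapped length in *len_out.
   (Hypothetical helper: the real thaw path also targets a fixed VA.) */
static void *map_snapshot_private(const char *path, size_t *len_out) {
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;

    /* MAP_PRIVATE: pages are read lazily from the file on first touch;
       a write copies the touched page and never reaches the file. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED) return NULL;

    *len_out = len;
    return p;
}
```

A write through the returned pointer changes only the calling process's copy of that page, which is why an updated thunk in a thawed graph leaves the snapshot file untouched.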
┌─────────────────────────────────┐ offset 0
│ SnapshotHeaderV2 (128 bytes) │
├─────────────────────────────────┤ reloc_table_offset
│ RelocationEntry[] (8 bytes ea) │
├─────────────────────────────────┤ symbol_table_offset
│ SymbolEntry[] (variable) │
├─────────────────────────────────┤ fingerprint_table_offset
│ FingerprintEntry[] (40 bytes ea)│
├─────────────────────────────────┤ closure_data_offset (PAGE-ALIGNED)
│ Closure data (packed closures) │
│ root closure at offset 0 │
│ all pointers absolute │
└─────────────────────────────────┘
Key fields:
- `target_va`: fixed virtual address for zero-relocation thaw
- `snap_text_base`: `.text` base at freeze time (for code pointer delta)
- `binary_hash`: SHA-256 of `.text` + `.rodata` (same hash → skip symbol resolution)
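As a rough illustration of the layout, the following C sketch declares a 128-byte header and checks the ordering and alignment constraints shown in the diagram. Only the field names and the 128-byte size come from the description above; the field order, the integer widths, the `reserved` padding, and the names `SnapshotHeaderSketch` and `header_layout_ok` are assumptions, and the real definition lives in `cbits/fastboot.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: field order and padding are guesses for illustration. */
typedef struct {
    uint64_t reloc_table_offset;
    uint64_t symbol_table_offset;
    uint64_t fingerprint_table_offset;
    uint64_t closure_data_offset;    /* must be page-aligned */
    uint64_t target_va;              /* fixed VA for zero-relocation thaw */
    uint64_t snap_text_base;         /* .text base at freeze time */
    uint8_t  binary_hash[32];        /* SHA-256 of .text + .rodata */
    uint8_t  reserved[48];           /* assumed padding up to 128 bytes */
} SnapshotHeaderSketch;

_Static_assert(sizeof(SnapshotHeaderSketch) == 128, "header must be 128 bytes");

/* Returns 1 when the tables appear in the documented order after the header
   and the closure data starts on a page boundary, 0 otherwise. */
static int header_layout_ok(const SnapshotHeaderSketch *h, uint64_t page_size) {
    assert(h != NULL && page_size != 0);
    return h->reloc_table_offset       >= sizeof *h
        && h->symbol_table_offset      >= h->reloc_table_offset
        && h->fingerprint_table_offset >= h->symbol_table_offset
        && h->closure_data_offset      >= h->fingerprint_table_offset
        && h->closure_data_offset % page_size == 0;
}
```

Page alignment of the closure data matters because `mmap` and `mremap` operate on whole pages, so the data can only be mapped in place if it starts on a page boundary.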
| Scenario | Internal ptrs | Code ptrs | Cost |
|---|---|---|---|
| Same binary, fixed VA | skip | skip | 0 |
| Same binary, different VA | `+= va_delta` | skip | O(n) |
| Different binary | `+= va_delta` | `dlsym` per symbol | O(n + symbols) |
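The table can be read as a two-step decision: pick a plan, then patch pointers if the plan requires it. The C sketch below shows that shape under stated assumptions: `RelocPlan`, `choose_plan`, and `apply_va_delta` are hypothetical names, internal pointer locations are modelled as word indices rather than the real 8-byte `RelocationEntry` encoding, and the `dlsym` step for code pointers is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    RELOC_NONE,                  /* same binary, fixed VA: nothing to do */
    RELOC_INTERNAL,              /* same binary, different VA: shift internal ptrs */
    RELOC_INTERNAL_AND_SYMBOLS   /* different binary: shift + resolve symbols */
} RelocPlan;

/* Choose a plan from whether the binary hash matched and how far the
   mapping landed from the VA recorded in the snapshot. */
static RelocPlan choose_plan(int same_binary, int64_t va_delta) {
    if (!same_binary) return RELOC_INTERNAL_AND_SYMBOLS;
    return va_delta == 0 ? RELOC_NONE : RELOC_INTERNAL;
}

/* Add va_delta to every internal pointer. slots[i] is the index (in 64-bit
   words) of a pointer field inside data. One pass over n entries: O(n). */
static void apply_va_delta(uint64_t *data, const uint32_t *slots, size_t n,
                           int64_t va_delta) {
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        data[slots[i]] += (uint64_t)va_delta;   /* wraps correctly for negative deltas */
}
```

In the first row of the table the loop never runs, which is what makes the same-binary, fixed-VA thaw independent of the snapshot size.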
```sh
# Default build (library + benchmarks)
nix build
# Embedded binary (three-phase: build → snapshot → objcopy)
nix build .#bench-embedded
# Development
nix develop
cabal build
```

The embedded pipeline:
- `nix build`: compile `ghc-fastboot` with BFD linker, placeholder `fastboot` ELF section
- `snapshot` derivation: run binary once to freeze, cached
- `bench-embedded` derivation: `objcopy --update-section fastboot=snapshot.snap`
The linker script (cbits/fastboot.ld) places the fastboot section after .bss in its own PT_LOAD segment, ensuring objcopy expansion doesn't overlap .bss.
ghc-fastboot/
├── lib/FastBoot.hs # Haskell API: frozen, unsafeFrozen
├── cbits/
│ ├── fastboot.h # Snapshot format structs, constants
│ ├── freeze.c # Closure graph walk + serialize
│ ├── thaw.c # mmap/mremap restore + GC setup
│ ├── closure_walk.h # Info table-driven closure walker
│ ├── relocate.c # ASLR relocation (v1)
│ ├── embed.c # Binary embedding
│ ├── embed_section.S # ELF section placeholder
│ └── fastboot.ld # Linker script (section after .bss)
├── bench/
│ ├── mini-fzf-frozen/ # 1M-entry Vector Text benchmark
│ ├── mini-fzf/ # Unfrozen baseline
│ └── minimal/ # RTS overhead baseline
├── docs/
│ ├── design.md # This document
│ └── plans/
│ └── snapshot-format-v2-spec.md
└── flake.nix # Nix build with three-phase pipeline
```haskell
-- Freeze/thaw without binary compatibility checks.
-- First run: evaluates action, serializes result to path.
-- Subsequent runs: thaws from snapshot (or ELF section if embedded).
frozen :: FilePath -> IO a -> a

-- Same as frozen (no binary hash verification yet).
unsafeFrozen :: FilePath -> IO a -> a
```

`frozen` uses `unsafePerformIO`. The snapshot path is the fallback: if the binary has an embedded snapshot (ELF section), it is used first with zero file I/O.
- Design Document — motivation, prior art (V8, Emacs pdump, GraalVM, CRaC, BEAM), GHC runtime internals, why process-level snapshots were abandoned, measurement pitfalls, and future directions
- Snapshot Format v2 Spec — binary format specification with relocation strategies and Merkle fingerprint design
- GHC Compact Regions and Runtime Limitations — analysis of GHC's closed storage manager and why a public external heap API is needed
A daemon managing frozen images in shared memory. Multiple Haskell runtimes mmap the same snapshots at agreed-upon fixed VAs. Physical pages shared across processes via the kernel page cache. Linear Haskell types for safe lifecycle management.
Frozen images can reference other frozen images via cross-image pointers. Each image gets a VA slot in the 1TB range. Like dynamic linking but for data structures — a data linker/loader.
The packed snapshot format is network-ready. Send closure graphs over RDMA, thaw on remote nodes. With the same binary on all nodes, zero relocation. Enables work stealing with native GHC closures instead of serialized bytes.
Multiple runtimes share a frozen image (MAP_PRIVATE, CoW). Each evaluates different thunks. Results shared via IND closures pointing into shared regions. Fixed VA makes cross-runtime pointers work without coordination.
- Merkle fingerprint computation
- Binary hash verification
- `fastboot-migrate` CLI tool
- Lazy bdescr init (userfaultfd)
- `withFreeze` scoped API