An efficient, scalable, and extensible Datalog engine built atop Differential Dataflow
Paper • Quick Start • Datasets • Reproduce Results
This repo contains the implementation for the paper:
FlowLog: Efficient and Extensible Datalog via Incrementality
Hangdong Zhao, Zhenghong Yu, Srinag Rao, Simon Frisk, Zhiwei Fan, Paraschos Koutris
VLDB 2026 (Boston)
FlowLog uses a modular architecture whose components form a Datalog execution pipeline (see also Figure 1 of the paper):
├── parsing      # Parse the Datalog program
├── strata       # Stratification
├── planning     # Generate the logical IR and optimize it (per rule)
├── catalog      # Generate metadata
├── optimizing   # Query optimization
├── executing    # Executor
├── reading      # Read input data from CSV
└── macros       # Rust macros that generate code for each differential operator
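For instance, the strata stage splits a program with negation into an ordered sequence of strata, each fully evaluated before the next. A minimal two-stratum program (an illustrative sketch; all relation names are hypothetical):
// Stratum 0: plain recursive reachability.
reach(y) :- source(y).
reach(y) :- reach(x), edge(x, y).
// Stratum 1: may safely negate reach, which is fully computed in stratum 0.
unreachable(x) :- node(x), !reach(x).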
# Automated setup (recommended):
# The env.sh script automatically handles all requirements including:
# - Rust = 1.89.0 (pinned version for reproducibility)
# - differential-dataflow = 0.16.2 (paper version for reproducibility)
# - timely = 0.23.0 (paper version for reproducibility)
# - ...
# Simply run:
bash tool/env.sh
# After installation, you may need to start a new terminal session
# or run `source ~/.bashrc` (or `source ~/.zshrc` if using zsh)
# so that environment variables and PATH updates take effect.
# Manual verification (optional):
# To check your Rust version after setup
rustc --version # Should show: rustc 1.89.0

Note on versions: For paper reproducibility, we use differential-dataflow 0.16.2 and timely 0.23.0, as reported in the VLDB paper. However, we are actively maintaining FlowLog and tracking the latest versions of these dependencies for improved performance.
Create a file named reach.dl with the following contents. This program computes the set of nodes reachable from the given sources:
.in
.decl Source(id: number)
.input Source.csv
.decl Arc(x: number, y: number)
.input Arc.csv
.printsize
.decl Reach(id: number)
.rule
Reach(y) :- Source(y).
Reach(y) :- Reach(x), Arc(x, y).
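Each relation declared with .input is read from the correspondingly named file in the facts directory, one fact per line, with fields separated by the delimiter (default: ,). As a purely illustrative sketch, a tiny hand-made dataset could look like:
Source.csv:
1

Arc.csv:
1,2
2,3
3,4
With these hypothetical files, Reach would contain 1, 2, 3, and 4.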
Create a directory called reach and place the EDB files inside. For this example, you can use livejournal:
mkdir -p reach
cd reach
curl -LO https://pages.cs.wisc.edu/~m0riarty/dataset/csv/livejournal.zip
unzip livejournal.zip
mv livejournal/* ./
rmdir livejournal
cd ..
cargo build --release
target/release/executing -p reach.dl -f reach -w 64

# Release build
cargo build --release                                              # Batch mode (Present, default)
cargo build --release --features isize-type --no-default-features  # Incremental mode (isize)

FlowLog currently supports two execution modes for Datalog applications:
- Batch Mode (default): Uses `differential_dataflow::difference::Present` for static Datalog semantics. This mode only tracks whether facts are present or absent, making it suitable for high-performance static Datalog execution.
- Incremental Mode: Uses `isize` as the `diff` type for DD's incremental semantics. This allows tracking how many times each fact is derived, supporting incremental view maintenance for Datalog programs.
| Execution Mode | Build Command | Use Case |
|---|---|---|
| Batch Mode (default) | `cargo build --release` | Static Datalog execution (used in the paper benchmarking) |
| Incremental Mode | `cargo build --release --features isize-type --no-default-features` | Incremental Datalog execution |
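To make the distinction concrete, consider the rule below (an illustrative sketch with hypothetical relation names). Under batch mode a derived fact is simply present or absent; under incremental mode each fact carries an isize multiplicity, so retracting an input edge fact removes exactly the derivations that depended on it.
// Under Present, two_hops(x, z) is just present or absent; under
// isize, its multiplicity is the number of witnesses y that derive it.
two_hops(x, z) :- edge(x, y), edge(y, z).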
After a release build, use the executing binary to run Datalog programs:
# Basic usage
target/release/executing -p <program.dl> -f <facts_directory> -w <number_threads>
# Example with concrete paths
target/release/executing -p examples/reach.dl -f reach -w 8

| Option | Description |
|---|---|
| `-p, --program <FILE>` | Path to the Datalog program file (.dl extension) |
| `-f, --facts <DIR>` | Directory containing input fact files (EDBs) |
| `-c, --csvs <DIR>` | Optional: directory for emitting output results (IDBs). If not set, only IDB sizes are printed to the terminal. |
| `-d, --delimiter <CHAR>` | Delimiter for input files (default: `,`) |
| `-w, --workers <NUM>` | Number of worker threads (default: 1) |
| `-O <LEVEL>` | Optimization level (0-3): 0 = no optimization; 1 = Sideways Information Passing (SIP); 2 = Structural Planning; 3 = both optimizations (SIP + Planning) |
# Basic execution under default settings
target/release/executing -p examples/reach.dl -f reach
# Multi-threaded (16 threads) execution, flushing IDBs to output/
target/release/executing -p examples/tc.dl -f tc -c output -w 16
# Robust execution using both SIP and Planning
target/release/executing -p examples/batik.dl -f batik -d $'\t' -w 32 -O 3
# Debug logging via RUST_LOG=debug
RUST_LOG=debug target/release/executing -p examples/batik.dl -f batik -c results -O 2

All datasets used in the paper evaluation are publicly available:
Paper Datasets: https://pages.cs.wisc.edu/~m0riarty/dataset/csv/
FlowLog supports standard Datalog with common extensions:
// Simple graph reach
reach(x) :- source(x).
reach(y) :- reach(x), edge(x, y).
// constraints
two_hops(x, z) :- edge(x, y), edge(y, z), x != z.
// negation
indirect_only(x, z) :- edge(x, y), edge(y, z), !edge(x, z).
// aggregation
count_paths(x, z, count(y)) :- edge(x, y), edge(y, z).
max_salary(dept, max(salary)) :- employee(emp_id, salary), works_in(emp_id, dept).
Aggregation
FlowLog currently supports the count, sum, min, and max aggregation operators. The aggregated field must be the last argument of the head IDB, and all rules deriving the same IDB must use the same aggregation type (e.g., all count or all sum).
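As an illustrative sketch of these constraints (relation names are hypothetical):
// Valid: both rules derive out_deg with the same aggregate (count),
// placed as the last argument of the head.
out_deg(x, count(y)) :- edge(x, y).
out_deg(x, count(y)) :- extra_edge(x, y).
// Invalid: rules deriving the same IDB mix count and sum.
// total(x, count(y)) :- edge(x, y).
// total(x, sum(w))   :- weight(x, w).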
Compilation
FlowLog currently compiles very slowly due to heavy dependencies (e.g., DD/Timely). On an r6525 node, a from-scratch release build can take ~16 minutes.
Arithmetic Head
Support for the Arithmetic Head feature is currently unstable and conflicts with the existing SIP optimization. We have therefore moved it to a temporary branch nemo_arithmetic. You can check out this branch to run programs that require this feature (e.g., SSSP). We have confirmed it runs correctly on SSSP, but we do not guarantee correctness in general.
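For reference, an SSSP-style program with an arithmetic head might look like the sketch below. This is illustrative only: the relation names are hypothetical, and the exact surface syntax is defined by the nemo_arithmetic branch.
// Distance to y is the minimum over predecessors x of dist(x, d) plus
// the edge weight w; the head expression (d + w) is the arithmetic-head
// feature (syntax assumed).
dist(s, 0) :- source(s).
dist(y, min(d + w)) :- dist(x, d), arc(x, y, w).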
The examples/ directory contains several sample Datalog programs demonstrating various features and use cases.
This repository includes FlowLog-Reproduction as a git submodule. You can use this submodule to reproduce the experiment figures from the paper. Please initialize submodules after cloning:
git submodule update --init --recursive

Contributions are welcome! Feel free to submit a pull request or open an issue.
