GitHub

An efficient, scalable, and extensible Datalog engine built atop Differential Dataflow

Paper • Quick Start • Datasets • Reproduce Results

FlowLog Paper

This repo contains the implementation for the paper:

FlowLog: Efficient and Extensible Datalog via Incrementality
Hangdong Zhao, Zhenghong Yu, Srinag Rao, Simon Frisk, Zhiwei Fan, Paraschos Koutris
VLDB 2026 (Boston)

Read the paper on arXiv

FlowLog Architecture

FlowLog uses a modular architecture that collectively creates a Datalog execution pipeline as follows (also see Figure 1 of the paper):

├── parsing       # Parsing Datalog program
├── strata        # Stratification
├── planning      # Generate logical IR and optimize (per rule)
  ├── catalog       # Generate metadata 
  └── optimizing    # Query optimization 
└── executing     # Executor
  ├── reading       # Reading data from CSV
  └── macros        # Rust macros for code generate each differential operator

Quick Example

Environment Setup

# Automated setup (recommended):
# The env.sh script automatically handles all requirements including:
# - Rust = 1.89.0 (pinned version for reproducibility)
# - differential-dataflow = 0.16.2 (paper version for reproducibility)
# - timely = 0.23.0 (paper version for reproducibility)
# - ...

# Simply run:
bash tool/env.sh

# After installation, you may need to start a new terminal session
# or run `source ~/.bashrc` (or `source ~/.zshrc` if using zsh)
# so that environment variables and PATH updates take effect.

# Manual verification (optional):
# To check your Rust version after setup
rustc --version  # Should show: rustc 1.89.0

Note on Versions: For paper reproducibility, we use differential-dataflow 0.16.2 and timely 0.23.0 as reported in the VLDB paper. However, we are actively maintaining FlowLog and catching up with the most updated versions of these dependencies for improved performance.

Write a Simple Program

Create a file named reach.dl with the following contents. This program computes the set of nodes reachable from the given sources:

.in
.decl Source(id: number)
.input Source.csv

.decl Arc(x: number, y: number)
.input Arc.csv

.printsize
.decl Reach(id: number)

.rule
Reach(y) :- Source(y).
Reach(y) :- Reach(x), Arc(x, y).

Prepare Input Data

Create a directory called reach and place the EDB files inside. For this example, you can use livejournal:

mkdir -p reach
cd reach
curl -LO https://pages.cs.wisc.edu/~m0riarty/dataset/csv/livejournal.zip
unzip livejournal.zip
mv livejournal/* ./
rmdir livejournal
cd ..

Build and Run

cargo build --release
target/release/executing -p reach.dl -f reach -w 64

FlowLog Build

# Release build
cargo build --release                                             # Batch mode (Present, default)
cargo build --release --features isize-type --no-default-features # Incremental mode (isize)

Execution Modes

FlowLog currently supports two execution modes for Datalog applications:

Batch Mode (default): Uses differential_dataflow::difference::Present for static Datalog semantics. This mode only tracks whether facts are present or absent, making it suitable for high-performance static Datalog execution.
Incremental Mode: Uses isize as the diff type for DD's incremental semantics. This allows tracking how many times each fact is derived, supporting incremental view maintenance for Datalog programs.

Build Options

Execution Mode	Build Command	Use Case
Batch Mode (default)	`cargo build --release`	Static Datalog execution (used in the paper benchmarking)
Incremental Mode	`cargo build --release --features isize-type --no-default-features`	Incremental Datalog execution

FlowLog Run

After (release) build, use the executing binary to run Datalog programs:

# Basic usage
target/release/executing -p <program.dl> -f <facts_directory> -w <number_threads>

# Example with concrete paths
target/release/executing -p examples/reach.dl -f reach -w 8

Command Options

Option	Description
`-p, --program <FILE>`	Path to the Datalog program file (`.dl` extension)
`-f, --facts <DIR>`	Directory containing input fact files (EDBs)
`-c, --csvs <DIR>`	Optional: Directory for emitting output results (IDBs). If not set, only print IDB sizes in terminal.
`-d, --delimiter <CHAR>`	Delimiter for input files (default: `,`)
`-w, --workers <NUM>`	Number of worker threads (default: 1)
`-O <LEVEL>`	Optimization level (0-3): `0` - No optimization `1` - Sideways Information Passing (SIP) `2` - Structural Planning `3` - Both optimizations (SIP + Planning)

Example Commands

# Basic execution under default settings
target/release/executing -p examples/reach.dl -f reach

# Multi-threaded (16 threads) execution, flushing IDBs to output/
target/release/executing -p examples/tc.dl -f tc -c output -w 16

# Robust execution using both SIP and Planning
target/release/executing -p examples/batik.dl -f batik -d $'\t' -w 32 -O 3

# Debug print RUST_LOG=debug
RUST_LOG=debug target/release/executing -p examples/batik.dl -f batik -c results -O 2

Datasets

All datasets used in the paper evaluation are publicly available:

Paper Datasets: https://pages.cs.wisc.edu/~m0riarty/dataset/csv/

FlowLog (Datalog) Syntax

FlowLog supports standard Datalog with common extensions:

// Simple graph reach
reach(x) :- source(x).
reach(y) :- reach(x), edge(x, y).

// constraints
two_hops(x, z) :- edge(x, y), edge(y, z), x != z.

// negation
indirect_only(x, z) :- edge(x, y), edge(y, z), !edge(x, z).

// aggregation
count_paths(x, z, count(y)) :- edge(x, y), edge(y, z).
max_salary(dept, max(salary)) :- employee(emp_id, salary), works_in(emp_id, dept).

FlowLog Current Limitations (Work In Progress)

Aggregation
FlowLog currently supports count, sum, min, max aggregation operators. However, the aggregate field must be the last argument in the head IDB. All rules deriving the same IDB must conform to the same aggregation type (e.g. count, sum).

Compilation
FlowLog currently compiles very slowly due to heavy dependencies (e.g., DD/Timely). On r6525 node, a from-scratch release build can take ~16 minutes.

Arithmetic Head
Support for the Arithmetic Head feature is currently unstable and conflicts with the existing SIP optimization. We have therefore moved it to a temporary branch nemo_arithmetic. You can check out this branch to run programs that require this feature (e.g., SSSP). We have confirmed it runs correctly on SSSP, but we do not guarantee correctness in general.

FlowLog Example Benchmarks

The examples/ directory contains several sample Datalog programs demonstrating various features and use cases.

Reproducing Paper Figures

This repository includes FlowLog-Reproduction as a git submodule. You can use this submodule to reproduce the experiment figures from the paper. Please initialize submodules after cloning:

git submodule update --init --recursive

Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue.

Name		Name	Last commit message	Last commit date
Latest commit History 118 Commits
.cargo		.cargo
FlowLog-Reproduction @ 66c88af		FlowLog-Reproduction @ 66c88af
examples		examples
src		src
tool		tool
.gitignore		.gitignore
.gitmodules		.gitmodules
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
architecture.png		architecture.png
flowlog-full.png		flowlog-full.png
flowlog.png		flowlog.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

FlowLog Paper

FlowLog Architecture

Quick Example

Environment Setup

Write a Simple Program

Prepare Input Data

Build and Run

FlowLog Build

Execution Modes

Build Options

FlowLog Run

Command Options

Example Commands

Datasets

FlowLog (Datalog) Syntax

FlowLog Current Limitations (Work In Progress)

FlowLog Example Benchmarks

Reproducing Paper Figures

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Languages

License

hdz284/FlowLog

Folders and files

Latest commit

History

Repository files navigation

FlowLog Paper

FlowLog Architecture

Quick Example

Environment Setup

Write a Simple Program

Prepare Input Data

Build and Run

FlowLog Build

Execution Modes

Build Options

FlowLog Run

Command Options

Example Commands

Datasets

FlowLog (Datalog) Syntax

FlowLog Current Limitations (Work In Progress)

FlowLog Example Benchmarks

Reproducing Paper Figures

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Languages

Packages