paimon-bench-tools

This repo is organized as a hierarchy of benchmark scenarios. The first implemented scenario is pk/upsert, which compares Apache Paimon primary-key upsert write behavior plus L0 compaction between Java and C++ under the same workload.

Scenario layout

  • java-bench/pom.xml: shared Java Maven build for Java scenarios.
  • java-bench/src/main/java/: shared Java source root following the normal Maven layout.
  • cpp-bench/pk/upsert/: scenario-specific C++ benchmark target.
  • benchmarks/pk/upsert/scripts/generate_workload.py: generates Parquet input files for the scenario from workload.properties.
  • benchmarks/pk/upsert/scripts/run_benchmark.py: runs the scenario and writes raw metrics plus a summary CSV.
  • workloads/pk/upsert/workload.properties: scenario-specific workload and tuning knobs with inline comments.

Tuning model

Each scenario owns its own tuning knobs in workload.properties.

  • catalog.option.*: catalog creation options shared by Java and C++.
  • table.option.*: table creation options shared by Java and C++.
  • java.option.*: Java-runner behavior for the scenario.
  • cpp.option.*: C++-runner behavior for the scenario.
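As an illustration, a scenario's workload.properties follows this prefix scheme. The specific keys and values below are hypothetical examples, not the scenario's actual knobs; see the inline comments in workloads/pk/upsert/workload.properties for the real ones:

```properties
# Catalog creation options shared by Java and C++ (example values)
catalog.option.warehouse=/tmp/paimon-bench/warehouse

# Table creation options shared by Java and C++
table.option.bucket=4

# Runner-specific behavior for this scenario (hypothetical keys)
java.option.heap-size=8g
cpp.option.write-threads=8
```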

Quick start

Create the virtual environment used by the commands below and install Python dependencies:

python3 -m venv .venv
.venv/bin/python -m pip install -r requirements.txt

Initialize the upstream paimon-cpp submodule and apply the local benchmark build patch:

git submodule update --init --recursive
./scripts/apply_paimon_cpp_patch.sh

Kill any leftover benchmark processes if needed:

./scripts/kill_running_benchmarks.sh

Generate the pk/upsert workload:

.venv/bin/python benchmarks/pk/upsert/scripts/generate_workload.py \
  --scenario-dir workloads/pk/upsert

Run the scenario:

.venv/bin/python benchmarks/pk/upsert/scripts/run_benchmark.py \
  --workload-dir workloads/pk/upsert

Run the scenario with live Grafana monitoring:

observability/start-monitoring.sh
observability/reset-pushgateway.sh
.venv/bin/python benchmarks/pk/upsert/scripts/run_benchmark.py \
  --workload-dir workloads/pk/upsert \
  --live-monitoring

The runner verifies that PushGateway is empty before the monitored run starts, then launches Java and C++ under the same run_id so the live dashboard can compare them side by side while both are running.

The live dashboard is available at http://localhost:3000 as Paimon Bench PK Upsert Compare.

Raw benchmark JSON and the CSV row now include native Paimon metrics under paimon_metrics / paimon_metrics_json, split by phase:

  • write
  • write_commit
  • compaction
  • compaction_commit
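For downstream analysis, that per-phase split can be handled generically. A minimal Python sketch — the metric names and values below are made up; only the four phase keys come from this README:

```python
import json

# Hypothetical raw-metrics payload: native Paimon metrics keyed by phase,
# roughly the shape stored under paimon_metrics in the benchmark JSON.
paimon_metrics = {
    "write": {"records": 1_000_000, "duration_ms": 42_000},
    "write_commit": {"duration_ms": 310},
    "compaction": {"files_compacted": 12, "duration_ms": 9_800},
    "compaction_commit": {"duration_ms": 120},
}

# Flatten to phase-prefixed keys, e.g. to serialize into a single CSV cell
# the way a column like paimon_metrics_json could be produced.
flat = {
    f"{phase}.{name}": value
    for phase, metrics in paimon_metrics.items()
    for name, value in metrics.items()
}

paimon_metrics_json = json.dumps(flat, sort_keys=True)
print(flat["compaction.files_compacted"])  # -> 12
```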

During a live-monitored run, both engines start a background collector that snapshots native Paimon metrics every second and pushes them directly to PushGateway for Grafana.
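The collector loop described above can be sketched as follows. This is a simplified illustration, not the runner's actual code: `snapshot` and `push` stand in for the native-metrics read and the PushGateway upload.

```python
import threading
import time

def start_metrics_collector(snapshot, push, interval_s=1.0):
    """Start a background thread that periodically snapshots metrics and
    pushes them; returns an Event that stops the loop when set.
    (Hypothetical helper; the real runner pushes to PushGateway.)"""
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            push(snapshot())          # push one snapshot immediately, then
            stop.wait(interval_s)     # sleep, waking early if stopped

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

A usage example: `stop = start_metrics_collector(read_native_metrics, push_to_gateway_fn, interval_s=1.0)` at phase start, then `stop.set()` when the phase finishes.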

Flamegraphs

You can optionally profile a run and dump flamegraph artifacts into the run's result directory:

.venv/bin/python benchmarks/pk/upsert/scripts/run_benchmark.py \
  --workload-dir workloads/pk/upsert \
  --results-dir results/pk/upsert/profiled-run \
  --flamegraph

Useful knobs:

  • --flamegraph-frequency: perf sample frequency. Default is 99.
  • --flamegraph-tools-dir: directory containing flamegraph.pl and stackcollapse-perf.pl. If the directory does not exist, the runner will shallow-clone https://github.com/brendangregg/FlameGraph.git there automatically.
  • --async-profiler-dir: directory containing the async-profiler package for Java profiling. If the directory does not exist, the runner will auto-download it from https://github.com/async-profiler/async-profiler.
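The auto-clone behavior for the FlameGraph tools can be sketched like this. The function name and exact clone arguments are assumptions, not the runner's actual code:

```python
import os

FLAMEGRAPH_REPO = "https://github.com/brendangregg/FlameGraph.git"

def flamegraph_clone_cmd(tools_dir):
    """Return the git command that would fetch the FlameGraph tools if
    tools_dir is missing, or None if the directory already exists.
    (Hypothetical helper mirroring the runner's described behavior.)"""
    if os.path.isdir(tools_dir):
        return None
    # A shallow clone suffices: only flamegraph.pl and
    # stackcollapse-perf.pl are needed.
    return ["git", "clone", "--depth", "1", FLAMEGRAPH_REPO, tools_dir]
```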

Artifacts are written under results/.../flamegraph/:

  • java.flamegraph.html
  • cpp.perf.data
  • cpp.perf.script
  • cpp.stacks.folded
  • cpp.flamegraph.svg
  • cpp.flamegraph.html

Implementation note:

  • Java flamegraphs are produced directly by async-profiler as interactive HTML.
  • C++ flamegraphs are produced from perf record plus Brendan Gregg's FlameGraph tools.

Linux note: perf record must be permitted for the current user. If the host has a restrictive kernel.perf_event_paranoid setting, the runner will fail fast before a C++ profiled run starts.
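A fail-fast check along those lines might look like the sketch below. The threshold is an assumption: unprivileged perf record of one's own processes is commonly allowed at kernel.perf_event_paranoid <= 2, and some distributions patch in a stricter level 3; the runner's actual rule may differ.

```python
def perf_record_allowed(paranoid_level):
    """Heuristic check: treat kernel.perf_event_paranoid <= 2 as permitting
    unprivileged perf record of the user's own processes (an assumption,
    not the runner's exact rule)."""
    return paranoid_level <= 2

def read_paranoid_level(path="/proc/sys/kernel/perf_event_paranoid"):
    # Linux exposes the current setting as a plain integer in procfs.
    with open(path) as f:
        return int(f.read().strip())
```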

Results

The scenario runner writes outputs under results/pk/upsert/.

  • results/pk/upsert/benchmark_metrics.csv: one row per engine run.
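With one row per engine run, a side-by-side comparison is a few lines of Python. The column names below are hypothetical; check the header row of benchmark_metrics.csv for the real ones:

```python
import csv
import io

# Stand-in for results/pk/upsert/benchmark_metrics.csv with made-up columns.
sample = """engine,run_id,write_duration_ms
java,run-1,42000
cpp,run-1,35000
"""

rows = list(csv.DictReader(io.StringIO(sample)))
by_engine = {row["engine"]: row for row in rows}

# Ratio of Java to C++ write time on this made-up data.
speedup = (int(by_engine["java"]["write_duration_ms"])
           / int(by_engine["cpp"]["write_duration_ms"]))
print(f"cpp write is {speedup:.2f}x faster (sample data)")
```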

Important note about C++

The upstream C++ source now lives as a git submodule at external/paimon-cpp, pinned to an upstream commit. Benchmark-specific changes are stored separately in patches/paimon-cpp/0001-build-as-subproject-disable-arrow-brotli.patch and can be reapplied with scripts/apply_paimon_cpp_patch.sh.

Upstream paimon-cpp does not normally allow commits to primary-key (PK) tables. The pk/upsert scenario uses the same hidden integration-test flags used by external/paimon-cpp/test/inte/pk_compaction_inte_test.cpp:

  • enable-pk-commit-in-inte-test
  • enable-object-store-commit-in-inte-test

That keeps the benchmark aligned with upstream test coverage, but it also means the current C++ PK upsert plus L0 compaction path is intentionally exercising test-gated behavior rather than a generally supported production path.
