This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Commit f8f714c

doc: docs for cardinality benchmarking (#183)
**Summary**: Updated `README.md`, `SUMMARY.md`, and added a new file `cost_model_benchmarking.md` to document cardinality benchmarking.

**Details**:
* `README.md` contains a quickstart command.
* `cost_model_benchmarking.md` contains conceptual info and notes about operating and extending the system.
* I named it "benchmarking" instead of "testing" in the docs to distinguish it from functional testing. I renamed `perftest` and `cardtest` to `perfbench` and `cardbench` to match how we're calling it "benchmarking" instead of "testing".
1 parent 5255d8d commit f8f714c

19 files changed (+101 −54 lines)

Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ members = [
     "optd-sqlplannertest",
     "optd-adaptive-demo",
     "optd-gungnir",
-    "optd-perftest",
+    "optd-perfbench",
 ]
 resolver = "2"

README.md

Lines changed: 9 additions & 2 deletions
@@ -12,7 +12,7 @@ optd is a research project and is still evolving. It should not be used in produ
 
 ## Get Started
 
-There are two demos you can run with optd. More information available in the [docs](docs/).
+There are three demos you can run with optd. More information available in the [docs](docs/).
 
 ```
 cargo run --release --bin optd-adaptive-tpch-q8
@@ -25,6 +25,13 @@ You can also run the Datafusion cli to interactively experiment with optd.
 cargo run --bin datafusion-optd-cli
 ```
 
+You can also test the performance of the cost model with the "cardinality benchmarking" feature (more info in the [docs](docs/)).
+Before running this, you will need to manually run Postgres on your machine.
+Note that there is a CI script which tests this command (TPC-H with scale factor 0.01) before every merge into main, so it should be very reliable.
+```
+cargo run --release --bin optd-perfbench cardbench tpch --scale-factor 0.01
+```
+
 ## Documentation
 
 The documentation is available in the mdbook format in the [docs](docs) directory.
@@ -38,7 +45,7 @@ The documentation is available in the mdbook format in the [docs](docs) director
 * `optd-adaptive-demo`: Demo of adaptive optimization capabilities of optd. More information available in the [docs](docs/).
 * `optd-sqlplannertest`: Planner test of optd based on [risinglightdb/sqlplannertest-rs](https://github.com/risinglightdb/sqlplannertest-rs).
 * `optd-gungnir`: Scalable, memory-efficient, and parallelizable statistical methods for cardinality estimation (e.g. TDigest, HyperLogLog).
-* `optd-perftest`: A CLI program for testing performance (cardinality, throughput, etc.) against other databases.
+* `optd-perfbench`: A CLI program for benchmarking performance (cardinality, throughput, etc.) against other databases.
 
 
 # Related Works

dev_scripts/which_queries_work.sh

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ fi
 successful_ids=()
 IFS=','
 for id in $all_ids; do
-    cargo run --release --bin optd-perftest cardtest $benchmark_name --query-ids $id &>/dev/null
+    cargo run --release --bin optd-perfbench cardbench $benchmark_name --query-ids $id &>/dev/null
 
     if [ $? -eq 0 ]; then
         echo >&2 $id succeeded
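The loop above follows a common shell pattern: run each candidate query, check its exit status, and collect the IDs that succeed. A minimal standalone sketch of that pattern (the real script collects into a bash array; this POSIX version uses string concatenation, and `run_query` is a hypothetical stub standing in for the `cargo run ... cardbench` invocation):

```shell
# Stub standing in for the real `cargo run ... cardbench ... --query-ids $id`
# invocation; here it succeeds only for even-numbered query IDs.
run_query() {
    [ $(( $1 % 2 )) -eq 0 ]
}

all_ids="1,2,3,4"
successful_ids=""

IFS=','
for id in $all_ids; do
    if run_query "$id"; then
        # Append the working ID, comma-separating after the first one.
        successful_ids="${successful_ids:+$successful_ids,}$id"
    fi
done

# Prints the comma-separated string to paste into a --query-ids argument.
echo "$successful_ids"
```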

docs/src/SUMMARY.md

Lines changed: 4 additions & 1 deletion
@@ -22,7 +22,10 @@
 - [Three Join Demo](./demo_three_join.md)
 - [TPC-H Q8 Demo](./demo_tpch_q8.md)
 
-# Testing
+# Performance Benchmarking
+- [Cost Model Cardinality Benchmarking](./cost_model_benchmarking.md)
+
+# Functional Testing
 
 - [SQLPlannerTest](./sqlplannertest.md)
 - [Datafusion CLI](./datafusion_cli.md)

docs/src/cost_model_benchmarking.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Cost Model Cardinality Benchmarking
+
+## Overview
+You can benchmark the cardinality estimates of optd's cost model against other DBMSs using the optd-perfbench module.
+
+All aspects of benchmarking (except for setting up comparison DBMSs) are handled automatically. This includes loading workload data, building statistics, gathering the true cardinality of workload queries, running explains on workload queries, and aggregating cardinality estimation results.
+
+We elected not to automate the installation and setup of the DBMSs in order to accommodate the needs of all users. For instance, some users prefer installing Postgres via Homebrew, others choose to install the Mac application, while others wish to create a Postgres Docker container. However, it could be feasible in the future to standardize on Docker and automatically start a container. The only difficult part in that scenario is tuning Postgres and other DBMSs to the machine being run on, as this is currently done manually using PGTune.
+
+Additionally, our system provides **fine-grained, robust caching** for every single step of the process. After the first run of a workload, all subsequent runs will *only require running explains*, which takes a matter of seconds for all workloads. We use "acknowledgement files" to ensure that the caching is robust in that we never cache incomplete results.
+
+## Basic Operation
+First, you need to manually install, configure, and start the DBMS(s) being compared against. Currently, only Postgres is supported. To see an example of how Postgres is installed, configured, and started on a Mac, check the `patrick/` folder in the [gungnir-experiments](https://github.com/wangpatrick57/gungnir-experiments) repository.
+
+Once the DBMS(s) being compared against are set up, run this to quickly get started. It should take a few minutes on the first run and a few seconds on subsequent runs. This specific command, which tests TPC-H with scale factor 0.01, is **run in a CI script** before every merge to main, so it should be very reliable.
+```
+cargo run --release --bin optd-perfbench cardbench tpch --scale-factor 0.01
+```
+
+After this, you can try out different workloads and scale factors based on the CLI options.
+
+Roughly speaking, there are two main ways the benchmarking system is used: (a) to compare the cardinality estimates of optd against another system *in aggregate*, or (b) to investigate the cardinality estimates of a small subset of queries. The command above is for use case (a). The system automatically outputs a variety of *aggregate* information about the q-error, including the median, p95, max, and more. Additionally, the system outputs *comparative* information which shows the # of queries in which a given DBMS performs the best or is tied for the best.
+
+For use case (b), you will want to set the `RUST_LOG` environment variable to `info` and use the `--query-ids` parameter. Setting `RUST_LOG` to `info` will show the results of the explain commands on all DBMSs, and `--query-ids` will let you run only specific queries to avoid cluttering the output.
+```
+RUST_LOG=info cargo run --release --bin optd-perfbench cardbench tpch --scale-factor 0.01 --query-ids 2
+```
+
+## Supporting More Queries
+Currently, we are missing support for a few queries in TPC-H, JOB, and JOB-light. An *approximate* list of supported queries can be found in the `[workload].rs` files (e.g. `tpch.rs` and `job.rs`). If `--query-ids` is omitted from the command, we use the list of supported queries as defined in the `[workload].rs` file by default. Some of these queries are not supported by DataFusion, some are not supported by optd, and some fail because we run into an OOM error when trying to execute them on Postgres. Because of the last point, the set of supported queries may differ between machines. The list of queries in `[workload].rs` (at least the one in `tpch.rs`) is tested to be working on the CI machine.
+
+The *definitive* list of supported queries on your machine can be found by running `dev_scripts/which_queries_work.sh`, which simply runs the benchmarking system for each query individually. While this script does take a long time to complete when first run, it has the nice side effect of warming up all your caches so that subsequent runs are fast. The script outputs a string to replace the `WORKING_*QUERY_IDS` variable in `[workload].rs` as well as another string to use as the `--query-ids` argument. If you use `which_queries_work.sh` to figure out the queries that work on your machine, you probably want to use `--query-ids` instead of setting `WORKING_*QUERY_IDS`.
+
+If you add support for more queries, you will want to rerun `dev_scripts/which_queries_work.sh`. Since you are permanently adding support for more queries, you will want to update `WORKING_*QUERY_IDS`.
+
+## Adding More DBMSs
+Currently, only Postgres is supported. Additional DBMSs can be easily added using the `CardbenchRunnerDBMSHelper` trait and optionally the `TruecardGetter` trait. `CardbenchRunnerDBMSHelper` must be implemented by every supported DBMS because it has functions for gathering estimated cardinalities from DBMSs. `TruecardGetter` only needs to be implemented by at least one DBMS: the true cardinality should be the same across all DBMSs, so we only execute the queries for real on a single DBMS to drastically reduce benchmarking runtime. `TruecardGetter` is currently implemented for Postgres, so it is unnecessary to implement it for any other DBMS unless one wishes to improve the runtime of benchmarking (e.g. by gathering true cardinalities using an OLAP DBMS for OLAP workloads). Do keep in mind that true cardinalities are cached after the first run of a workload and can be shared across users (in the future, perhaps we'll even put the cached true cardinalities in the GitHub repository itself), so this optimization is not terribly important.
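To make the extension point above concrete, here is a simplified, synchronous Rust sketch of the *shape* of the helper trait. It is not the real `CardbenchRunnerDBMSHelper` (which is async and operates on benchmark/workload types); `CardbenchHelperSketch`, `MockDBMS`, and `eval_estcards` are all hypothetical names for illustration.

```rust
// Simplified stand-in for the real CardbenchRunnerDBMSHelper trait
// (the actual trait is async and takes workload/benchmark arguments).
trait CardbenchHelperSketch {
    // &self so the trait stays object-safe for Box<dyn CardbenchHelperSketch>.
    fn get_name(&self) -> &str;
    // Must return estimates in the same query order for every DBMS,
    // so results can be zipped against the true cardinalities.
    fn eval_estcards(&self, query_ids: &[u32]) -> Vec<usize>;
}

struct MockDBMS;

impl CardbenchHelperSketch for MockDBMS {
    fn get_name(&self) -> &str {
        "MockDBMS"
    }
    fn eval_estcards(&self, query_ids: &[u32]) -> Vec<usize> {
        // A real implementation would run EXPLAIN on each query and parse
        // the plan's estimated row count; here we fake a constant estimate.
        query_ids.iter().map(|_| 42).collect()
    }
}

fn main() {
    let dbmss: Vec<Box<dyn CardbenchHelperSketch>> = vec![Box::new(MockDBMS)];
    for dbms in &dbmss {
        println!("{}: {:?}", dbms.get_name(), dbms.eval_estcards(&[1, 2]));
    }
}
```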

optd-perftest/Cargo.toml renamed to optd-perfbench/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [package]
-name = "optd-perftest"
+name = "optd-perfbench"
 version = "0.1.0"
 edition = "2021"

File renamed without changes.

optd-perftest/src/cardtest.rs renamed to optd-perfbench/src/cardbench.rs

Lines changed: 15 additions & 15 deletions
@@ -9,12 +9,12 @@ use anyhow::{self};
 use async_trait::async_trait;
 
 /// This struct performs cardinality testing across one or more DBMSs.
-/// Another design would be for the CardtestRunnerDBMSHelper trait to expose a function
+/// Another design would be for the CardbenchRunnerDBMSHelper trait to expose a function
 /// to evaluate the Q-error. However, I chose not to do this design for reasons
-/// described in the comments of the CardtestRunnerDBMSHelper trait. This is why
-/// you would use CardtestRunner even for computing the Q-error of a single DBMS.
-pub struct CardtestRunner {
-    pub dbmss: Vec<Box<dyn CardtestRunnerDBMSHelper>>,
+/// described in the comments of the CardbenchRunnerDBMSHelper trait. This is why
+/// you would use CardbenchRunner even for computing the Q-error of a single DBMS.
+pub struct CardbenchRunner {
+    pub dbmss: Vec<Box<dyn CardbenchRunnerDBMSHelper>>,
     truecard_getter: Box<dyn TruecardGetter>,
 }
 
@@ -25,12 +25,12 @@ pub struct Cardinfo {
     pub truecard: usize,
 }
 
-impl CardtestRunner {
+impl CardbenchRunner {
     pub async fn new(
-        dbmss: Vec<Box<dyn CardtestRunnerDBMSHelper>>,
+        dbmss: Vec<Box<dyn CardbenchRunnerDBMSHelper>>,
         truecard_getter: Box<dyn TruecardGetter>,
     ) -> anyhow::Result<Self> {
-        Ok(CardtestRunner {
+        Ok(CardbenchRunner {
             dbmss,
             truecard_getter,
         })
@@ -57,7 +57,7 @@ impl CardtestRunner {
             .into_iter()
             .zip(truecards.iter())
             .map(|(estcard, &truecard)| Cardinfo {
-                qerror: CardtestRunner::calc_qerror(estcard, truecard),
+                qerror: CardbenchRunner::calc_qerror(estcard, truecard),
                 estcard,
                 truecard,
             })
@@ -90,8 +90,8 @@ impl CardtestRunner {
 /// When more performance tests are implemented, you would probably want to extract
 /// get_name() into a generic "DBMS" trait.
 #[async_trait]
-pub trait CardtestRunnerDBMSHelper {
-    // get_name() has &self so that we're able to do Box<dyn CardtestRunnerDBMSHelper>
+pub trait CardbenchRunnerDBMSHelper {
+    // get_name() has &self so that we're able to do Box<dyn CardbenchRunnerDBMSHelper>
     fn get_name(&self) -> &str;
 
     // The order of queries in the returned vector has to be the same between all databases,
@@ -103,7 +103,7 @@ pub trait CardtestRunnerDBMSHelper {
 }
 
 /// The core logic of cardinality testing.
-pub async fn cardtest_core<P: AsRef<Path>>(
+pub async fn cardbench_core<P: AsRef<Path>>(
     workspace_dpath: P,
     rebuild_cached_optd_stats: bool,
     pguser: &str,
@@ -115,10 +115,10 @@ pub async fn cardtest_core<P: AsRef<Path>>(
     let truecard_getter = pg_dbms.clone();
     let df_dbms =
         Box::new(DatafusionDBMS::new(&workspace_dpath, rebuild_cached_optd_stats, adaptive).await?);
-    let dbmss: Vec<Box<dyn CardtestRunnerDBMSHelper>> = vec![pg_dbms, df_dbms];
+    let dbmss: Vec<Box<dyn CardbenchRunnerDBMSHelper>> = vec![pg_dbms, df_dbms];
 
-    let mut cardtest_runner = CardtestRunner::new(dbmss, truecard_getter).await?;
-    let cardinfos_alldbs = cardtest_runner
+    let mut cardbench_runner = CardbenchRunner::new(dbmss, truecard_getter).await?;
+    let cardinfos_alldbs = cardbench_runner
         .eval_benchmark_cardinfos_alldbs(&benchmark)
         .await?;
     Ok(cardinfos_alldbs)
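The `calc_qerror` helper referenced in this diff is not shown in the hunk. The standard definition of q-error, which it presumably implements, is the larger of est/true and true/est, so it is always ≥ 1 and symmetric between over- and under-estimation. A self-contained sketch assuming that definition (the exact handling of zero cardinalities in optd may differ):

```rust
/// Q-error of a cardinality estimate: max(est/true, true/est).
/// A perfect estimate yields exactly 1.0; a 10x over- or under-estimate
/// yields 10.0. Sketch of the standard definition, assuming positive inputs.
fn calc_qerror(estcard: usize, truecard: usize) -> f64 {
    let est = estcard as f64;
    let tru = truecard as f64;
    (est / tru).max(tru / est)
}

fn main() {
    assert_eq!(calc_qerror(100, 100), 1.0);   // perfect estimate
    assert_eq!(calc_qerror(10, 1000), 100.0); // 100x under-estimate
    assert_eq!(calc_qerror(1000, 10), 100.0); // 100x over-estimate
    println!("q-error checks pass");
}
```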

optd-perftest/src/datafusion_dbms.rs renamed to optd-perfbench/src/datafusion_dbms.rs

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ use std::{
 
 use crate::{
     benchmark::Benchmark,
-    cardtest::CardtestRunnerDBMSHelper,
+    cardbench::CardbenchRunnerDBMSHelper,
     job::{JobKit, JobKitConfig},
     tpch::{TpchKit, TpchKitConfig},
 };
@@ -47,7 +47,7 @@ const WITH_LOGICAL_FOR_TPCH: bool = true;
 const WITH_LOGICAL_FOR_JOB: bool = true;
 
 #[async_trait]
-impl CardtestRunnerDBMSHelper for DatafusionDBMS {
+impl CardbenchRunnerDBMSHelper for DatafusionDBMS {
     fn get_name(&self) -> &str {
         "DataFusion"
     }
