This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

docs: finish all docs
Signed-off-by: Alex Chi <[email protected]>
skyzh committed Dec 31, 2023
1 parent 45db7a9 commit d631fbf
Showing 8 changed files with 79 additions and 4 deletions.
2 changes: 2 additions & 0 deletions docs/README.md
@@ -5,3 +5,5 @@ The docs is written in `mdbook` format. You can follow the [`mdbook` installatio
```shell
mdbook serve
```

The online version of the documentation can be found at [https://cmu-db.github.io/optd/](https://cmu-db.github.io/optd/).
6 changes: 3 additions & 3 deletions docs/src/SUMMARY.md
@@ -19,10 +19,10 @@

# Demo

- [(WIP) Three Join Demo](./demo_three_join.md)
- [(WIP) TPC-H Q8 Demo](./tpch_q8_demo.md)
- [Three Join Demo](./demo_three_join.md)
- [TPC-H Q8 Demo](./demo_tpch_q8.md)

# Testing

- [(WIP) SQLPlannerTest](./sqlplannertest.md)
- [SQLPlannerTest](./sqlplannertest.md)
- [(WIP) Datafusion CLI](./datafusion_cli.md)
12 changes: 12 additions & 0 deletions docs/src/datafusion_cli.md
@@ -1 +1,13 @@
# Datafusion CLI

Developers can interact with optd through the Datafusion CLI, which supports creating tables, populating data, and executing ANSI SQL queries.

```shell
cargo run --bin datafusion-optd-cli
```

We also provide a TPC-H dataset at scale factor 0.01 for testing. The test SQL can be executed with the Datafusion CLI:

```shell
cargo run --bin datafusion-optd-cli -- -f tpch/test.sql
```
30 changes: 30 additions & 0 deletions docs/src/demo_three_join.md
@@ -1 +1,31 @@
# Three Join Demo

You can run this demo with the following command:

```shell
cargo run --release --bin optd-adaptive-three-join
```

The demo creates three tables and joins them. The underlying data are updated every time the query is executed.

```sql
select * from t1, t2, t3 where t1v1 = t2v1 and t1v2 = t3v2;
```

As the data distribution and table sizes change, the optimal join order changes too. The output of this demo looks like this:

```plain
Iter 66: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/66=100.000
Iter 67: (HashJoin (HashJoin t2 t1) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/67=98.507
Iter 68: (HashJoin t2 (HashJoin t1 t3)) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/68=97.059
Iter 69: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 67/69=97.101
Iter 70: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 68/70=97.143
Iter 71: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 69/71=97.183
Iter 72: (HashJoin (HashJoin t2 t1) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 69/72=95.833
```

The plan on the left, in Lisp-style representation, is the join order chosen by the adaptive query optimization algorithm; the plan on the right is the best plan. Accuracy is the percentage of executions in which the adaptive algorithm produced the cost-optimal plan.
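
The Accuracy column is a running ratio of matching iterations. The following minimal sketch reproduces its arithmetic and formatting, assuming plans are compared by their Lisp-style strings (so `(HashJoin t2 t1)` and `(HashJoin t1 t2)` count as different plans, as iteration 67 shows). `AccuracyTracker` is a hypothetical name used for illustration, not part of optd.

```rust
/// Hypothetical running-accuracy tracker (not optd's API).
struct AccuracyTracker {
    hits: usize,  // iterations where the chosen plan equaled the best plan
    total: usize, // iterations recorded so far
}

impl AccuracyTracker {
    fn new() -> Self {
        Self { hits: 0, total: 0 }
    }

    /// Record one iteration. `chosen` comes from the normal optimizer,
    /// `best` from the exhaustive "optimal" optimizer; both are assumed
    /// to be Lisp-style plan strings compared verbatim.
    fn record(&mut self, chosen: &str, best: &str) {
        self.total += 1;
        if chosen == best {
            self.hits += 1;
        }
    }

    /// Render as `hits/total=percent` with three decimals, like the demo.
    fn report(&self) -> String {
        let pct = if self.total == 0 {
            0.0
        } else {
            self.hits as f64 / self.total as f64 * 100.0
        };
        format!("{}/{}={:.3}", self.hits, self.total, pct)
    }
}
```

Recording 66 matching iterations, then the mismatching plans from iterations 67 and 68, yields `66/66=100.000`, `66/67=98.507`, and `66/68=97.059`, matching the output above.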

To find the optimal plan and compute the accuracy, the demo sets up two optimizers: the normal optimizer and the optimal optimizer. Each time data are inserted into the tables, the normal optimizer is invoked once, and the optimal optimizer is invoked with all possible combinations of join orders, so that it produces the optimal plan according to the cost model and the join selectivity.
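
To picture what "all possible combinations of join orders" means for three tables, the sketch below enumerates every ordered binary join tree over a set of tables and renders it in the demo's Lisp-style notation. `enumerate_join_trees` is a hypothetical helper, not optd's implementation: it ignores join predicates (so trees containing cross products, such as joining `t2` and `t3` directly, are included) and performs no costing.

```rust
/// Hypothetical exhaustive enumerator of ordered binary join trees.
/// Left and right children are distinguished, since swapping the inputs
/// of a HashJoin yields a different physical plan.
fn enumerate_join_trees(tables: &[&str]) -> Vec<String> {
    // `mask` is a bitset selecting which tables belong to this subtree.
    fn go(tables: &[&str], mask: u32) -> Vec<String> {
        // Base case: a single table is a leaf.
        if mask.count_ones() == 1 {
            return vec![tables[mask.trailing_zeros() as usize].to_string()];
        }
        let mut out = Vec::new();
        // Visit every non-empty proper subset of `mask` as the left input;
        // the remaining tables form the right input.
        let mut left = (mask - 1) & mask;
        while left > 0 {
            let right = mask & !left;
            for l in go(tables, left) {
                for r in go(tables, right) {
                    out.push(format!("(HashJoin {} {})", l, r));
                }
            }
            left = (left - 1) & mask;
        }
        out
    }
    assert!(!tables.is_empty() && tables.len() < 32);
    go(tables, (1u32 << tables.len()) - 1)
}
```

For `t1`, `t2`, `t3` this yields 12 trees, including both `(HashJoin (HashJoin t1 t2) t3)` and `(HashJoin t2 (HashJoin t1 t3))` from the output above; a real optimizer would then cost each candidate and keep the cheapest.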

Because the algorithm only has runtime information from the previous run, collected before the new data were added to the tables, there are some iterations in which it cannot generate the optimal plan. It converges to the optimal plan as more runtime information is collected.
26 changes: 26 additions & 0 deletions docs/src/demo_tpch_q8.md
@@ -0,0 +1,26 @@
# TPC-H Q8 Demo


You can run this demo with the following command:

```shell
cargo run --release --bin optd-adaptive-tpch-q8
```

In this demo, we create the TPC-H schema with test data at scale factor 0.01. TPC-H Q8 joins 8 tables (with `nation` appearing twice), which is too many to enumerate every join order in a single run. The demo therefore runs the query multiple times, exploring part of the plan space in each iteration. Each iteration optimizes quickly, and as more of the plan space is explored across iterations, the produced plan converges to the optimal join order.
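
To give a sense of scale, the number of ordered binary join trees over n tables is (2n-2)!/(n-1)!, that is, the product n · (n+1) · … · (2n-2). This count is our own back-of-the-envelope illustration, not a figure from optd: it ignores cross-product pruning and physical operator choices, and `count_join_trees` is a hypothetical helper.

```rust
/// Hypothetical helper: number of ordered binary join trees over `n`
/// tables, computed as (2n-2)!/(n-1)! = n * (n+1) * ... * (2n-2).
fn count_join_trees(n: u32) -> u128 {
    assert!(n >= 1);
    let lo = n as u128;
    let hi = 2 * lo - 2;
    // For n = 1 the range is empty and the product is 1 (a single leaf).
    (lo..=hi).product()
}
```

Three tables give 12 trees, while eight tables already give 17,297,280, which is why the Q8 demo spreads exploration over several iterations.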

```plain
--- ITERATION 5 ---
plan space size budget used, not applying logical rules any more. current plan space: 10354
(HashJoin region (HashJoin (HashJoin (HashJoin (HashJoin (HashJoin part (HashJoin supplier lineitem)) orders) customer) nation) nation))
plan space size budget used, not applying logical rules any more. current plan space: 11743
+--------+------------+
| col0 | col1 |
+--------+------------+
| 1995.0 | 1.00000000 |
| 1996.0 | 0.32989690 |
+--------+------------+
2 rows in set. Query took 0.115 seconds.
```

The output contains the current join order in Lisp-style representation, the current plan space size, and the query result.
2 changes: 2 additions & 0 deletions docs/src/partial_exploration.md
@@ -1,3 +1,5 @@
# Partial Exploration

When the plan space is very large, optd first generates a sub-optimal plan and then, the next time the same (or a similar) query is optimized, uses the runtime information to continue searching the plan space. This is called partial exploration.

Developers can pass `partial_explore_iter` and `partial_explore_space` in the optimizer options to control how much the optimizer expands the plan space each time `step_optimize_rel` is invoked. To use partial exploration, developers must not clear the optimizer's internal state between runs.
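
The idea of budgeted, resumable exploration can be illustrated with a toy model. Everything here is hypothetical: `ToyExplorer` and its `step` method stand in for the optimizer and `step_optimize_rel`, the two budget arguments only play the roles of `partial_explore_iter` and `partial_explore_space`, and each rule application is assumed to add a fixed number of plans. What it shows is that the explored plan space lives in the struct between calls, so each call resumes where the previous one stopped.

```rust
/// Hypothetical toy model of partial exploration (not optd's optimizer).
struct ToyExplorer {
    /// Plans discovered so far. Kept across calls, which is why the
    /// optimizer's internal state must not be cleared between runs.
    plan_space: usize,
    /// Total number of plans the rules can reach (known only to the toy).
    full_space: usize,
}

impl ToyExplorer {
    fn new(full_space: usize) -> Self {
        Self { plan_space: 1, full_space }
    }

    /// Apply rules until the iteration budget or the plan-space growth
    /// budget of this call is used up, or the space is fully explored.
    /// Each rule application is assumed to add `per_rule` new plans.
    /// Returns true once the whole space has been explored.
    fn step(&mut self, iter_budget: usize, space_budget: usize, per_rule: usize) -> bool {
        let start = self.plan_space;
        for _ in 0..iter_budget {
            let grown = self.plan_space - start;
            if self.plan_space >= self.full_space || grown >= space_budget {
                break;
            }
            self.plan_space = (self.plan_space + per_rule).min(self.full_space);
        }
        self.plan_space >= self.full_space
    }
}
```

With a full space of 100 plans and 3 new plans per rule application, a call limited to 10 iterations stops at 31 plans, a following call limited to 20 new plans stops at 52, and a call with generous budgets finishes the space. Re-creating the struct between calls would restart the search from one plan.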
4 changes: 4 additions & 0 deletions docs/src/sqlplannertest.md
@@ -1 +1,5 @@
# SQLPlannerTest

optd uses risinglightdb's SQL planner test library to ensure that the optimizer works correctly and stably produces the expected plans. SQL planner test is a regression test: developers provide the test framework with a YAML file containing the queries to be optimized and the information they want to collect, and the framework generates the test results and stores them in SQL files. When a developer submits a pull request, reviewers should check whether any of these outputs changed unexpectedly.

The test cases can be found in `optd-sqlplannertest/tests`. Currently, the tests check whether optd can enumerate all join orders using the `explain:logical_join_orders,physical_plan` task, and whether the query output is as expected using the `execute` task.
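
Conceptually, the regression check comes down to comparing freshly generated output with the stored expected output. The following is a generic golden-file sketch, not the SQL planner test library's API; `check_snapshot` is a hypothetical helper that reports the first differing line, which is the kind of change a reviewer would look for.

```rust
/// Hypothetical golden-file comparison: Ok if `actual` matches `expected`
/// line by line, otherwise an error naming the first differing line.
fn check_snapshot(expected: &str, actual: &str) -> Result<(), String> {
    let mut exp = expected.lines();
    let mut act = actual.lines();
    let mut line_no = 1;
    loop {
        match (exp.next(), act.next()) {
            // Both inputs ended at the same time: identical output.
            (None, None) => return Ok(()),
            // Same line (or same end-of-input) on both sides: keep going.
            (e, a) if e == a => line_no += 1,
            // First mismatch, including one side being shorter.
            (e, a) => {
                return Err(format!(
                    "line {}: expected {:?}, got {:?}",
                    line_no,
                    e.unwrap_or("<eof>"),
                    a.unwrap_or("<eof>")
                ))
            }
        }
    }
}
```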
1 change: 0 additions & 1 deletion docs/src/tpch_q8_demo.md

This file was deleted.
