This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Commit d631fbf

docs: finish all docs
Signed-off-by: Alex Chi <[email protected]>
1 parent 45db7a9 commit d631fbf


8 files changed, +79 -4 lines changed


docs/README.md

Lines changed: 2 additions & 0 deletions
@@ -5,3 +5,5 @@ The docs is written in `mdbook` format. You can follow the [`mdbook` installatio
 ```shell
 mdbook serve
 ```
+
+The online version of the documentation can be found at [https://cmu-db.github.io/optd/](https://cmu-db.github.io/optd/).

docs/src/SUMMARY.md

Lines changed: 3 additions & 3 deletions
@@ -19,10 +19,10 @@
 
 # Demo
 
-- [(WIP) Three Join Demo](./demo_three_join.md)
-- [(WIP) TPC-H Q8 Demo](./tpch_q8_demo.md)
+- [Three Join Demo](./demo_three_join.md)
+- [TPC-H Q8 Demo](./demo_tpch_q8.md)
 
 # Testing
 
-- [(WIP) SQLPlannerTest](./sqlplannertest.md)
+- [SQLPlannerTest](./sqlplannertest.md)
 - [(WIP) Datafusion CLI](./datafusion_cli.md)

docs/src/datafusion_cli.md

Lines changed: 12 additions & 0 deletions
@@ -1 +1,13 @@
 # Datafusion CLI
+
+Developers can interact with optd through the Datafusion CLI. The CLI supports creating tables, populating data, and executing ANSI SQL queries.
+
+```shell
+cargo run --bin datafusion-optd-cli
+```
+
+We also provide a TPC-H dataset at scale factor 0.01 for testing. The test SQL script can be run with the Datafusion CLI:
+
+```shell
+cargo run --bin datafusion-optd-cli -- -f tpch/test.sql
+```

docs/src/demo_three_join.md

Lines changed: 30 additions & 0 deletions
@@ -1 +1,31 @@
 # Three Join Demo
+
+You can run this demo with the following command:
+
+```shell
+cargo run --release --bin optd-adaptive-three-join
+```
+
+We create three tables and join them. The underlying data is updated every time the query is executed.
+
+```sql
+select * from t1, t2, t3 where t1v1 = t2v1 and t1v2 = t3v2;
+```
+
+When the data distribution and table sizes change, the optimal join order can change as well. The output of this demo looks like this:
+
+```plain
+Iter 66: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/66=100.000
+Iter 67: (HashJoin (HashJoin t2 t1) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/67=98.507
+Iter 68: (HashJoin t2 (HashJoin t1 t3)) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/68=97.059
+Iter 69: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 67/69=97.101
+Iter 70: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 68/70=97.143
+Iter 71: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 69/71=97.183
+Iter 72: (HashJoin (HashJoin t2 t1) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 69/72=95.833
+```
+
+The left plan, in Lisp-style representation, is the join order chosen by the adaptive query optimization algorithm. The right plan is the best plan. The accuracy is the cumulative percentage of iterations in which the adaptive algorithm produced the cost-optimal plan.
+
+To find the optimal plan and compute the accuracy, we set up two optimizers in this demo: the normal optimizer and the optimal optimizer. Each time we insert data into the tables, we invoke the normal optimizer once and invoke the optimal optimizer with all possible combinations of join orders, so that the optimal optimizer produces the optimal plan based on the cost model and the join selectivity.
+
+Because the algorithm only has the runtime information from the last run, collected before the new data was added to the tables, there are some iterations in which it cannot generate the optimal plan. However, it converges to the optimal plan as more runtime information is collected.
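The two-optimizer setup and the accuracy bookkeeping described above can be illustrated with a small, self-contained Rust sketch. It is not optd's implementation: the `Plan` type, the row counts, the selectivities in `PREDS`, and the asymmetric cost formula are all assumptions made for illustration. The "optimal optimizer" enumerates every binary join tree over `t1`, `t2`, `t3` (12 trees, cross products included) and keeps the cheapest; the "adaptive" side runs the same search but only with the row counts observed in the previous run, and each iteration prints a line in the same shape as the demo output.

```rust
use std::fmt;

const NAMES: [&str; 3] = ["t1", "t2", "t3"];
/// Hypothetical predicates t1v1 = t2v1 and t1v2 = t3v2 with made-up selectivities.
const PREDS: [(usize, usize, f64); 2] = [(0, 1, 0.01), (0, 2, 0.05)];

/// A join tree over table indices, printed in the demo's Lisp-like style.
#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan(usize),
    HashJoin(Box<Plan>, Box<Plan>),
}

impl fmt::Display for Plan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Plan::Scan(i) => write!(f, "{}", NAMES[*i]),
            Plan::HashJoin(l, r) => write!(f, "(HashJoin {l} {r})"),
        }
    }
}

/// Bitmask of the tables that appear in a plan.
fn mask_of(p: &Plan) -> u32 {
    match p {
        Plan::Scan(i) => 1 << *i,
        Plan::HashJoin(l, r) => mask_of(l) | mask_of(r),
    }
}

/// Every binary join tree over `tables` (input order matters, cross products allowed).
fn enumerate(tables: &[usize]) -> Vec<Plan> {
    let n = tables.len();
    if n == 1 {
        return vec![Plan::Scan(tables[0])];
    }
    let mut plans = Vec::new();
    // Each non-empty proper subset (a bitmask over positions) goes left, the rest right.
    for mask in 1u32..(1u32 << n) - 1 {
        let side = |left: bool| -> Vec<usize> {
            (0..n).filter(|&i| (mask >> i & 1 == 1) == left).map(|i| tables[i]).collect()
        };
        for l in enumerate(&side(true)) {
            for r in enumerate(&side(false)) {
                plans.push(Plan::HashJoin(Box::new(l.clone()), Box::new(r)));
            }
        }
    }
    plans
}

/// Estimated output rows of a subtree: product of base rows times the
/// selectivity of every predicate whose two tables are both in the subtree.
fn card(p: &Plan, rows: &[f64; 3]) -> f64 {
    let m = mask_of(p);
    let has = |i: usize| m >> i & 1 == 1;
    let mut c: f64 = (0..3).filter(|&i| has(i)).map(|i| rows[i]).product();
    for (a, b, s) in PREDS {
        if has(a) && has(b) {
            c *= s;
        }
    }
    c
}

/// Toy asymmetric cost (an assumption): left rows + 2 x right rows + output rows, per join.
fn cost(p: &Plan, rows: &[f64; 3]) -> f64 {
    match p {
        Plan::Scan(_) => 0.0,
        Plan::HashJoin(l, r) => cost(l, rows) + cost(r, rows) + card(l, rows) + 2.0 * card(r, rows) + card(p, rows),
    }
}

/// The exhaustive "optimal optimizer": try every join tree and keep the cheapest.
fn best_plan(rows: &[f64; 3]) -> Plan {
    enumerate(&[0, 1, 2])
        .into_iter()
        .min_by(|a, b| cost(a, rows).total_cmp(&cost(b, rows)))
        .expect("at least one plan")
}

fn main() {
    // Hypothetical row counts of (t1, t2, t3) after each round of inserts.
    let history = [[100.0, 200.0, 300.0], [100.0, 5000.0, 300.0], [150.0, 5200.0, 9000.0]];
    let mut prev = history[0];
    let (mut hits, mut iters) = (0u32, 0u32);
    for rows in history {
        let best = best_plan(&rows);
        // The adaptive side only knows the row counts observed in the previous run.
        let chosen = best_plan(&prev);
        iters += 1;
        hits += u32::from(chosen == best);
        let acc = 100.0 * f64::from(hits) / f64::from(iters);
        println!("Iter {iters}: {chosen} <-> (best) {best}, Accuracy: {hits}/{iters}={acc:.3}");
        prev = rows;
    }
}
```

In this toy run the second iteration is a miss: its inserts make `(HashJoin t2 (HashJoin t3 t1))` the cheapest tree, while the adaptive side still plans with the first run's row counts and picks `(HashJoin t3 (HashJoin t2 t1))`.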

docs/src/demo_tpch_q8.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# TPC-H Q8 Demo
+
+
+You can run this demo with the following command:
+
+```shell
+cargo run --release --bin optd-adaptive-tpch-q8
+```
+
+In this demo, we create the TPC-H schema with test data at scale factor 0.01. TPC-H Q8 joins 8 tables (`nation` appears twice), so it is infeasible to enumerate all join orders in one run. Instead, the demo runs the query multiple times and explores a subset of the plan space in each run. Optimization is therefore fast in every iteration, and as more of the plan space is explored, the produced plan converges to the optimal join order.
+
+```plain
+--- ITERATION 5 ---
+plan space size budget used, not applying logical rules any more. current plan space: 10354
+(HashJoin region (HashJoin (HashJoin (HashJoin (HashJoin (HashJoin part (HashJoin supplier lineitem)) orders) customer) nation) nation))
+plan space size budget used, not applying logical rules any more. current plan space: 11743
++--------+------------+
+| col0   | col1       |
++--------+------------+
+| 1995.0 | 1.00000000 |
+| 1996.0 | 0.32989690 |
++--------+------------+
+2 rows in set. Query took 0.115 seconds.
+```
+
+The output shows the join order chosen in the current iteration (in Lisp-style representation), the current plan space size, and the query result.
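The claim that Q8's join space is too large to enumerate in one run can be made concrete with standard combinatorics. The Rust sketch below counts join trees over n relations when every tree shape and input order is allowed, including cross products, using the closed form n! * Catalan(n - 1) = (2n - 2)! / (n - 1)!. This is only a rough illustration of growth: optd's search is shaped by its rules and join predicates, and these counts are not the same quantity as the `current plan space` figures in the log above.

```rust
/// Number of distinct binary join trees over `n` relations when the order of a
/// join's two inputs matters and cross products are allowed:
/// n! * Catalan(n - 1) = (2n - 2)! / (n - 1)! = n * (n + 1) * ... * (2n - 2).
fn join_trees(n: u64) -> u64 {
    assert!(n >= 1, "need at least one relation");
    // Empty product (n = 1) is 1: a single scan.
    (n..=2 * n - 2).product()
}

/// Number of left-deep join orders, i.e. permutations of the relations: n!.
fn left_deep_orders(n: u64) -> u64 {
    (1..=n).product()
}

fn main() {
    for n in [3u64, 8] {
        println!(
            "{n} relations: {} left-deep orders, {} join trees",
            left_deep_orders(n),
            join_trees(n)
        );
    }
    // Prints:
    // 3 relations: 6 left-deep orders, 12 join trees
    // 8 relations: 40320 left-deep orders, 17297280 join trees
}
```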

docs/src/partial_exploration.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 # Partial Exploration
 
 When the plan space is very large, optd will generate a sub-optimal plan at first, and then use the runtime information to continue the plan space search next time the same query (or a similar query) is being optimized. This is partial exploration.
+
+Developers can set `partial_explore_iter` and `partial_explore_space` in the optimizer options to limit how much the optimizer expands the plan space each time `step_optimize_rel` is invoked. To use partial exploration, developers must not clear the optimizer's internal state between runs.
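The resumable, budgeted search described above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not optd's API: `PartialExplorer`, `step`, and `budget` are hypothetical names, a single per-step budget stands in for `partial_explore_iter` / `partial_explore_space`, and the plan space is reduced to a list of (plan, cost) pairs with made-up costs. Each `step` call examines at most `budget` new candidates and returns the cheapest plan seen so far; because the cursor and the best-so-far survive between calls, later runs resume where earlier ones stopped.

```rust
/// Hypothetical resumable optimizer: the "plan space" is a list of
/// (plan, cost) candidates, and each `step` looks at only `budget` new ones.
struct PartialExplorer {
    candidates: Vec<(String, f64)>,
    next: usize,         // first candidate not yet explored (kept across runs)
    best: Option<usize>, // index of the cheapest candidate seen so far
    budget: usize,       // per-step exploration budget (assumed)
}

impl PartialExplorer {
    fn new(candidates: Vec<(String, f64)>, budget: usize) -> Self {
        assert!(budget > 0 && !candidates.is_empty());
        PartialExplorer { candidates, next: 0, best: None, budget }
    }

    /// Explore up to `budget` more candidates and return the best plan found so far.
    fn step(&mut self) -> &str {
        let end = (self.next + self.budget).min(self.candidates.len());
        for i in self.next..end {
            let better = match self.best {
                None => true,
                Some(b) => self.candidates[i].1 < self.candidates[b].1,
            };
            if better {
                self.best = Some(i);
            }
        }
        self.next = end;
        let best = self.best.expect("at least one candidate was explored");
        &self.candidates[best].0
    }

    fn fully_explored(&self) -> bool {
        self.next == self.candidates.len()
    }
}

fn main() {
    // Made-up costs standing in for a plan space too large to search in one go.
    let raw = [
        ("(HashJoin (HashJoin t1 t2) t3)", 4500.0),
        ("(HashJoin (HashJoin t2 t1) t3)", 4400.0),
        ("(HashJoin t2 (HashJoin t1 t3))", 8400.0),
        ("(HashJoin t3 (HashJoin t2 t1))", 4300.0),
        ("(HashJoin (HashJoin t3 t1) t2)", 6900.0),
    ];
    let candidates: Vec<(String, f64)> = raw.iter().map(|&(p, c)| (p.to_string(), c)).collect();
    // Built once and reused across runs: rebuilding it would restart the search.
    let mut optimizer = PartialExplorer::new(candidates, 2);
    let mut run = 1;
    loop {
        println!("run {run}: {}", optimizer.step());
        if optimizer.fully_explored() {
            break;
        }
        run += 1;
    }
    // run 1: (HashJoin (HashJoin t2 t1) t3)
    // run 2: (HashJoin t3 (HashJoin t2 t1))
    // run 3: (HashJoin t3 (HashJoin t2 t1))
}
```

Constructing a fresh `PartialExplorer` before each run would restart at the first candidate and never look past the first `budget` entries, which mirrors why the optimizer's internal state must be kept between runs.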

docs/src/sqlplannertest.md

Lines changed: 4 additions & 0 deletions
@@ -1 +1,5 @@
 # SQLPlannerTest
+
+optd uses risinglightdb's SQL planner test library to ensure that the optimizer works correctly and consistently produces the expected plans. SQL planner tests are regression tests: developers provide the test framework with a YAML file listing the queries to be optimized and the information they want to collect, and the framework generates the test results and stores them in SQL files. When a developer submits a pull request, reviewers should check whether any of these outputs changed unexpectedly.
+
+The test cases can be found in `optd-sqlplannertest/tests`. Currently, we use the `explain:logical_join_orders,physical_plan` task to check that optd can enumerate all join orders, and the `execute` task to check that the query output is as expected.
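The review step described above amounts to snapshot comparison. The Rust sketch below is a generic illustration, not the planner test library's API: `changed_cases`, the `Change` enum, and the test-case names are hypothetical, and each test's output is modeled as a string keyed by test name. It reports which cases were added, removed, or produced output different from the stored files, which is the set of changes a reviewer would need to look at.

```rust
use std::collections::BTreeMap;

/// How a test case's output differs between the stored files and a fresh run.
#[derive(Debug, PartialEq)]
enum Change {
    Added(String),
    Removed(String),
    Modified(String),
}

/// Compare stored outputs with freshly generated ones, keyed by test-case name.
fn changed_cases(
    stored: &BTreeMap<String, String>,
    generated: &BTreeMap<String, String>,
) -> Vec<Change> {
    let mut changes = Vec::new();
    for (name, new_out) in generated {
        match stored.get(name) {
            None => changes.push(Change::Added(name.clone())),
            Some(old_out) if old_out != new_out => changes.push(Change::Modified(name.clone())),
            Some(_) => {} // unchanged: nothing to review
        }
    }
    for name in stored.keys() {
        if !generated.contains_key(name) {
            changes.push(Change::Removed(name.clone()));
        }
    }
    changes
}

fn main() {
    // Hypothetical test-case names and stored outputs.
    let stored = BTreeMap::from([
        ("two_join".to_string(), "(HashJoin t1 t2)".to_string()),
        ("three_join".to_string(), "(HashJoin (HashJoin t1 t2) t3)".to_string()),
    ]);
    // A code change makes the optimizer pick a different order for one query.
    let mut generated = stored.clone();
    generated.insert("three_join".to_string(), "(HashJoin t2 (HashJoin t1 t3))".to_string());
    for change in changed_cases(&stored, &generated) {
        println!("{change:?}"); // prints: Modified("three_join")
    }
}
```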

docs/src/tpch_q8_demo.md

Lines changed: 0 additions & 1 deletion
This file was deleted.
