docs: finish all docs

skyzh · skyzh · commit d631fbf86552 · 2023-12-31T15:03:44.000+08:00
Signed-off-by: Alex Chi <iskyzh@gmail.com>
diff --git a/docs/README.md b/docs/README.md
@@ -5,3 +5,5 @@ The docs is written in `mdbook` format. You can follow the [`mdbook` installatio
 ```shell
 mdbook serve
 ```
+
+The online version of the documentation can be found at [https://cmu-db.github.io/optd/](https://cmu-db.github.io/optd/).
diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
@@ -19,10 +19,10 @@
 
 # Demo
 
-- [(WIP) Three Join Demo](./demo_three_join.md)
-- [(WIP) TPC-H Q8 Demo](./tpch_q8_demo.md)
+- [Three Join Demo](./demo_three_join.md)
+- [TPC-H Q8 Demo](./demo_tpch_q8.md)
 
 # Testing
 
-- [(WIP) SQLPlannerTest](./sqlplannertest.md)
+- [SQLPlannerTest](./sqlplannertest.md)
 - [(WIP) Datafusion CLI](./datafusion_cli.md)
diff --git a/docs/src/datafusion_cli.md b/docs/src/datafusion_cli.md
@@ -1 +1,13 @@
 # Datafusion CLI
+
+Developers can interact with optd through the Datafusion CLI. The CLI supports creating tables, populating data, and executing ANSI SQL queries.
+
+```shell
+cargo run --bin datafusion-optd-cli
+```
+
+We also provide a TPC-H dataset at scale 0.01 for testing. The test SQL can be executed with the Datafusion CLI.
+
+```shell
+cargo run --bin datafusion-optd-cli -- -f tpch/test.sql
+```
diff --git a/docs/src/demo_three_join.md b/docs/src/demo_three_join.md
@@ -1 +1,31 @@
 # Three Join Demo
+
+You can run this demo with the following command:
+
+```shell
+cargo run --release --bin optd-adaptive-three-join
+```
+
+We create 3 tables and join them. The underlying data is updated every time the query is executed.
+
+```sql
+select * from t1, t2, t3 where t1v1 = t2v1 and t1v2 = t3v2;
+```
+
+When the data distribution and the table sizes change, the optimal join order changes as well. The demo produces output like the following.
+
+```plain
+Iter  66: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/66=100.000
+Iter  67: (HashJoin (HashJoin t2 t1) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/67=98.507
+Iter  68: (HashJoin t2 (HashJoin t1 t3)) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 66/68=97.059
+Iter  69: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 67/69=97.101
+Iter  70: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 68/70=97.143
+Iter  71: (HashJoin (HashJoin t1 t2) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 69/71=97.183
+Iter  72: (HashJoin (HashJoin t2 t1) t3) <-> (best) (HashJoin (HashJoin t1 t2) t3), Accuracy: 69/72=95.833
+```
+
+The plan on the left, in Lisp representation, is the join order chosen by the adaptive query optimization algorithm. The plan on the right is the best plan. The accuracy is the percentage of executions in which the adaptive query optimization algorithm generates the cost-optimal plan.
+
+To find the optimal plan and compute the accuracy, we set up two optimizers in this demo: the normal optimizer and the optimal optimizer. Each time we insert data into the tables, we invoke the normal optimizer once, and we invoke the optimal optimizer with all possible join orders so that it can produce an optimal plan based on the cost model and the join selectivity.
+
+Because the algorithm only knows the runtime information from the last run, collected before new data was added to the tables, there will be some iterations in which it cannot generate the optimal plan. However, it converges to the optimal plan as more runtime information is collected.
diff --git a/docs/src/demo_tpch_q8.md b/docs/src/demo_tpch_q8.md
@@ -0,0 +1,26 @@
+# TPC-H Q8 Demo
+
+
+You can run this demo with the following command:
+
+```shell
+cargo run --release --bin optd-adaptive-tpch-q8
+```
+
+In this demo, we create the TPC-H schema with test data at scale 0.01. TPC-H Q8 joins 8 tables, and it is impossible to enumerate all join combinations in one run. The demo runs the query multiple times, each time exploring a subset of the plan space. Optimization is therefore fast in each iteration, and as more of the plan space is explored with each iteration, the produced plan converges to the optimal join order.
+
+```plain
+--- ITERATION 5 ---
+plan space size budget used, not applying logical rules any more. current plan space: 10354
+(HashJoin region (HashJoin (HashJoin (HashJoin (HashJoin (HashJoin part (HashJoin supplier lineitem)) orders) customer) nation) nation))
+plan space size budget used, not applying logical rules any more. current plan space: 11743
++--------+------------+
+| col0   | col1       |
++--------+------------+
+| 1995.0 | 1.00000000 |
+| 1996.0 | 0.32989690 |
++--------+------------+
+2 rows in set. Query took 0.115 seconds.
+```
+
+The output contains the current join order in Lisp representation, the current plan space size, and the query result.
diff --git a/docs/src/partial_exploration.md b/docs/src/partial_exploration.md
@@ -1,3 +1,5 @@
 # Partial Exploration
 
 When the plan space is very large, optd will generate a sub-optimal plan at first, and then use the runtime information to continue the plan space search next time the same query (or a similar query) is being optimized. This is partial exploration.
+
+Developers can pass `partial_explore_iter` and `partial_explore_space` in the optimizer options to control how far the optimizer expands the plan space each time `step_optimize_rel` is invoked. To use partial exploration, developers should not clear the internal state of the optimizer between runs.
diff --git a/docs/src/sqlplannertest.md b/docs/src/sqlplannertest.md
@@ -1 +1,5 @@
 # SQLPlannerTest
+
+optd uses risinglightdb's SQL planner test library to ensure that the optimizer works correctly and stably produces the expected plan. SQL planner test is a regression test. Developers provide the test framework with a YAML file containing the queries to be optimized and the information they want to collect. The test framework generates the test results and stores them in SQL files. When a developer submits a pull request, reviewers should check whether any of these outputs changed unexpectedly.
+
+The test cases can be found in `optd-sqlplannertest/tests`. Currently, we check whether optd can enumerate all join orders by using the `explain:logical_join_orders,physical_plan` task, and we check whether the query output is as expected by using the `execute` task.
diff --git a/docs/src/tpch_q8_demo.md b/docs/src/tpch_q8_demo.md