From ba22fa14cccf7cc52266b32817ff2e3022c268a6 Mon Sep 17 00:00:00 2001
From: Alex Chi Z
Date: Tue, 16 Jan 2024 15:45:20 +0800
Subject: [PATCH] docs: add misc

Signed-off-by: Alex Chi Z
---
 docs/src/SUMMARY.md       |  4 ++++
 docs/src/miscellaneous.md | 31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
 create mode 100644 docs/src/miscellaneous.md

diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
index bab437d5..21b770c4 100644
--- a/docs/src/SUMMARY.md
+++ b/docs/src/SUMMARY.md
@@ -26,3 +26,7 @@
 
 - [SQLPlannerTest](./sqlplannertest.md)
 - [Datafusion CLI](./datafusion_cli.md)
+
+# Miscellaneous
+
+- [Miscellaneous](./miscellaneous.md)
diff --git a/docs/src/miscellaneous.md b/docs/src/miscellaneous.md
new file mode 100644
index 00000000..296961a6
--- /dev/null
+++ b/docs/src/miscellaneous.md
@@ -0,0 +1,31 @@
+# Miscellaneous
+
+This is a note covering things that do not work well in the system right now.
+
+## Type System
+
+Currently, we hard-code the decimal type to `15, 2` (precision, scale). Type inference should be done as part of schema property inference.
+
+## Expression
+
+optd supports exploring SQL expressions during optimization. However, this can be very inefficient: optimizing a plan node (e.g., join to hash join) usually needs the full binding of an expression tree, so the plan space can grow exponentially.
+
+## Bindings
+
+We do not have anything like the binding iterator described in the Cascades paper. Before applying a rule, we generate all bindings of a group, which might take a lot of memory. This should be fixed in the future.
+
+## Cycle Detection
+
+Consider the join commute rule.
+
+```
+(Join A B) <- group 1
+(Projection (Join B A) ) <- group 2
+(Projection (Projection (Join A B) ) ) <- group 1 may refer to itself
+```
+
+After applying the rule twice, the memo table will contain self-referential groups. Currently, we detect such self-references in the optimize group task. There are probably better ways to do that.
+
+## Partial Exploration
+
+Each iteration only gets slower because we have to invoke the optimize group tasks before we can find a group to apply the rule to. We can probably keep the task stack across runs to make it faster.