This repository was archived by the owner on Jan 7, 2025. It is now read-only.
Commit 1c557a4
authored
feat: integrate stats in to optd (#117)
Generate the statistics in perftest and put them into `BaseCostModel` in
`DatafusionOptimizer`. Below is the comparison before & after stats are
added. You can check `PhysicalScan`, where the cost has changed.
The final cardinality remains the same because when stats on a column is
missing, we use a very small magic number `INVALID_SELECTIVITY` (0.001)
that just sets cardinality to 1.
### Todos in Future PRs
- Support generating stats on `Utf8`.
- Set a better magic number.
- Generate MCV.
### Before
```
plan space size budget used, not applying logical rules any more. current plan space: 1094
explain: PhysicalSort
├── exprs:SortOrder { order: Desc }
│ └── #1
├── cost: weighted=185.17,row_cnt=1.00,compute=179.17,io=6.00
└── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=182.12,row_cnt=1.00,compute=176.12,io=6.00 }
└── PhysicalAgg
├── aggrs:Agg(Sum)
│ └── Mul
│ ├── #0
│ └── Sub
│ ├── 1
│ └── #1
├── groups: [ #2 ]
├── cost: weighted=182.02,row_cnt=1.00,compute=176.02,io=6.00
└── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=64.90,row_cnt=1.00,compute=58.90,io=6.00 }
└── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=64.76,row_cnt=1.00,compute=58.76,io=6.00 }
└── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=64.54,row_cnt=1.00,compute=58.54,io=6.00 }
└── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=64.24,row_cnt=1.00,compute=58.24,io=6.00 }
└── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=63.82,row_cnt=1.00,compute=57.82,io=6.00 }
└── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=63.32,row_cnt=1.00,compute=57.32,io=6.00 }
└── PhysicalNestedLoopJoin
├── join_type: Inner
├── cond:And
│ ├── Eq
│ │ ├── #11
│ │ └── #14
│ └── Eq
│ ├── #3
│ └── #15
├── cost: weighted=62.74,row_cnt=1.00,compute=56.74,io=6.00
├── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #1 ], cost: weighted=35.70,row_cnt=1.00,compute=32.70,io=3.00 }
│ ├── PhysicalScan { table: customer, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
│ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=31.64,row_cnt=1.00,compute=29.64,io=2.00 }
│ ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=27.40,row_cnt=1.00,compute=26.40,io=1.00 }
│ │ └── PhysicalFilter
│ │ ├── cond:And
│ │ │ ├── Geq
│ │ │ │ ├── #2
│ │ │ │ └── 9131
│ │ │ └── Lt
│ │ │ ├── #2
│ │ │ └── 9496
│ │ ├── cost: weighted=27.30,row_cnt=1.00,compute=26.30,io=1.00
│ │ └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 }
│ │ └── PhysicalScan { table: orders, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
│ └── PhysicalProjection { exprs: [ #0, #2, #5, #6 ], cost: weighted=1.18,row_cnt=1.00,compute=0.18,io=1.00 }
│ └── PhysicalScan { table: lineitem, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
└── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=15.72,row_cnt=1.00,compute=12.72,io=3.00 }
└── PhysicalHashJoin { join_type: Inner, left_keys: [ #3 ], right_keys: [ #0 ], cost: weighted=15.46,row_cnt=1.00,compute=12.46,io=3.00 }
├── PhysicalScan { table: supplier, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
└── PhysicalHashJoin { join_type: Inner, left_keys: [ #2 ], right_keys: [ #0 ], cost: weighted=11.40,row_cnt=1.00,compute=9.40,io=2.00 }
├── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=1.14,row_cnt=1.00,compute=0.14,io=1.00 }
│ └── PhysicalScan { table: nation, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
└── PhysicalProjection { exprs: [ #0 ], cost: weighted=7.20,row_cnt=1.00,compute=6.20,io=1.00 }
└── PhysicalFilter
├── cond:Eq
│ ├── #1
│ └── "AMERICA"
├── cost: weighted=7.14,row_cnt=1.00,compute=6.14,io=1.00
└── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=1.10,row_cnt=1.00,compute=0.10,io=1.00 }
└── PhysicalScan { table: region, cost: weighted=1.00,row_cnt=1.00,compute=0.00,io=1.00 }
plan space size budget used, not applying logical rules any more. current plan space: 1094
qerrors: {"DataFusion": [5.0]}
```
### After
```
plan space size budget used, not applying logical rules any more. current plan space: 1094
explain: PhysicalSort
├── exprs:SortOrder { order: Desc }
│ └── #1
├── cost: weighted=336032.32,row_cnt=1.00,compute=259227.32,io=76805.00
└── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=336029.27,row_cnt=1.00,compute=259224.27,io=76805.00 }
└── PhysicalAgg
├── aggrs:Agg(Sum)
│ └── Mul
│ ├── #0
│ └── Sub
│ ├── 1
│ └── #1
├── groups: [ #2 ]
├── cost: weighted=336029.17,row_cnt=1.00,compute=259224.17,io=76805.00
└── PhysicalProjection { exprs: [ #0, #1, #2 ], cost: weighted=335912.05,row_cnt=1.00,compute=259107.05,io=76805.00 }
└── PhysicalProjection { exprs: [ #0, #1, #4, #5, #6 ], cost: weighted=335911.91,row_cnt=1.00,compute=259106.91,io=76805.00 }
└── PhysicalProjection { exprs: [ #2, #3, #5, #6, #7, #8, #9 ], cost: weighted=335911.69,row_cnt=1.00,compute=259106.69,io=76805.00 }
└── PhysicalProjection { exprs: [ #0, #3, #4, #5, #6, #7, #8, #9, #10, #11 ], cost: weighted=335911.39,row_cnt=1.00,compute=259106.39,io=76805.00 }
└── PhysicalProjection { exprs: [ #1, #2, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13 ], cost: weighted=335910.97,row_cnt=1.00,compute=259105.97,io=76805.00 }
└── PhysicalProjection { exprs: [ #0, #3, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #18, #19 ], cost: weighted=335910.47,row_cnt=1.00,compute=259105.47,io=76805.00 }
└── PhysicalNestedLoopJoin
├── join_type: Inner
├── cond:And
│ ├── Eq
│ │ ├── #11
│ │ └── #14
│ └── Eq
│ ├── #3
│ └── #15
├── cost: weighted=335909.89,row_cnt=1.00,compute=259104.89,io=76805.00
├── PhysicalProjection { exprs: [ #6, #7, #8, #9, #10, #11, #12, #13, #0, #1, #2, #3, #4, #5 ], cost: weighted=335619.21,row_cnt=1.00,compute=258944.21,io=76675.00 }
│ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #0 ], cost: weighted=335618.63,row_cnt=1.00,compute=258943.63,io=76675.00 }
│ ├── PhysicalProjection { exprs: [ #4, #5, #0, #1, #2, #3 ], cost: weighted=332616.57,row_cnt=1.00,compute=257441.57,io=75175.00 }
│ │ └── PhysicalProjection { exprs: [ #0, #2, #5, #6, #16, #17 ], cost: weighted=332616.31,row_cnt=1.00,compute=257441.31,io=75175.00 }
│ │ └── PhysicalProjection { exprs: [ #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15, #16, #17, #0, #1 ], cost: weighted=332616.05,row_cnt=1.00,compute=257441.05,io=75175.00 }
│ │ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #0 ], cost: weighted=332615.31,row_cnt=1.00,compute=257440.31,io=75175.00 }
│ │ ├── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=212263.25,row_cnt=1.00,compute=197263.25,io=15000.00 }
│ │ │ └── PhysicalFilter
│ │ │ ├── cond:And
│ │ │ │ ├── Geq
│ │ │ │ │ ├── #2
│ │ │ │ │ └── 9131
│ │ │ │ └── Lt
│ │ │ │ ├── #2
│ │ │ │ └── 9496
│ │ │ ├── cost: weighted=212263.15,row_cnt=1.00,compute=197263.15,io=15000.00
│ │ │ └── PhysicalProjection { exprs: [ #0, #1, #4 ], cost: weighted=16050.07,row_cnt=15000.00,compute=1050.07,io=15000.00 }
│ │ │ └── PhysicalScan { table: orders, cost: weighted=15000.00,row_cnt=15000.00,compute=0.00,io=15000.00 }
│ │ └── PhysicalScan { table: lineitem, cost: weighted=60175.00,row_cnt=60175.00,compute=0.00,io=60175.00 }
│ └── PhysicalScan { table: customer, cost: weighted=1500.00,row_cnt=1500.00,compute=0.00,io=1500.00 }
└── PhysicalProjection { exprs: [ #0, #3, #7, #8, #9, #10 ], cost: weighted=279.36,row_cnt=1.00,compute=149.36,io=130.00 }
└── PhysicalProjection { exprs: [ #4, #5, #6, #7, #8, #9, #10, #0, #1, #2, #3 ], cost: weighted=279.10,row_cnt=1.00,compute=149.10,io=130.00 }
└── PhysicalProjection { exprs: [ #1, #2, #3, #0, #4, #5, #6, #7, #8, #9, #10 ], cost: weighted=278.64,row_cnt=1.00,compute=148.64,io=130.00 }
└── PhysicalHashJoin { join_type: Inner, left_keys: [ #1 ], right_keys: [ #3 ], cost: weighted=278.18,row_cnt=1.00,compute=148.18,io=130.00 }
├── PhysicalProjection { exprs: [ #3, #0, #1, #2 ], cost: weighted=76.12,row_cnt=1.00,compute=46.12,io=30.00 }
│ └── PhysicalProjection { exprs: [ #0, #1, #2, #4 ], cost: weighted=75.94,row_cnt=1.00,compute=45.94,io=30.00 }
│ └── PhysicalProjection { exprs: [ #1, #2, #3, #4, #0 ], cost: weighted=75.76,row_cnt=1.00,compute=45.76,io=30.00 }
│ └── PhysicalHashJoin { join_type: Inner, left_keys: [ #0 ], right_keys: [ #2 ], cost: weighted=75.54,row_cnt=1.00,compute=45.54,io=30.00 }
│ ├── PhysicalProjection { exprs: [ #0 ], cost: weighted=23.48,row_cnt=1.00,compute=18.48,io=5.00 }
│ │ └── PhysicalFilter
│ │ ├── cond:Eq
│ │ │ ├── #1
│ │ │ └── "AMERICA"
│ │ ├── cost: weighted=23.42,row_cnt=1.00,compute=18.42,io=5.00
│ │ └── PhysicalProjection { exprs: [ #0, #1 ], cost: weighted=5.30,row_cnt=5.00,compute=0.30,io=5.00 }
│ │ └── PhysicalScan { table: region, cost: weighted=5.00,row_cnt=5.00,compute=0.00,io=5.00 }
│ └── PhysicalScan { table: nation, cost: weighted=25.00,row_cnt=25.00,compute=0.00,io=25.00 }
└── PhysicalScan { table: supplier, cost: weighted=100.00,row_cnt=100.00,compute=0.00,io=100.00 }
plan space size budget used, not applying logical rules any more. current plan space: 1094
qerrors: {"DataFusion": [5.0]}
```1 parent 7915fb9 commit 1c557a4
File tree
9 files changed
+324
-40
lines changed- optd-datafusion-repr/src/cost
- optd-gungnir/src/stats
- optd-perftest
- src
9 files changed
+324
-40
lines changedSome generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
54 | | - | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
55 | 57 | | |
56 | 58 | | |
57 | 59 | | |
58 | | - | |
59 | | - | |
60 | 60 | | |
61 | | - | |
62 | | - | |
63 | 61 | | |
64 | 62 | | |
65 | 63 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
9 | 15 | | |
10 | 16 | | |
11 | 17 | | |
12 | 18 | | |
13 | 19 | | |
14 | 20 | | |
| 21 | + | |
| 22 | + | |
15 | 23 | | |
16 | 24 | | |
17 | 25 | | |
| |||
34 | 42 | | |
35 | 43 | | |
36 | 44 | | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
37 | 79 | | |
38 | 80 | | |
39 | | - | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
40 | 246 | | |
41 | 247 | | |
42 | 248 | | |
| |||
45 | 251 | | |
46 | 252 | | |
47 | 253 | | |
48 | | - | |
| 254 | + | |
49 | 255 | | |
50 | 256 | | |
51 | 257 | | |
| |||
445 | 651 | | |
446 | 652 | | |
447 | 653 | | |
448 | | - | |
| 654 | + | |
| 655 | + | |
449 | 656 | | |
450 | 657 | | |
451 | 658 | | |
| |||
484 | 691 | | |
485 | 692 | | |
486 | 693 | | |
487 | | - | |
| 694 | + | |
| 695 | + | |
488 | 696 | | |
489 | 697 | | |
490 | 698 | | |
| |||
555 | 763 | | |
556 | 764 | | |
557 | 765 | | |
558 | | - | |
| 766 | + | |
559 | 767 | | |
560 | 768 | | |
561 | 769 | | |
| |||
566 | 774 | | |
567 | 775 | | |
568 | 776 | | |
569 | | - | |
| 777 | + | |
570 | 778 | | |
571 | 779 | | |
572 | 780 | | |
| |||
612 | 820 | | |
613 | 821 | | |
614 | 822 | | |
615 | | - | |
| 823 | + | |
616 | 824 | | |
617 | 825 | | |
618 | 826 | | |
| |||
664 | 872 | | |
665 | 873 | | |
666 | 874 | | |
667 | | - | |
| 875 | + | |
668 | 876 | | |
669 | 877 | | |
670 | 878 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
8 | 8 | | |
9 | 9 | | |
10 | 10 | | |
| 11 | + | |
| 12 | + | |
11 | 13 | | |
12 | 14 | | |
13 | 15 | | |
14 | 16 | | |
15 | 17 | | |
16 | 18 | | |
17 | 19 | | |
| 20 | + | |
18 | 21 | | |
19 | 22 | | |
20 | 23 | | |
| |||
29 | 32 | | |
30 | 33 | | |
31 | 34 | | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
32 | 42 | | |
33 | 43 | | |
34 | 44 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
| 9 | + | |
| 10 | + | |
9 | 11 | | |
| 12 | + | |
10 | 13 | | |
11 | 14 | | |
12 | 15 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
| 19 | + | |
19 | 20 | | |
20 | 21 | | |
21 | 22 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
5 | | - | |
| 5 | + | |
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
| |||
0 commit comments