Fix sort for unpartitioned window with order by clause
There is an optimization that removes a sort when the data is already sorted
and processed on a single node. It applies when a query has an unpartitioned
window function with an ORDER BY in its window specification and a query-level
ORDER BY on the same columns.
However, this optimization was being applied even when a local exchange
before the sort repartitioned (and therefore reordered) the data. In
particular, this kind of plan occurs when there is a filter or project after a
window function. This PR makes sort removal aware of local exchanges: a sort is
no longer removed when a local exchange changes the ordering of the data.
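The rule described above can be illustrated with a small model. This is a minimal sketch in Python, not the engine's actual code: `PlanNode`, `output_ordering`, and `can_remove_sort` are hypothetical names, and the model conservatively assumes that every local exchange discards ordering while filters and projections preserve it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PlanNode:
    """Hypothetical, simplified plan node (not the engine's real class)."""
    kind: str                            # e.g. "Window", "LocalExchange", "Project"
    source: Optional["PlanNode"] = None  # single child; None for a leaf
    order_by: Tuple[str, ...] = ()       # ordering established by Window/Sort


def output_ordering(node: Optional[PlanNode]) -> Tuple[str, ...]:
    """Return the ordering guaranteed on a node's output, or () if none."""
    if node is None:
        return ()
    if node.kind in ("Window", "Sort"):
        # These nodes emit rows in their ORDER BY order.
        return node.order_by
    if node.kind == "LocalExchange":
        # Assumption: a local exchange may repartition rows, so no
        # ordering is guaranteed above it.
        return ()
    if node.kind in ("Project", "Filter"):
        # Assumption: row-at-a-time nodes keep their input ordering.
        return output_ordering(node.source)
    return ()


def can_remove_sort(sort_keys: Tuple[str, ...], source: PlanNode) -> bool:
    """A sort is redundant only if its input is already ordered by its keys."""
    ordering = output_ordering(source)
    return len(sort_keys) > 0 and ordering[: len(sort_keys)] == sort_keys


# Window directly below the sort: the sort is redundant.
window = PlanNode("Window", source=PlanNode("TableScan"), order_by=("regionkey",))
direct_ok = can_remove_sort(("regionkey",), window)

# Project over a local exchange over the window: the sort must stay.
project = PlanNode("Project", source=PlanNode("LocalExchange", source=window))
exchange_ok = can_remove_sort(("regionkey",), project)

print(direct_ok, exchange_ok)
```

In this model the second case mirrors the affected plan shape: the window establishes the ordering, but the local exchange above it means the ordering can no longer be relied on.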
Example affected query:
SELECT regionkey, count(name) OVER (order by regionkey)
FROM region
ORDER BY regionkey;
Previous plan:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Output[regionkey, _col1] => [regionkey:bigint, count:bigint]
        _col1 := count (1:27)
    - Project[projectLocality = LOCAL] => [regionkey:bigint, count:bigint]
        - LocalExchange[ROUND_ROBIN] () => [regionkey:bigint, name:varchar(25), count:bigint]
            - Window[order by (regionkey ASC_NULLS_LAST)] => [regionkey:bigint, name:varchar(25), count:bigint]
                    count := count(name) RANGE UNBOUNDED_PRECEDING CURRENT_ROW (1:27)
                - LocalExchange[SINGLE] () => [regionkey:bigint, name:varchar(25)]
                        Estimates: {rows: 5 (104B), cpu: 104.00, memory: 0.00, network: 104.00}
                    - RemoteStreamingExchange[GATHER] => [regionkey:bigint, name:varchar(25)]
                            Estimates: {rows: 5 (104B), cpu: 104.00, memory: 0.00, network: 104.00}
                        - TableScan[TableHandle {connectorId='tpch', connectorHandle='region:sf1.0', layout='Optional[region:sf1.0]'}] => [regionkey:bigint, name:varchar(25)]
                                Estimates: {rows: 5 (104B), cpu: 104.00, memory: 0.00, network: 0.00}
                                name := tpch:name (1:70)
                                regionkey := tpch:regionkey (1:70)
New plan:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Output[regionkey, _col1] => [regionkey:bigint, count:bigint]
        _col1 := count (1:27)
    - LocalMerge[regionkey ASC_NULLS_LAST] => [regionkey:bigint, count:bigint]
        - Sort[regionkey ASC_NULLS_LAST] => [regionkey:bigint, count:bigint]
            - Project[projectLocality = LOCAL] => [regionkey:bigint, count:bigint]
                - LocalExchange[ROUND_ROBIN] () => [regionkey:bigint, name:varchar(25), count:bigint]
                    - Window[order by (regionkey ASC_NULLS_LAST)] => [regionkey:bigint, name:varchar(25), count:bigint]
                            count := count(name) RANGE UNBOUNDED_PRECEDING CURRENT_ROW (1:27)
                        - LocalExchange[SINGLE] () => [regionkey:bigint, name:varchar(25)]
                                Estimates: {rows: 5 (104B), cpu: 104.00, memory: 0.00, network: 104.00}
                            - RemoteStreamingExchange[GATHER] => [regionkey:bigint, name:varchar(25)]
                                    Estimates: {rows: 5 (104B), cpu: 104.00, memory: 0.00, network: 104.00}
                                - TableScan[TableHandle {connectorId='tpch', connectorHandle='region:sf1.0', layout='Optional[region:sf1.0]'}] => [regionkey:bigint, name:varchar(25)]
                                        Estimates: {rows: 5 (104B), cpu: 104.00, memory: 0.00, network: 0.00}
                                        name := tpch:name (2:6)
                                        regionkey := tpch:regionkey (2:6)
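To see why the ROUND_ROBIN exchange in these plans matters, here is a rough Python simulation, assuming five already-sorted `regionkey` values and two local streams (both numbers are illustrative). Real streams are consumed concurrently, so plain concatenation is only one possible interleaving; `round_robin` is a hypothetical helper, not an engine function.

```python
import heapq


def round_robin(rows, streams):
    """Deal rows out to `streams` buckets in turn, like a round-robin exchange."""
    buckets = [[] for _ in range(streams)]
    for i, row in enumerate(rows):
        buckets[i % streams].append(row)
    return buckets


# Output of the window operator: already sorted by regionkey.
sorted_rows = [0, 1, 2, 3, 4]

# After the exchange, each stream holds an interleaved subset of the rows.
buckets = round_robin(sorted_rows, 2)

# Reading the streams one after another loses the global order,
# which is what happened when the sort was removed.
concatenated = [row for bucket in buckets for row in bucket]

# Sorting each stream and then merging the sorted streams (the role of
# Sort followed by LocalMerge in the new plan) restores the order.
merged = list(heapq.merge(*(sorted(bucket) for bucket in buckets)))

print(buckets)       # [[0, 2, 4], [1, 3]]
print(concatenated)  # [0, 2, 4, 1, 3]
print(merged)        # [0, 1, 2, 3, 4]
```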