refactor(hash-aggr): Migrate ordered partial/final aggregation by 2010YOUY01 · Pull Request #23181 · apache/datafusion

2010YOUY01 · 2026-06-25T07:56:28Z

Which issue does this PR close?

Closes #.

Rationale for this change

Part of #22710

This PR implements the cases where the input is ordered by the group keys.

The comments in datafusion/physical-plan/src/aggregates/ordered_partial_stream.rs explain the high-level idea.

What changes are included in this PR?

Implement two aggregate tables for ordered partial/final aggregates. Each provides a simple abstraction for the control flow: it is logically a map from group keys to group states, and it handles the low-level details internally.
Implement two streams for ordered partial/final aggregates.
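
As a rough illustration of the "map from group keys to group states" view, here is a minimal sketch of a partial table and a final table for an `AVG`-style aggregate. All names (`AvgState`, `PartialTable`, `FinalTable`) are hypothetical and are not taken from this PR. The real tables operate on Arrow batches and accumulators, which this sketch replaces with plain `i64` keys and `f64` values.

```rust
use std::collections::BTreeMap;

/// Hypothetical intermediate state for AVG: a running sum and a row count.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct AvgState {
    pub sum: f64,
    pub count: u64,
}

/// Partial phase: logically a map from group key to intermediate state.
#[derive(Default)]
pub struct PartialTable {
    groups: BTreeMap<i64, AvgState>,
}

impl PartialTable {
    /// Fold one raw input row into the state of its group.
    pub fn update(&mut self, key: i64, value: f64) {
        let state = self.groups.entry(key).or_default();
        state.sum += value;
        state.count += 1;
    }

    /// Emit the intermediate states, ordered by group key.
    pub fn into_states(self) -> Vec<(i64, AvgState)> {
        self.groups.into_iter().collect()
    }
}

/// Final phase: merges intermediate states and produces final values.
#[derive(Default)]
pub struct FinalTable {
    groups: BTreeMap<i64, AvgState>,
}

impl FinalTable {
    /// Merge one partial state into the state of its group.
    pub fn merge(&mut self, key: i64, partial: AvgState) {
        let state = self.groups.entry(key).or_default();
        state.sum += partial.sum;
        state.count += partial.count;
    }

    /// Emit the final averages, ordered by group key.
    pub fn into_results(self) -> Vec<(i64, f64)> {
        self.groups
            .into_iter()
            .map(|(key, s)| (key, s.sum / s.count as f64))
            .collect()
    }
}
```

The stream code only calls `update`/`merge` and the emit methods. How the states are stored stays behind the table, which is the separation the description above points at.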

Are these changes tested?

Are there any user-facing changes?

2010YOUY01 · 2026-06-25T07:58:08Z

    assert!(collected_running.len() > 2);
-    // Running should produce more chunk than the usual AggregateExec.
-    // Otherwise it means that we cannot generate result in running mode.
-    assert!(collected_running.len() > collected_usual.len());


This assertion runs the same query on OrderedAggregateStream and AggregateStream and checks that the first one returns more batches.

That is implementation-dependent, and the test later compares the whole result row by row, so the assertion is safe to delete.
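
To illustrate why a row-by-row comparison makes the batch-count assertion redundant, here is a minimal sketch that compares two outputs independently of how they are chunked. The helpers `flatten_rows` and `same_rows` are hypothetical, and plain `Vec`s stand in for record batches. The actual test may compare results differently, for example after sorting.

```rust
/// Flatten a sequence of output batches into a single list of rows.
pub fn flatten_rows<T: Clone>(batches: &[Vec<T>]) -> Vec<T> {
    batches.iter().flat_map(|batch| batch.iter().cloned()).collect()
}

/// True when both outputs contain the same rows in the same order,
/// regardless of where the batch boundaries fall.
pub fn same_rows<T: Clone + PartialEq>(a: &[Vec<T>], b: &[Vec<T>]) -> bool {
    flatten_rows(a) == flatten_rows(b)
}
```

With this kind of check, an output split into three batches and the same output delivered as one batch compare equal, so the number of batches no longer affects the test outcome.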

2010YOUY01 · 2026-06-25T08:00:03Z

+/// `k = 100`, it is safe to emit all groups with keys less than 100 because the
+/// input is ordered.
+///
+/// ## Implementation Note


There are clearly many optimizations applicable to this path; this note explains why this PR keeps it simple.
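
The emission rule quoted in the doc comment above (once a row with `k = 100` has been seen, every group with a smaller key is complete) can be sketched as follows. This is a simplified illustration, not the PR's implementation: `OrderedSumTable` is a hypothetical name, keys are plain `i64`, and the only aggregate is `SUM`.

```rust
/// Hypothetical table for input that is sorted by the group key.
/// Each entry is (group key, running SUM).
pub struct OrderedSumTable {
    // Groups in arrival order; for sorted input this is also key order.
    groups: Vec<(i64, i64)>,
}

impl OrderedSumTable {
    pub fn new() -> Self {
        Self { groups: Vec::new() }
    }

    /// Fold one (key, value) row into the table.
    /// Sorted input means a row either extends the last group or starts a new one.
    pub fn update(&mut self, key: i64, value: i64) {
        if let Some((last_key, sum)) = self.groups.last_mut() {
            debug_assert!(*last_key <= key, "input must be sorted by group key");
            if *last_key == key {
                *sum += value;
                return;
            }
        }
        self.groups.push((key, value));
    }

    /// Emit every group whose key is less than the most recent key.
    /// The most recent group is kept because more rows for it may still arrive.
    pub fn emit_completed(&mut self) -> Vec<(i64, i64)> {
        let completed = self.groups.len().saturating_sub(1);
        self.groups.drain(..completed).collect()
    }

    /// End of input: every remaining group is complete.
    pub fn finish(&mut self) -> Vec<(i64, i64)> {
        std::mem::take(&mut self.groups)
    }
}
```

A stream built on such a table would call `update` for each input row, call `emit_completed` after each input batch, and call `finish` once the input is exhausted. This keeps memory bounded by the number of groups that are still open.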

2010YOUY01 · 2026-06-25T08:05:42Z

+    }
+
+    #[tokio::test]
+    async fn ordered_partial_aggregate_partially_sorted_no_emit_panic() -> Result<()> {


This test case is migrated from an existing unit test in row_hash.rs. A comment is left at the original test so that it is easier to check when the old implementation is eventually deleted.

refactor(hash-aggr): Migrate ordered partial/final aggregation

0173f80

github-actions bot added the core (Core DataFusion crate) and physical-plan (Changes to the physical-plan crate) labels on Jun 25, 2026


2010YOUY01 mentioned this pull request Jun 25, 2026

[EPIC] Split Aggregation Logic into Dedicated Streams #22710

Open

10 tasks

2010YOUY01 wants to merge 1 commit into apache:main from 2010YOUY01:split-aggr-ordered
