Performance 18254756429: Improve hash grouping aggregation parallelism #2729

alexowens90 · 2025-10-23T15:35:30Z

Reference Issues/PRs

What does this implement or fix?

Poor-quality hash implementations for integral types, including at least some implementations of std::hash, are essentially a static cast, e.g. std::hash<int64_t>{}(100) == 100. This is fast, but leads to poor distributions in our bucketing, where we mod the hash with the number of buckets. In particular, performing a grouping hash on a timeseries where the time points are dates results in all of the rows being partitioned into bucket zero, which in turn means there is no parallelism in the aggregation clause.

Switch to a hash function with improved uniformity that behaves consistently across all supported platforms.
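
To illustrate the failure mode described above, here is a minimal sketch, not the code in this PR. It assumes timestamps are nanoseconds since the epoch and a bucket count of 16; neither is stated in the description. Under those assumptions, every midnight timestamp is a multiple of 86,400,000,000,000 = 2^16 × 1,318,359,375, so an identity hash taken modulo any power of two up to 2^16 is always zero. The names `identity_hash`, `mix_hash` and `distinct_buckets` are hypothetical, and the mixer shown is the well-known splitmix64 finalizer, used here only as a stand-in for whichever hash function the change actually adopts.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Mimics a std::hash<int64_t> implementation that is just a static cast.
inline std::uint64_t identity_hash(std::int64_t v) {
    return static_cast<std::uint64_t>(v);
}

// splitmix64 finalizer: an illustrative bit mixer (an assumption, not
// necessarily the function used in this change).
inline std::uint64_t mix_hash(std::int64_t v) {
    std::uint64_t x = static_cast<std::uint64_t>(v);
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Counts how many distinct buckets are hit when hashing `num_days`
// consecutive midnight timestamps (assumed nanoseconds since epoch).
template <typename Hash>
std::size_t distinct_buckets(Hash hash, std::size_t num_buckets, int num_days) {
    const std::int64_t ns_per_day = 86400000000000LL;
    std::set<std::uint64_t> used;
    for (int day = 0; day < num_days; ++day) {
        used.insert(hash(day * ns_per_day) % num_buckets);
    }
    return used.size();
}
```

With these assumptions, `distinct_buckets(identity_hash, 16, 365)` reports a single bucket, so all rows would land on one worker, whereas `distinct_buckets(mix_hash, 16, 365)` spreads the same keys over multiple buckets.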

Performance 18254756429: Improve hash grouping aggregation parallelism

e018f74

alexowens90 self-assigned this Oct 23, 2025

alexowens90 requested review from IvoDD and poodlewars as code owners October 23, 2025 15:35

alexowens90 added the patch (Small change, should increase patch version) and performance labels Oct 23, 2025

IvoDD approved these changes Oct 24, 2025
