@alexowens90
Collaborator

Reference Issues/PRs

18254756429

What does this implement or fix?

Poor-quality hash implementations for integral types, including at least some implementations of std::hash, are essentially a static cast, e.g. std::hash<int64_t>{}(100) == 100. This is fast, but leads to poor distributions in our bucketing, where we take the hash modulo the number of buckets. In particular, a grouping hash on a timeseries whose time points are dates puts all of the rows into bucket zero, which leaves no parallelism in the aggregation clause.

Switch to a single hash function with improved uniformity that behaves consistently across all supported platforms.
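
To illustrate the effect described above, here is a minimal sketch rather than the code in this PR. It assumes timestamps are stored as nanoseconds since the epoch, so that every date is a multiple of 86,400,000,000,000, which is divisible by 2^16. The helper names (`identity_hash`, `mixed_hash`, `bucket_counts`, `daily_timestamps`) are hypothetical, and the splitmix64 finalizer is used only as a stand-in for a better-mixing hash; the function actually adopted by this change may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mimics a std::hash<int64_t> that is effectively a static_cast.
inline std::uint64_t identity_hash(std::int64_t value) {
    return static_cast<std::uint64_t>(value);
}

// splitmix64 finalizer, used here purely as an illustrative mixer.
// Assumption: this is NOT necessarily the hash chosen in the PR.
inline std::uint64_t mixed_hash(std::int64_t value) {
    std::uint64_t z = static_cast<std::uint64_t>(value) + 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Counts how many keys land in each bucket when bucketing by hash % num_buckets.
template <typename Hash>
std::vector<std::size_t> bucket_counts(const std::vector<std::int64_t>& keys,
                                       std::size_t num_buckets,
                                       Hash hash) {
    assert(num_buckets > 0);
    std::vector<std::size_t> counts(num_buckets, 0);
    for (std::int64_t key : keys) {
        ++counts[hash(key) % num_buckets];
    }
    return counts;
}

// Generates one timestamp per day, assuming nanoseconds since the epoch.
inline std::vector<std::int64_t> daily_timestamps(std::size_t days) {
    const std::int64_t ns_per_day = 86400000000000LL;
    std::vector<std::int64_t> keys;
    keys.reserve(days);
    for (std::size_t day = 0; day < days; ++day) {
        keys.push_back(static_cast<std::int64_t>(day) * ns_per_day);
    }
    return keys;
}
```

With 16 buckets, `bucket_counts(daily_timestamps(1000), 16, identity_hash)` places all 1000 keys in bucket zero, because each key is a multiple of 16. Passing `mixed_hash` instead spreads the same keys over multiple buckets, which is what allows the aggregation to run in parallel.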

@alexowens90 self-assigned this Oct 23, 2025
@alexowens90 added the patch (Small change, should increase patch version) and performance labels Oct 23, 2025