Poc: expose group hash feedback from GroupValues #23229
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported that the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
run benchmarks clickbench_partitioned
🤖 Benchmark running (GKE). Comparing fuse-aggr-repart-poc-2 (d82f05b) to 32d3d3a (merge-base) using clickbench_partitioned.
🤖 Benchmark completed (GKE).
Huh, what's HEAD doing?
run benchmarks clickbench_partitioned
🤖 Benchmark running (GKE). Comparing fuse-aggr-repart-poc-2 (bc70b9d) to 32d3d3a (merge-base) using clickbench_partitioned.
Can see
🤖 Benchmark completed (GKE).
Which issue does this PR close?
Part of hash aggregate repartition POC.
Rationale for this change
This POC prepares `GroupValues` for partition-aware partial aggregate output. The partial aggregate output path needs to know which input rows created new groups, and the hash for those rows, so it can later route newly created group ids to target partitions without relying on a separate repartition + coalesce pipeline.
What changes are included in this PR?
- `GroupValues::intern` to fill per-row hashes and optionally return the input rows that created new groups.
- `GroupValues` implementations.
- `intern` signature.
Are these changes tested?
Yes:
- `cargo fmt --all`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test -p datafusion-physical-plan group_values --lib`
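To make the rationale above concrete, here is a minimal sketch of what "hash feedback" from an `intern`-style call could look like. It is not the signature used in this PR or in DataFusion: `ToyGroupValues`, `hash_key`, and `route_new_groups` are hypothetical names, the key is a single `i64` column, the hasher is the standard library's `DefaultHasher`, and partitions are chosen by a plain modulo. All of these are assumptions made only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a `GroupValues` implementation,
/// keyed on a single `i64` column (an assumption for this sketch).
#[derive(Default)]
pub struct ToyGroupValues {
    /// Maps a group key to its group id (an index into `keys`).
    map: HashMap<i64, usize>,
    /// Group keys stored in group-id order.
    keys: Vec<i64>,
}

/// Hashes one key with the standard library's default hasher.
fn hash_key(key: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

impl ToyGroupValues {
    /// Assigns a group id to every input row.
    ///
    /// `groups` and `hashes` are cleared and refilled with one entry per row.
    /// If `new_group_rows` is `Some`, it is cleared and then receives the
    /// index of each input row that created a new group.
    pub fn intern(
        &mut self,
        col: &[i64],
        groups: &mut Vec<usize>,
        hashes: &mut Vec<u64>,
        mut new_group_rows: Option<&mut Vec<usize>>,
    ) {
        groups.clear();
        hashes.clear();
        if let Some(rows) = new_group_rows.as_deref_mut() {
            rows.clear();
        }
        for (row, &key) in col.iter().enumerate() {
            hashes.push(hash_key(key));
            let next_id = self.keys.len();
            let group_id = *self.map.entry(key).or_insert(next_id);
            // An existing key always has an id below `next_id`, so equality
            // means this row just created the group.
            if group_id == next_id {
                self.keys.push(key);
                if let Some(rows) = new_group_rows.as_deref_mut() {
                    rows.push(row);
                }
            }
            groups.push(group_id);
        }
    }
}

/// Buckets newly created group ids by target partition, using only the
/// feedback returned by `intern` (no second hashing pass over the keys).
pub fn route_new_groups(
    groups: &[usize],
    hashes: &[u64],
    new_group_rows: &[usize],
    num_partitions: usize,
) -> Vec<Vec<usize>> {
    let mut partitions = vec![Vec::new(); num_partitions];
    for &row in new_group_rows {
        let target = (hashes[row] % num_partitions as u64) as usize;
        partitions[target].push(groups[row]);
    }
    partitions
}
```

With this shape, a caller that does not need the feedback passes `None` and pays only for the hash buffer, while a partition-aware caller passes `Some(&mut rows)` and feeds the result straight into something like `route_new_groups`.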