refactor: try reduce aggregate hash index cost on hot path #19072
base: main
Conversation
Docker Image for PR
Sequential probing has a greater likelihood of being optimized into SIMD instructions. Or maybe the compiler isn't that smart yet?
I have seen the improvement on Q18 in TPC-H 1000 (120.03s -> 111.57s). I will capture a flame graph with perf to confirm that later. Here is my guess: with pure linear probing (+1 each time), occupied slots tend to cluster. Once you hit such a cluster, the probe has to walk over a long run of consecutive occupied entries. If, instead, the next probe position is derived from the hash (i.e. more "random"), you break up these clusters. That may well hurt potential SIMD or prefetch optimisations, but it also shortens long probe chains on average. (BTW, this idea is inspired by an optimisation PR in DuckDB. A related approach from SwissTable is to keep linear probing but increase the step size when you encounter several consecutive occupied slots. I haven't tested that variant here yet.)
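To make the cluster argument concrete, here is a minimal, hypothetical Rust sketch (not Databend's actual code; all names are illustrative) contrasting pure linear probing with a probe step derived from other bits of the hash. Two keys that collide on the initial slot follow the same chain forever under linear probing, but diverge as soon as the step depends on the hash:

```rust
const CAPACITY: usize = 16; // power of two so we can mask instead of mod
const MASK: usize = CAPACITY - 1;

/// Pure linear probing: step is always 1, so colliding keys share
/// the whole probe chain and occupied slots cluster.
fn probe_linear(hash: u64, attempt: usize) -> usize {
    (hash as usize + attempt) & MASK
}

/// Hash-derived step ("double hashing" flavour): the step comes from the
/// high bits of the hash, so keys that collide on the first slot usually
/// diverge on later probes, shortening average probe chains.
fn probe_hash_step(hash: u64, attempt: usize) -> usize {
    // Force the step to be odd so it is coprime with the power-of-two
    // capacity and the probe sequence visits every slot.
    let step = (((hash >> 32) as usize) | 1) & MASK;
    (hash as usize + attempt * step) & MASK
}

fn main() {
    // Two hashes with equal low bits (same initial slot) but different high bits.
    let h1: u64 = 0x0000_0005_0000_0003;
    let h2: u64 = 0x0000_0009_0000_0003;
    // They collide on the first probe...
    assert_eq!(probe_linear(h1, 0), probe_linear(h2, 0));
    // ...and stay glued together under linear probing,
    assert_eq!(probe_linear(h1, 1), probe_linear(h2, 1));
    // ...but diverge once the step is derived from the hash.
    assert_ne!(probe_hash_step(h1, 1), probe_hash_step(h2, 1));
    println!("probe sequences diverge with a hash-derived step");
}
```

The trade-off matches the comment above: the `+1` pattern is friendlier to prefetching and vectorisation, while the hash-derived step trades that locality for shorter expected probe chains.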
Later, we can replace the current Entry with the Group of SwissTable |
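For context on the SwissTable Group idea mentioned above, here is a hedged sketch of the general design (following SwissTable/hashbrown, not Databend code; names are illustrative): keep one control byte per slot and test a whole group of 8 bytes at once, here with a portable SWAR word trick where real implementations would use SSE2/NEON:

```rust
const EMPTY: u8 = 0x80; // high bit set marks a non-full slot, as in SwissTable

/// Return a mask with the high bit of byte i set for every byte of `group`
/// equal to `tag` (classic SWAR byte-match; can yield a rare false positive
/// in the byte after a match, which is harmless because full keys are
/// compared afterwards anyway).
fn match_tag(group: u64, tag: u8) -> u64 {
    let pattern = u64::from(tag) * 0x0101_0101_0101_0101;
    let x = group ^ pattern; // matching bytes become zero
    x.wrapping_sub(0x0101_0101_0101_0101) & !x & 0x8080_8080_8080_8080
}

fn main() {
    // A group of 8 control bytes: slots 2 and 5 hold the tag 0x3A, rest empty.
    let mut ctrl = [EMPTY; 8];
    ctrl[2] = 0x3A;
    ctrl[5] = 0x3A;
    let group = u64::from_le_bytes(ctrl);

    // One word-wide operation filters 8 slots down to the candidates.
    let mask = match_tag(group, 0x3A);
    let slots: Vec<usize> = (0..8)
        .filter(|i| mask & (0x80u64 << (i * 8)) != 0)
        .collect();
    assert_eq!(slots, vec![2, 5]);
    println!("candidate slots: {:?}", slots);
}
```

This is what makes the Group layout attractive as a follow-up: probing advances 8 (or 16, with SIMD) slots per step instead of one Entry at a time.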
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
refactor: try reduce aggregate hash index cost on hot path
When profiling tpch-1000 q18 with perf, I found `find_or_insert` on the hot path, which cannot be seen on smaller-scale datasets. So this PR tries to improve it; if the performance is better, I will enrich this summary.

Tests