Skip to content

perf: window operator performance degradation #17436

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sundy-li opened this issue Feb 11, 2025 · 2 comments
Closed

perf: window operator performance degradation #17436

sundy-li opened this issue Feb 11, 2025 · 2 comments
Assignees
Labels
C-performance Category: Performance

Comments

@sundy-li
Copy link
Member

Summary

Image

------- query internal/09.sql -------
select max(number) over (partition by number % 3 order by number), rank() over (partition by number % 3 order by number) from numbers(20000000) where number % 100 != 0 ignore_result;
@sundy-li sundy-li added the C-performance Category: Performance label Feb 11, 2025
@forsaken628
Copy link
Collaborator

forsaken628 commented Feb 20, 2025

I compared the v636 and v700 implementations, and the main difference is that #16448 introduces ShufflePartition in WindowPartition.
The v636 implementation is very sensitive to whether the original data is sorted or not. For sorted data, the entire sorting process can be skipped. But on the other hand, if the original data is not sorted, it will consume more cpu time on sorting.

See the following benchmark

create warehouse 'v636' warehouse_size='xsmall' with version='v1.2.636-nightly';
create warehouse 'v700' warehouse_size='xsmall' with version='v1.2.700-nightly';

create table rand (number UINT64 not null) ENGINE=RANDOM;
create table t1 (number UINT64 not null);
insert into t1 select number % 20000000 from rand limit 20000000;

explain analyze select max(number) over (partition by number % 3 order by number), rank() over (partition by number % 3 order by number) from t1 where number % 100 != 0 ignore_result;
-- 636 total 5s   WindowPartition cpu time: 9.995544112s
-- 700 total 6.1s WindowPartition cpu time: 5.09495319s

explain analyze select max(number) over (partition by number % 3 order by number), rank() over (partition by number % 3 order by number) from numbers(20000000) where number % 100 != 0 ignore_result;
-- 636 total 1.4s WindowPartition cpu time: 987.903091ms
-- 700 total 3.9s WindowPartition cpu time: 2.752338919s

I don't think the internal/09.sql use case is representative of overall performance.

cc @Dousir9

@sundy-li
Copy link
Member Author

explain analyze select max(number) over (partition by number % 3 order by number), rank() over (partition by number % 3 order by number) from t1 where number % 100 != 0 ignore_result;
-- 636 total 5s WindowPartition cpu time: 9.995544112s
-- 700 total 6.1s WindowPartition cpu time: 5.09495319s

Ok, seems internal/09.sql use case is not suitable for benchmark validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Category: Performance
Projects
None yet
Development

No branches or pull requests

2 participants