-
Notifications
You must be signed in to change notification settings - Fork 837
fix: http handler cut block to pages using memory size after gc. #19071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
databend/src/query/expression/src/values.rs
Line 1788 in 53e0147
| Column::Nullable(c) => c.column.memory_size() + c.validity.as_slice().0.len(), |
memory_size_with_options is meant to size string data after GC, and the HTTP page builder now calls it with gc=true, but the Column::Nullable arm still delegates to c.column.memory_size() and discards the gc flag. For nullable string columns sliced from a larger buffer, this keeps using the original buffer length, so the HTTP paginator will continue to underestimate how many rows fit in a page (the shrinking-page issue this change intended to fix).
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| fn column_memory_size(col: &Self::Column) -> usize { | ||
| fn column_memory_size(col: &Self::Column, gc: bool) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking of adding a new function?
https://docs.rs/arrow/latest/arrow/array/struct.ArrayData.html#method.get_buffer_memory_size
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
English Summary (Technical & Concise, Fit for PR Description)
Before this PR:
For string columns containing long strings, the memory size of a block slice incorrectly reflected the total size of the original buffer (rather than the actual data size of the slice). As a result, when splitting pages from the block with a maximum size limit, each subsequent page would progressively shrink in effective usable size
This PR:
Adds a new function
memory_size_with_optionsto calculate the memory of block slices after garbage collection (GC), ensuring the memory size accurately represents only the data retained in the slice (instead of the full original buffer).Tests
Type of change
This change is