Member

@youngsofun youngsofun commented Dec 5, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Before this PR:

For string columns containing long strings, the reported memory size of a block slice was the total size of the original buffer rather than the size of the data the slice actually references. As a result, when a block was split into pages under a maximum size limit, each successive page ended up progressively smaller than the limit allowed.
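The shrinking effect can be illustrated with a small simulation. This is only a sketch under an assumed pagination rule (average row size is estimated from the reported memory size of the remaining rows); `page_row_counts` and its parameters are hypothetical and are not taken from the Databend code.

```rust
/// Returns the number of rows placed in each page.
///
/// Assumption: the paginator estimates an average row size from the
/// reported memory size of the remaining rows, then takes as many rows
/// as fit under `max_page_bytes` (always at least one).
fn page_row_counts(
    total_rows: usize,
    row_len: usize,
    max_page_bytes: usize,
    gc: bool,
) -> Vec<usize> {
    // All rows share one backing buffer of this size.
    let buffer_bytes = total_rows * row_len;
    let mut remaining = total_rows;
    let mut pages = Vec::new();

    while remaining > 0 {
        // Without GC the remaining slice reports the whole buffer;
        // with GC it reports only the bytes of its own rows.
        let reported = if gc { remaining * row_len } else { buffer_bytes };
        let avg_row_bytes = (reported / remaining).max(1);
        let rows = (max_page_bytes / avg_row_bytes).clamp(1, remaining);
        pages.push(rows);
        remaining -= rows;
    }
    pages
}
```

With 100 rows of 1000 bytes and a 20,000-byte page limit, the non-GC estimate yields pages of 20, 16, 12, ... rows, while the GC-aware estimate yields five pages of 20 rows each.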

This PR:

Adds a new function, memory_size_with_options, that calculates the memory size of a block slice as it would be after garbage collection (GC), so the size reflects only the data retained in the slice rather than the full original buffer.
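A minimal sketch of the idea, using a toy view-based string column. The `StringView` and `StringColumn` types here are hypothetical simplifications, not Databend's actual types; only the function name and the `gc` flag come from this PR.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical view: where one row's bytes live in the shared buffer.
#[derive(Clone, Copy)]
struct StringView {
    offset: usize,
    len: usize,
}

/// Hypothetical string column: slices share the buffer and copy only views.
struct StringColumn {
    buffer: Arc<[u8]>,
    views: Vec<StringView>,
}

impl StringColumn {
    fn from_rows(rows: &[&str]) -> Self {
        let mut bytes = Vec::new();
        let mut views = Vec::with_capacity(rows.len());
        for row in rows {
            views.push(StringView { offset: bytes.len(), len: row.len() });
            bytes.extend_from_slice(row.as_bytes());
        }
        Self { buffer: Arc::from(bytes), views }
    }

    /// Slicing keeps the whole buffer alive; only the views are narrowed.
    fn slice(&self, start: usize, len: usize) -> Self {
        Self {
            buffer: Arc::clone(&self.buffer),
            views: self.views[start..start + len].to_vec(),
        }
    }

    fn row(&self, i: usize) -> &[u8] {
        let v = self.views[i];
        &self.buffer[v.offset..v.offset + v.len]
    }

    /// With `gc = true`, count only the bytes the views reference,
    /// i.e. what would remain after compacting the buffer.
    fn memory_size_with_options(&self, gc: bool) -> usize {
        let view_bytes = self.views.len() * size_of::<StringView>();
        let data_bytes = if gc {
            self.views.iter().map(|v| v.len).sum::<usize>()
        } else {
            self.buffer.len()
        };
        view_bytes + data_bytes
    }
}
```

For a two-row slice of a column holding four 100-byte strings, the two modes differ by exactly the 200 bytes of buffer that the slice does not reference.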

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Dec 5, 2025
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Column::Nullable(c) => c.column.memory_size() + c.validity.as_slice().0.len(),

P1: Propagate gc flag through nullable columns

memory_size_with_options is meant to size string data after GC, and the HTTP page builder now calls it with gc=true, but the Column::Nullable arm still delegates to c.column.memory_size() and discards the gc flag. For nullable string columns sliced from a larger buffer, this keeps using the original buffer length, so the HTTP paginator will continue to underestimate how many rows fit in a page (the shrinking-page issue this change intended to fix).
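The fix suggested in this review can be sketched on a reduced column enum. The `Column` type below is a hypothetical two-variant stand-in (the real enum has many more variants), and `memory_size_dropping_gc` exists only to contrast the flagged behavior with the propagated flag.

```rust
/// Hypothetical, reduced column enum for illustration only.
enum Column {
    /// `buffer_len`: bytes of the shared buffer; `data_len`: bytes actually referenced.
    String { buffer_len: usize, data_len: usize },
    Nullable { column: Box<Column>, validity: Vec<u8> },
}

impl Column {
    fn memory_size(&self) -> usize {
        self.memory_size_with_options(false)
    }

    /// Propagates `gc` into the inner column of a nullable wrapper.
    fn memory_size_with_options(&self, gc: bool) -> usize {
        match self {
            Column::String { buffer_len, data_len } => {
                if gc { *data_len } else { *buffer_len }
            }
            Column::Nullable { column, validity } => {
                column.memory_size_with_options(gc) + validity.len()
            }
        }
    }

    /// The shape flagged in the review: the nullable arm ignores `gc`.
    fn memory_size_dropping_gc(&self, gc: bool) -> usize {
        match self {
            Column::Nullable { column, validity } => column.memory_size() + validity.len(),
            other => other.memory_size_with_options(gc),
        }
    }
}
```

For a nullable string column that references 100 bytes of a 1000-byte buffer with a 2-byte validity bitmap, the propagating version reports 102 bytes with `gc = true`, while the version that drops the flag still reports 1002.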

@youngsofun youngsofun marked this pull request as draft December 5, 2025 08:36
@youngsofun youngsofun marked this pull request as ready for review December 6, 2025 00:22
}

- fn column_memory_size(col: &Self::Column) -> usize {
+ fn column_memory_size(col: &Self::Column, gc: bool) -> usize {
Labels

pr-bugfix this PR patches a bug in codebase

2 participants