crl-release-26.1: pebble: show blob reference size in compaction events #6096

Merged
RaduBerinde merged 2 commits into cockroachdb:crl-release-26.1 from RaduBerinde:tool-logs-parse-blob-reference-size-in-compaction-26.1 on Jun 22, 2026

@RaduBerinde

pebble: show blob reference size in compaction events

Previously, the per-level size in compaction event log lines reflected
only the sum of the input tables' sizes, e.g.:

    L0 [31152773 31152772] (747KB) Score=1.31

This omitted the physical size of any blob references, so the reported
size understated the amount of data the tables actually represent when
value separation is in use.

This change sums the `EstimatedPhysicalSize` of each table's blob
references and, when non-zero, renders the level size as `(x + y)`,
where `x` is the table size and `y` is the blob reference size, e.g.:

    L0 [31152773 31152772] (747KB + 1.2MB) Score=1.31
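A minimal sketch of the rendering rule described above, assuming the table size and the summed blob reference size are already available as byte counts. `formatLevelSize` and `humanizeBytes` are hypothetical names; `humanizeBytes` only approximates the `747KB` / `1.2MB` style with 1024-based units and is not Pebble's own humanize package.

```go
package main

import (
	"fmt"
	"strings"
)

// humanizeBytes is a hypothetical formatter approximating the "747KB" /
// "1.2MB" style of the log lines, using 1024-based units.
func humanizeBytes(b uint64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%dB", b)
	}
	suffixes := []string{"KB", "MB", "GB", "TB", "PB", "EB"}
	v := float64(b) / unit
	i := 0
	for v >= unit && i < len(suffixes)-1 {
		v /= unit
		i++
	}
	// One decimal place, dropping a trailing ".0" (747.0 -> 747).
	s := strings.TrimSuffix(fmt.Sprintf("%.1f", v), ".0")
	return s + suffixes[i]
}

// formatLevelSize (hypothetical) renders "(x)" when the level has no blob
// references, or "(x + y)" where x is the table size and y is the
// estimated blob reference size.
func formatLevelSize(tableSize, blobRefSize uint64) string {
	if blobRefSize == 0 {
		return fmt.Sprintf("(%s)", humanizeBytes(tableSize))
	}
	return fmt.Sprintf("(%s + %s)", humanizeBytes(tableSize), humanizeBytes(blobRefSize))
}

func main() {
	fmt.Println(formatLevelSize(747<<10, 0))       // (747KB)
	fmt.Println(formatLevelSize(747<<10, 1258291)) // (747KB + 1.2MB)
}
```

Keeping the single-value form when the blob size is zero is what leaves levels without blob references unchanged.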

A new `TableInfo.EstimatedReferenceSize` method is added (mirroring the
existing `TableMetadata.EstimatedReferenceSize`) since the underlying
`blobReferences` field is package-private to `manifest`. Levels without
blob references are unchanged.
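The sketch below illustrates why an accessor method is needed: with an unexported `blobReferences` field, code outside the package can only obtain the total through a method. The struct layouts, the `BlobReference` type, and the `sumEstimatedPhysicalSize` helper are assumptions for illustration, not the actual `manifest` definitions.

```go
package main

import "fmt"

// BlobReference is a hypothetical, trimmed-down stand-in for a table's
// reference to a blob file; only the size field is kept.
type BlobReference struct {
	EstimatedPhysicalSize uint64
}

// TableMetadata and TableInfo (assumed layouts) both keep blob references
// in an unexported field, so other packages need a method to read the total.
type TableMetadata struct {
	Size           uint64
	blobReferences []BlobReference
}

type TableInfo struct {
	Size           uint64
	blobReferences []BlobReference
}

// sumEstimatedPhysicalSize (hypothetical helper) adds up the estimated
// physical size of every blob reference.
func sumEstimatedPhysicalSize(refs []BlobReference) uint64 {
	var total uint64
	for _, r := range refs {
		total += r.EstimatedPhysicalSize
	}
	return total
}

// EstimatedReferenceSize on TableMetadata: the existing method.
func (m *TableMetadata) EstimatedReferenceSize() uint64 {
	return sumEstimatedPhysicalSize(m.blobReferences)
}

// EstimatedReferenceSize on TableInfo mirrors the TableMetadata method.
func (i TableInfo) EstimatedReferenceSize() uint64 {
	return sumEstimatedPhysicalSize(i.blobReferences)
}

func main() {
	info := TableInfo{
		Size: 764928,
		blobReferences: []BlobReference{
			{EstimatedPhysicalSize: 1000000},
			{EstimatedPhysicalSize: 258291},
		},
	}
	fmt.Println(info.EstimatedReferenceSize()) // 1258291
}
```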

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

tool/logs: parse blob reference size in compaction log lines

Compaction event log lines now render a level's size as `(table + blob)`
(e.g. `(4.2MB + 1.0MB)`) when the input tables reference blob files. The
log parser previously assumed a single size value inside the parentheses:

- The `compactionPattern` regexp required the closing paren immediately
  after the first size, so the `bytes` capture (used for the output size
  of a `compacted` line) matched nothing for the new format.
- `sumInputBytes` passed the entire parenthesized contents to
  `unHumanize`, which only understands a single size token.
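The regexp failure in the first bullet can be reproduced with a simplified pattern. `oldPattern` and `newPattern` below are hypothetical stand-ins for only the size portion of `compactionPattern` (the real pattern matches much more of the line); they show a single-size capture failing on the `(x + y)` form and a `+`-repeating capture succeeding.

```go
package main

import (
	"fmt"
	"regexp"
)

// sizeToken matches one humanized size such as "747KB", "1.2MB" or "512B".
const sizeToken = `[0-9.]+[KMGTPE]?B`

// oldPattern (hypothetical): exactly one size followed immediately by ')'.
var oldPattern = regexp.MustCompile(`\((?P<bytes>` + sizeToken + `)\)`)

// newPattern (hypothetical): one or more " + "-separated sizes before ')'.
var newPattern = regexp.MustCompile(`\((?P<bytes>` + sizeToken + `(?: \+ ` + sizeToken + `)*)\)`)

// bytesCapture returns the text captured by the named "bytes" group.
func bytesCapture(re *regexp.Regexp, line string) (string, bool) {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[re.SubexpIndex("bytes")], true
}

func main() {
	line := "L0 [31152773 31152772] (747KB + 1.2MB) Score=1.31"
	_, ok := bytesCapture(oldPattern, line)
	fmt.Println(ok) // false: ')' does not follow the first size
	got, ok := bytesCapture(newPattern, line)
	fmt.Println(got, ok) // 747KB + 1.2MB true
}
```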

This change teaches the parser about the optional `+ <size>` suffix. The
regexp now accepts one or more `+`-separated sizes within the
parentheses, and a new `unHumanizeSum` helper splits on `+` and sums the
parts. Both the input-byte and output-byte parsing paths use it, so the
reported sizes include the estimated physical size of blob references.
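A minimal sketch of the split-and-sum idea behind `unHumanizeSum`. The `unHumanize` here is a hypothetical single-token parser assuming 1024-based `B`/`KB`/`MB`/... suffixes; the tool's actual parser may differ in accepted units and rounding.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unHumanize (hypothetical) parses one size token such as "747KB" or
// "1.2MB", assuming 1024-based units.
func unHumanize(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	// Longer suffixes come first so "KB" is tried before plain "B".
	units := []struct {
		suffix string
		mult   float64
	}{
		{"EB", 1 << 60}, {"PB", 1 << 50}, {"TB", 1 << 40},
		{"GB", 1 << 30}, {"MB", 1 << 20}, {"KB", 1 << 10}, {"B", 1},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			v, err := strconv.ParseFloat(strings.TrimSuffix(s, u.suffix), 64)
			if err != nil {
				return 0, err
			}
			return uint64(v * u.mult), nil
		}
	}
	return 0, fmt.Errorf("unrecognized size %q", s)
}

// unHumanizeSum splits on "+" and sums the parsed parts, so both "747KB"
// and "747KB + 1.2MB" are accepted.
func unHumanizeSum(s string) (uint64, error) {
	var total uint64
	for _, part := range strings.Split(s, "+") {
		v, err := unHumanize(part)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, nil
}

func main() {
	for _, s := range []string{"747KB", "747KB + 1.2MB"} {
		n, err := unHumanizeSum(s)
		fmt.Println(s, "=>", n, err)
	}
}
```

Because a single-size string is just the one-part case, the same helper can serve both the input-byte and output-byte paths without special-casing the old format.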

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

RaduBerinde and others added 2 commits June 16, 2026 10:47
@RaduBerinde RaduBerinde requested a review from sumeerbhola June 16, 2026 17:54
@RaduBerinde RaduBerinde requested a review from a team as a code owner June 16, 2026 17:54

@sumeerbhola left a comment

:lgtm:

@sumeerbhola made 1 comment.
Reviewable status: 0 of 5 files reviewed, all discussions resolved.

@RaduBerinde RaduBerinde merged commit 6526490 into cockroachdb:crl-release-26.1 Jun 22, 2026
7 checks passed
@RaduBerinde RaduBerinde deleted the tool-logs-parse-blob-reference-size-in-compaction-26.1 branch June 22, 2026 15:21