
fix(partition): reject invalid transform parameters #2474

Open
fallintoplace wants to merge 2 commits into apache:main from fallintoplace:fix/strict-transform-parser


@fallintoplace (Contributor)

Summary

  • require exact bucket[N] and truncate[W] syntax when parsing transforms
  • reject zero values such as bucket[0] and truncate[0]
  • add parser coverage for valid, zero, and malformed parameterized transforms
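The two parsing rules above (exact bracket syntax, non-zero value) can be sketched as follows. This is a hypothetical, self-contained illustration, not the code in this PR. The Transform enum is a simplified stand-in with only three variants, and parse_param is an invented helper name.

```rust
use std::str::FromStr;

// Hypothetical, simplified stand-in for a partition transform type;
// it only models the variants needed for this illustration.
#[derive(Debug, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
    Truncate(u32),
}

// Hypothetical helper: parses the "[N]" suffix that follows a transform name.
// Exactly one '[' and one ']' are allowed, the contents must be ASCII digits,
// and the value must fit in a u32 and be non-zero.
fn parse_param(rest: &str) -> Result<u32, String> {
    let digits = rest
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| format!("expected [N], got {rest:?}"))?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("parameter must be ASCII digits, got {digits:?}"));
    }
    // Signs were rejected above, so only overflow can fail here.
    let n: u32 = digits
        .parse()
        .map_err(|e| format!("invalid parameter {digits:?}: {e}"))?;
    if n == 0 {
        return Err("parameter must be non-zero".to_string());
    }
    Ok(n)
}

impl FromStr for Transform {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s == "identity" {
            return Ok(Transform::Identity);
        }
        if let Some(rest) = s.strip_prefix("bucket") {
            return parse_param(rest).map(Transform::Bucket);
        }
        if let Some(rest) = s.strip_prefix("truncate") {
            return parse_param(rest).map(Transform::Truncate);
        }
        Err(format!("unknown transform {s:?}"))
    }
}

fn main() {
    for input in ["bucket[16]", "truncate[4]", "bucket10", "bucket[0]", "bucket[[10]]"] {
        println!("{input:?} -> {:?}", input.parse::<Transform>());
    }
}
```

With this shape, bucket[16] parses to Bucket(16), while bucket10, bucket[0], and bucket[[10]] all return Err.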

Why

The previous parser stripped the transform prefix and trimmed brackets loosely, so malformed strings like bucket10 and truncate10 were accepted. It also accepted zero parameters, which can cause a panic later when the transform is evaluated.
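Below is a hypothetical reconstruction of the loose behavior described above. It assumes the old code removed the prefix and then trimmed brackets with trim_start_matches / trim_end_matches before a bare parse::<u32>(). The exact pre-fix code is not reproduced here, and loose_bucket_param is an invented name.

```rust
// Hypothetical reconstruction of a loose parser: remove the prefix, trim any
// number of brackets from each end, then call parse::<u32>() directly.
fn loose_bucket_param(s: &str) -> Option<u32> {
    let rest = s.strip_prefix("bucket")?;
    rest.trim_start_matches('[')
        .trim_end_matches(']')
        .parse::<u32>()
        .ok()
}

fn main() {
    for input in ["bucket[16]", "bucket10", "bucket[0]"] {
        println!("{input:?} -> {:?}", loose_bucket_param(input));
    }
}
```

The loose parser returns a valid parameter for both the bracketless form and the zero value.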

Fixes #2473.

Tests

  • cargo fmt --check
  • cargo test -p iceberg parameterized_transform
  • cargo test -p iceberg spec::transform
  • cargo test -p iceberg bucket
  • cargo test -p iceberg truncate

@fallintoplace force-pushed the fix/strict-transform-parser branch from 758034a to 2899378 on May 23, 2026 17:33

@viirya left a comment (Member)

Good defensive fix. Confirmed the panic risk this prevents — bucket evaluation does % (mod_n as i32) and truncate does rem_euclid(width), both of which panic on zero, and the old loose parser accepted garbage like bucket10 outright. The exact-syntax + non-zero validation closes both. The malformed-input test coverage is thorough (I traced bucket[10]extra and bucket[[10]], both correctly rejected).
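The sketch below shows why a zero parameter is dangerous. It uses Rust's checked integer methods as a stand-in for the two operations the review names. The helpers bucket_of and truncate_i32 are hypothetical. bucket_of follows the Iceberg spec's non-negative-hash-modulo-N definition, and truncate_i32 follows integer truncation as v - v.rem_euclid(w). Where % and rem_euclid panic on a zero divisor, checked_rem and checked_rem_euclid return None instead.

```rust
// Hypothetical helper mirroring the bucket step: the non-negative hash
// modulo N. `hash % n` would panic if n == 0; checked_rem returns None.
fn bucket_of(hash: i32, n: u32) -> Option<i32> {
    let n = i32::try_from(n).ok()?;
    (hash & i32::MAX).checked_rem(n)
}

// Hypothetical helper mirroring integer truncation: v - v.rem_euclid(w).
// rem_euclid panics if w == 0; checked_rem_euclid returns None instead.
// checked_sub also returns None rather than overflowing near i32::MIN.
fn truncate_i32(v: i32, width: i32) -> Option<i32> {
    v.checked_rem_euclid(width).and_then(|r| v.checked_sub(r))
}

fn main() {
    println!("{:?} {:?}", bucket_of(34, 16), bucket_of(34, 0));
    println!("{:?} {:?}", truncate_i32(-1, 10), truncate_i32(-1, 0));
}
```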

Tiny nit, no change needed: since is_ascii_digit() already rejects +/-, the .with_source(err) branch on parse::<u32> is now only reachable on overflow (e.g. bucket[99999999999]) — harmless, and worth keeping for exactly that case.
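Here is a quick check of the overflow case from the nit. The value passes a digits-only check but is larger than u32::MAX (4294967295). parse::<u32>() therefore still fails, and it reports IntErrorKind::PosOverflow.

```rust
use std::num::IntErrorKind;

fn main() {
    let digits = "99999999999";
    // Every byte is an ASCII digit, so a digits-only check lets it through...
    assert!(digits.bytes().all(|b| b.is_ascii_digit()));
    // ...but the value does not fit in a u32, so parse still fails.
    let err = digits.parse::<u32>().unwrap_err();
    assert_eq!(err.kind(), &IntErrorKind::PosOverflow);
    println!("{err} ({:?})", err.kind());
}
```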

github-actions Bot commented Jul 1, 2026 (Contributor)

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions Bot added the stale label Jul 1, 2026
@fallintoplace force-pushed the fix/strict-transform-parser branch from 2899378 to 4dd63af on July 1, 2026 10:21
@fallintoplace (Contributor, Author)

@viirya Is this PR still a priority? I'm available to make any requested changes.

@viirya commented Jul 1, 2026 (Member)

cc @kevinjqliu

@viirya left a comment (Member)

Re-reviewed after the parser rewrite. This is a solid improvement over the version I approved earlier, and it's worth calling out what it now catches that the old parser didn't.

The old inline parsing used trim_start_matches('[') / trim_end_matches(']') and a bare .parse(). I verified two holes that this closes:

  • trim_*_matches strips repeated delimiters, so bucket[[10]] was reduced to 10 and slipped through. parse_parameterized_transform now uses strip_prefix('[') + strip_suffix(']') (exactly one bracket each) plus an is_ascii_digit check, so [[10]], bucket10], bucket[10, etc. are all rejected.
  • u32::parse accepts a leading + — I confirmed "+8".parse::<u32>() == Ok(8) — so bucket[+1] was silently accepted before. The digit-only check now rejects it (this is the "reject leading plus" commit).
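Both holes can be reproduced with the standard library alone. loose_inner and strict_inner are hypothetical names for the two bracket-handling strategies described above. digits_only is a hypothetical stand-in for the is_ascii_digit check. None of them is the PR's actual code.

```rust
// Loose strategy: trim any number of brackets from each end.
fn loose_inner(rest: &str) -> &str {
    rest.trim_start_matches('[').trim_end_matches(']')
}

// Strict strategy: remove exactly one bracket from each end, or fail.
fn strict_inner(rest: &str) -> Option<&str> {
    rest.strip_prefix('[').and_then(|s| s.strip_suffix(']'))
}

// Hypothetical digits-only pre-check run before parse::<u32>().
fn digits_only(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

fn main() {
    // Hole 1: "[[10]]" is what remains of bucket[[10]] after removing "bucket".
    println!("loose:  {:?}", loose_inner("[[10]]")); // the payload is a clean "10"
    println!("strict: {:?}", strict_inner("[[10]]")); // the inner brackets survive
    // Hole 2: u32's FromStr accepts an optional leading '+'.
    println!("parse:  {:?}", "+8".parse::<u32>());
    println!("digits: {}", digits_only("+8"));
}
```

Because the strict strategy leaves the inner brackets in place, the digits-only check then rejects them. The same check rejects the leading plus before parse::<u32>() ever sees it.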

The zero-rejection (the panic guard from the original PR — bucket does % n and truncate does rem_euclid(n), both panic on 0) is preserved. The malformed-input test table is a nice enumeration (bucket[], bucket[+1], bucket[[10]], bucket[10]extra, and the truncate equivalents).

Pulled the branch: all spec::transform tests pass (incl. the 3 new ones), clippy + rustfmt clean. Still LGTM — and it also un-stales the PR.

github-actions Bot removed the stale label Jul 2, 2026

Development

Successfully merging this pull request may close these issues.

Reject invalid bucket and truncate transform parameters

2 participants