fix(partition): reject invalid transform parameters#2474
Conversation
758034a to
2899378
Compare
viirya
left a comment
There was a problem hiding this comment.
Good defensive fix. Confirmed the panic risk this prevents — bucket evaluation does % (mod_n as i32) and truncate does rem_euclid(width), both of which panic on zero, and the old loose parser accepted garbage like bucket10 outright. The exact-syntax + non-zero validation closes both. The malformed-input test coverage is thorough (I traced bucket[10]extra and bucket[[10]], both correctly rejected).
Tiny nit, no change needed: since is_ascii_digit() already rejects +/-, the .with_source(err) branch on parse::<u32> is now only reachable on overflow (e.g. bucket[99999999999]) — harmless, and worth keeping for exactly that case.
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
2899378 to
4dd63af
Compare
|
@viirya Is this ticket still in priority? I am available for any requested changes. |
|
cc @kevinjqliu |
viirya
left a comment
There was a problem hiding this comment.
Re-reviewed after the parser rewrite — this is a solid improvement over the version I approved earlier, worth calling out what it now catches that the old parser didn't.
The old inline parsing used trim_start_matches('[') / trim_end_matches(']') and a bare .parse(). I verified two holes that closes:
trim_*_matchesstrips repeated delimiters, sobucket[[10]]became10]]→[[10and could slip through.parse_parameterized_transformnow usesstrip_prefix('[')+strip_suffix(']')(exactly one bracket each) plus anis_ascii_digitcheck, so[[10]],bucket10],bucket[10etc. are all rejected.u32::parseaccepts a leading+— I confirmed"+8".parse::<u32>() == Ok(8)— sobucket[+1]was silently accepted before. The digit-only check now rejects it (this is the "reject leading plus" commit).
The zero-rejection (the panic guard from the original PR — bucket does % n and truncate does rem_euclid(n), both panic on 0) is preserved. The malformed-input test table is a nice enumeration (bucket[], bucket[+1], bucket[[10]], bucket[10]extra, and the truncate equivalents).
Pulled the branch: all spec::transform tests pass (incl. the 3 new ones), clippy + rustfmt clean. Still LGTM — and it also un-stales the PR.
Summary
bucket[N]andtruncate[W]syntax when parsing transformsbucket[0]andtruncate[0]Why
The previous parser stripped the transform prefix and trimmed brackets loosely, so malformed strings like
bucket10andtruncate10were accepted. It also accepted zero parameters, which can later panic when the transform is evaluated.Fixes #2473.
Tests
cargo fmt --checkcargo test -p iceberg parameterized_transformcargo test -p iceberg spec::transformcargo test -p iceberg bucketcargo test -p iceberg truncate