fix(writer): validate equality delete field ids#2723
Conversation
d0f2659 to
fd920af
Compare
viirya
left a comment
There was a problem hiding this comment.
Confirmed the gap: EqualityDeleteWriterConfig::new on main goes straight to schema_to_arrow_schema + RecordBatchProjector::new with no validation of equality_ids, so an empty / duplicated / non-existent id was silently accepted. Adding these checks is a real improvement.
The comment fix is accurate against the spec — equality-delete fields do follow the identifier-field type restrictions except that optional columns (and columns under optional structs) are allowed. I checked the spec wording and it matches what you wrote.
One thing worth flagging for the record: the duplicate-id and "id must exist in schema" checks go a bit beyond the Java reference — Java core guards against null/empty (GenericAppenderFactory, FileWriterBuilderImpl) but does not check duplicates or schema membership. I think adding them is the right call (a duplicate equality id is a clear user error, and an id absent from the schema would produce a meaningless delete file), just noting it's stricter than Java so it's a deliberate choice rather than an oversight.
Pulled the branch: all 11 equality_delete tests pass (including the 5 new ones), clippy and rustfmt clean. LGTM.
fd920af to
4729bc5
Compare
|
@viirya Thanks for the thorough and rigorous review — really appreciate you cross-checking the spec wording and the Java reference implementation. You're right that the duplicate-id and schema-membership checks are deliberate stricter guards. A duplicate equality id is almost certainly a user/configuration error, and an id missing from the schema would lead to a confusing failure later (or a meaningless delete file). Failing fast here with a clear error message feels like the right behavior for the Rust writer. I'll also open a corresponding PR against the Java SDK to add the same duplicate / missing-id validation there, so the two implementations stay aligned on this stricter behavior. I'll link it back to this PR once it's up. Thanks again for the LGTM! |
6d93c61 to
59d3afb
Compare
b63ef05 to
ad103e4
Compare
ad103e4 to
40fcaec
Compare
Which issue does this PR close?
What changes are included in this PR?
RecordBatchProjectorbehavior for unsupported field types and map/list reachability.Are these changes tested?
cargo test -p iceberg test_equality_delete_rejects_empty_equality_ids --libcargo test -p iceberg test_equality_delete_rejects_duplicate_equality_ids --libcargo test -p iceberg test_equality_delete_rejects_missing_equality_id --libcargo test -p iceberg test_equality_delete_unreachable_column --libcargo test -p iceberg test_equality_delete_with_nullable_field --libcargo test -p iceberg equality_delete --libcargo fmt --checkcargo test -p iceberg --libcargo clippy -p iceberg --all-features --lib --tests -- -D warningsmake checkmake unit-testdocker compose -f dev/docker-compose.yaml up -d --build --waitmake nextest(1832 passed, 0 skipped)