parse: Add set_verify_checksums API; improve fuzz coverage #11

Closed
cgwalters wants to merge 1 commit into main from agent-improve-fuzz-checksum-coverage

@cgwalters (Collaborator)

Summary

The parse.rs and differential.rs fuzz targets were getting almost zero coverage of deeper parser logic (PAX extensions, GNU long name/link, sparse files, etc.) because random fuzz input almost never has valid tar header checksums. The parser's verify_checksum() call at the top of parse_header() rejects ~100% of random inputs immediately.
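
To illustrate why random input is rejected so reliably, here is a minimal sketch of a ustar-style checksum check. It assumes the standard header layout (512-byte block, checksum stored as octal text at bytes 148..156, computed with that field treated as spaces). The function names are hypothetical and this is not the actual verify_checksum() code from this repository.

```rust
/// Size of one tar header block.
const BLOCK_SIZE: usize = 512;
/// Byte range of the checksum field in a ustar header.
const CHKSUM: std::ops::Range<usize> = 148..156;

/// Sum every header byte as an unsigned value, counting the checksum
/// field itself as eight ASCII spaces.
fn compute_checksum(header: &[u8; BLOCK_SIZE]) -> u32 {
    header
        .iter()
        .enumerate()
        .map(|(i, &b)| if CHKSUM.contains(&i) { u32::from(b' ') } else { u32::from(b) })
        .sum()
}

/// Parse the stored checksum: octal digits padded with spaces and/or NULs.
fn stored_checksum(header: &[u8; BLOCK_SIZE]) -> Option<u32> {
    let text = std::str::from_utf8(&header[CHKSUM]).ok()?;
    let digits = text.trim_matches(|c: char| c == ' ' || c == '\0');
    u32::from_str_radix(digits, 8).ok()
}

/// Hypothetical stand-in for the parser's checksum gate.
fn checksum_is_valid(header: &[u8; BLOCK_SIZE]) -> bool {
    stored_checksum(header) == Some(compute_checksum(header))
}
```

For a random 512-byte block to pass, the eight checksum bytes must both parse as octal text and equal the sum of the other bytes, which the fuzzer has little chance of producing by mutation alone.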

Changes

Library (src/parse.rs): Add Parser::set_verify_checksums(bool) API that controls whether header checksums are verified during parsing. Default is true (safe by default). This follows the same pattern as the existing set_allow_empty_path(bool) API.
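
A rough sketch of the shape of this toggle follows. The Parser struct, its field, the ParseError type, and the checksum_ok() helper are all simplified assumptions for illustration; only the set_verify_checksums(bool) name and its default of true come from this PR.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    BadChecksum,
}

/// Hypothetical, heavily reduced parser holding only the new option.
struct Parser {
    verify_checksums: bool,
}

impl Parser {
    /// Checksum verification is on by default (safe by default).
    fn new() -> Self {
        Parser { verify_checksums: true }
    }

    /// Control whether header checksums are verified during parsing.
    fn set_verify_checksums(&mut self, verify: bool) {
        self.verify_checksums = verify;
    }

    /// Only the checksum gate is modelled; real header parsing would follow it.
    fn parse_header(&self, header: &[u8; 512]) -> Result<(), ParseError> {
        if self.verify_checksums && !checksum_ok(header) {
            return Err(ParseError::BadChecksum);
        }
        Ok(())
    }
}

/// Assumed ustar-style check: octal text at bytes 148..156 must equal the
/// byte sum of the block with that field counted as spaces.
fn checksum_ok(header: &[u8; 512]) -> bool {
    let computed: u32 = header
        .iter()
        .enumerate()
        .map(|(i, &b)| if (148..156).contains(&i) { 32 } else { u32::from(b) })
        .sum();
    let stored = std::str::from_utf8(&header[148..156])
        .ok()
        .map(|s| s.trim_matches(|c: char| c == ' ' || c == '\0'))
        .and_then(|s| u32::from_str_radix(s, 8).ok());
    stored == Some(computed)
}
```

With the default, a block with a bad checksum is rejected at the gate; after set_verify_checksums(false) the same block passes through to whatever parsing comes next.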

parse.rs fuzzer: Use the new API to skip checksum verification ~90% of the time (determined by the first byte of input), letting the fuzzer exercise all parsing code paths beyond the checksum gate. The remaining 10% still tests checksum validation itself.
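
One way the first-byte selection could look is sketched below. The cutoff value and the helper name are assumptions chosen to give roughly a 10% share; the actual threshold used by the fuzz target is not stated here.

```rust
/// Hypothetical cutoff: selector bytes 0..=25 (26 of 256 values, about 10%)
/// keep checksum verification enabled.
const VERIFY_CUTOFF: u8 = 26;

/// Split fuzz input into a "verify checksums?" flag and the remaining bytes
/// that would be handed to the parser. Returns None for empty input.
fn split_fuzz_input(data: &[u8]) -> Option<(bool, &[u8])> {
    let (&selector, rest) = data.split_first()?;
    Some((selector < VERIFY_CUTOFF, rest))
}
```

Deriving the flag from the input itself keeps each run deterministic for a given corpus entry, so a crash found with verification disabled reproduces from the same bytes.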

differential.rs fuzzer: Since both tar-core and tar-rs must see identical data with valid checksums, use a fixup_checksums() approach that rewrites checksum fields in-place before passing to both parsers. Also minor cleanup: extract compare_entries() helper, use idiomatic zip+enumerate.
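
The in-place rewrite could be sketched as follows. This is a simplified assumption of what such a helper might do: it treats every non-zero 512-byte block as a header, whereas a real implementation would likely need to skip file data blocks. It is not the fixup_checksums() code from this PR.

```rust
const BLOCK_SIZE: usize = 512;

/// Rewrite the checksum field (bytes 148..156) of each 512-byte block so it
/// matches the block's contents. All-zero blocks (end-of-archive markers)
/// and any trailing partial block are left untouched.
fn fixup_checksums(data: &mut [u8]) {
    for block in data.chunks_exact_mut(BLOCK_SIZE) {
        if block.iter().all(|&b| b == 0) {
            continue;
        }
        // The checksum is computed with its own field counted as spaces.
        block[148..156].fill(b' ');
        let sum: u32 = block.iter().map(|&b| u32::from(b)).sum();
        // Six octal digits, a NUL, then a space. The largest possible sum
        // (512 * 255) is 377000 in octal, so six digits always suffice.
        let field = format!("{:06o}\0 ", sum);
        block[148..156].copy_from_slice(field.as_bytes());
    }
}
```

Because the buffer is fixed up once and then handed to both parsers, any divergence in their results comes from parsing behaviour rather than from one side rejecting the checksum.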

Assisted-by: OpenCode (Claude claude-opus-4-6)
cgwalters closed this on Mar 2, 2026