Description
There is a fatal crash in the core engine when running commits consolidation multiple times on an array that contains vacuumed Delete Query Conditions. Additionally, a secondary parsing bug prevents superseded consolidated commit (`.con`) files from being properly vacuumed.
Steps to Reproduce
The crash occurs when executing the following sequence on a sparse array:
- Write initial data (creates a `.wrt` file).
- Submit a Delete Query Condition (creates a `.del` file).
- Run Commits Consolidation (embeds the `.wrt` and `.del` into a `.con` file).
- Run Commits Vacuuming (deletes the original physical `.del` file from disk).
- Run Commits Consolidation again.

Result: The engine crashes with: `[TileDB::C++API] Error: Non-retrievable error occurred.`
Root Cause Analysis
I have tracked this down to two distinct architectural flaws in the commits ledger subsystem:
- The Consolidator Crash: During the second consolidation pass, `Consolidator::write_consolidated_commits_file` attempts to call `vfs.file_size()` and `vfs.read()` on the logical `.del` URI. Because the file was previously vacuumed, the VFS panics. The engine needs to be aware of the physical location of the payload (whether it is a raw file or already embedded in a `.con` file) and read from the correct byte offset.
- The Vacuum Verification Failure: `ArrayDirectory::load_consolidated_commit_uris` validates `.con` files by parsing them line by line and matching the string URIs against the physical directory contents. However, it does not advance the stream past the embedded binary `.del` payloads. The parser reads the binary data as characters, fails the string verification, marks the `.con` file as invalid, and abandons vacuuming it.
Proposed Solution
I have a patch ready that:
- Maps the physical location and byte offsets of `.del` payloads in `ArrayDirectory`.
- Introduces a `skip_delete_payload` stream helper to ensure `.con` files verify cleanly and get vacuumed.
- Refactors the consolidator to read superseded payloads dynamically from older `.con` files.
I will open a PR linking to this issue shortly!