Crash in Commits Consolidator when processing vacuumed Delete Query Conditions #5771

@cl-earthscope

Description

There is a fatal crash in the core engine when running commits consolidation multiple times on an array that contains vacuumed Delete Query Conditions. Additionally, a secondary parsing bug prevents superseded consolidated commit (.con) files from being properly vacuumed.

Steps to Reproduce

The crash occurs when executing the following sequence on a sparse array:

  1. Write initial data (creates a .wrt file).
  2. Submit a Delete Query Condition (creates a .del file).
  3. Run Commits Consolidation (embeds the .wrt and .del into a .con file).
  4. Run Commits Vacuuming (deletes the original physical .del file from disk).
  5. Run Commits Consolidation again.

Result: The engine crashes with: [TileDB::C++API] Error: Non-retrievable error occurred.
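
The sequence can be walked through with a small stand-alone model. This is only a sketch: a Python dict stands in for the array's commits directory, and the names `consolidate`, `vacuum`, `f1.wrt`, `d1.del` and the dict-based .con layout are illustrative assumptions, not TileDB APIs or file formats. It shows one way a second consolidation pass can fail once the raw .del file is gone.

```python
# Toy model: a dict plays the role of the commits directory.
# Raw commit files map to bytes; .con files map to a dict of listed URIs and payloads.

def consolidate(files, out_name):
    """Naive consolidation: list every known commit URI and embed .del payloads."""
    logical = sorted(
        {n for n in files if n.endswith((".wrt", ".del"))}
        | {u for n, c in files.items() if n.endswith(".con") for u in c["uris"]}
    )
    payloads = {}
    for uri in logical:
        if uri.endswith(".del"):
            # Always reads the raw .del file, even if it was vacuumed earlier.
            payloads[uri] = files[uri]  # KeyError when the file no longer exists
    files[out_name] = {"uris": logical, "payloads": payloads}

def vacuum(files):
    """Remove raw commit files that are already listed in some .con file."""
    covered = {u for n, c in files.items() if n.endswith(".con") for u in c["uris"]}
    for uri in covered:
        files.pop(uri, None)

files = {"f1.wrt": b"", "d1.del": b"serialized condition"}  # steps 1 and 2
consolidate(files, "c1.con")                                  # step 3
vacuum(files)                                                 # step 4
try:
    consolidate(files, "c2.con")                              # step 5
    second_pass_failed = False
except KeyError:
    second_pass_failed = True
```

In this model the payload for `d1.del` still exists inside `c1.con`, but the naive pass never looks there.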

Root Cause Analysis

I have tracked this down to two distinct architectural flaws in the commits ledger subsystem:

  1. The Consolidator Crash: During the second consolidation pass, Consolidator::write_consolidated_commits_file calls vfs.file_size() and vfs.read() on the logical .del URI. Because that file was removed by the earlier vacuum, the VFS call fails on the missing file. The engine needs to know the physical location of the payload (a raw .del file, or a region already embedded in a .con file) and read from the correct byte offset.
  2. The Vacuum Verification Failure: ArrayDirectory::load_consolidated_commit_uris validates .con files by parsing them line-by-line and matching the string URIs against physical directory contents. However, it does not advance the stream past the embedded binary .del payloads. The parser reads the binary data as characters, fails the string verification, marks the .con file as invalid, and abandons vacuuming it.
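
The second flaw can be illustrated with a toy parser. The layout used here (a URI line, followed for .del entries by an 8-byte little-endian length and the raw payload) is an assumption chosen for illustration; this issue does not spell out the real .con encoding. The functions `build_con`, `parse_naive` and `parse_skipping` are hypothetical names.

```python
import io
import struct

def build_con(entries):
    """Build a toy .con blob from (uri, payload) pairs; payload is used for .del only."""
    buf = io.BytesIO()
    for uri, payload in entries:
        buf.write(uri.encode() + b"\n")
        if uri.endswith(".del"):
            buf.write(struct.pack("<Q", len(payload)))  # assumed length prefix
            buf.write(payload)
    return buf.getvalue()

def parse_naive(blob):
    """Flawed approach: treat the whole blob as newline-separated URIs."""
    return [line for line in blob.split(b"\n") if line]

def parse_skipping(blob):
    """Read URI lines, stepping over each embedded .del payload."""
    stream = io.BytesIO(blob)
    uris = []
    while True:
        line = stream.readline()
        if not line:
            break
        uri = line.rstrip(b"\n").decode()
        uris.append(uri)
        if uri.endswith(".del"):
            (size,) = struct.unpack("<Q", stream.read(8))
            stream.seek(size, io.SEEK_CUR)  # advance past the binary payload
    return uris

blob = build_con([
    ("a.wrt", None),
    ("b.del", b"\x00\x01binary\npayload"),
    ("c.wrt", None),
])
naive = parse_naive(blob)
skipping = parse_skipping(blob)
```

With this input the naive parser yields fragments of the payload as if they were URIs and never produces a clean `c.wrt` entry, so any check of those strings against the directory listing would fail. The skipping parser recovers the three URIs.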

Proposed Solution

I have a patch ready that:

  • Maps the physical location and byte offsets of .del payloads in ArrayDirectory.
  • Introduces a skip_delete_payload stream helper to ensure .con files verify cleanly and get vacuumed.
  • Refactors the consolidator to read superseded payloads dynamically from older .con files.
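
As a rough sketch of the first and third points, the snippet below records where each embedded payload lives and reads it back from an older .con blob once the raw .del file is gone. It is not the patch itself: `PayloadLocation`, `index_con` and `read_payload` are hypothetical names, and the length-prefixed layout is an assumption made for illustration.

```python
import io
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class PayloadLocation:
    file: str    # physical file holding the bytes (here, a .con file)
    offset: int  # byte offset of the payload inside that file
    size: int    # payload length in bytes

def index_con(name, blob):
    """Map each embedded .del URI to the location of its payload inside blob."""
    stream, locations = io.BytesIO(blob), {}
    while line := stream.readline():
        uri = line.rstrip(b"\n").decode()
        if uri.endswith(".del"):
            (size,) = struct.unpack("<Q", stream.read(8))  # assumed length prefix
            locations[uri] = PayloadLocation(name, stream.tell(), size)
            stream.seek(size, io.SEEK_CUR)  # skip the payload itself
    return locations

def read_payload(files, uri, locations):
    """Prefer the raw .del file; otherwise slice the payload out of its .con file."""
    if uri in files:
        return files[uri]
    loc = locations[uri]
    return files[loc.file][loc.offset : loc.offset + loc.size]

payload = b"\x00cond\nbytes"
con = b"f1.wrt\n" + b"d1.del\n" + struct.pack("<Q", len(payload)) + payload
files = {"c1.con": con}  # d1.del has already been vacuumed
locations = index_con("c1.con", con)
recovered = read_payload(files, "d1.del", locations)
```

Keeping the offset and size alongside the file name means a later consolidation pass can copy the payload without needing the raw .del file at all.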

I will open a PR linking to this issue shortly!
