Skip to content

feat(compaction): support compaction for key table in framework#195

Open
lucasfang wants to merge 11 commits intoalibaba:mainfrom
lucasfang:compaction_dev
Open

feat(compaction): support compaction for key table in framework#195
lucasfang wants to merge 11 commits intoalibaba:mainfrom
lucasfang:compaction_dev

Conversation

@lucasfang
Copy link
Collaborator

@lucasfang lucasfang commented Mar 23, 2026

Purpose

  • Add key-table compaction support in framework.
  • Follow up with stability and maintainability fixes, including full-compaction behavior, cancellation wiring, logging improvements, and related refactor/cleanup.

TODO: support lookup mode framework.

Tests

  • UT

    • MergeTreeCompactManagerTest
      • TestOutputToZeroLevel
      • TestCompactToPenultimateLayer
      • TestNoCompaction
      • TestNormal
      • TestUpgrade
      • TestSmallFiles
      • TestSmallFilesNoCompact
      • TestSmallFilesCrossLevel
      • TestComplex
      • TestSmallInComplex
      • TestIsCompacting
      • TestTriggerFullCompaction
      • TestRejectReentrantFullCompaction
  • IT

    • KeyValueCompactionInteTest
      • TestKeyValueTableCompactionWithIOException
      • TestKeyValueTableStreamWriteFullCompaction (parameterized by file format)
    • AppendCompactionInteTest
      • TestAppendTableStreamWriteFullCompaction (parameterized by file format)
      • TestAppendTableStreamWriteFullCompactionWithDv (parameterized by file format)
      • TestAppendTableStreamWriteBestEffortCompaction (parameterized by file format)
      • TestAppendTableStreamWriteCompactionWithExternalPath (parameterized by file format)
      • TestAppendTableCompactionWithIOException

API and Format

Add options for pk compaction.

compaction.max-size-amplification-percent
compaction.size-ratio
num-sorted-run.compaction-trigger
num-sorted-run.stop-trigger
num-levels
lookup-compact
compaction.force-up-level-0
lookup-compact.max-interval

Generative AI tooling

Generated-by: GitHub Copilot (GPT-5.3-Codex)

Copilot AI review requested due to automatic review settings March 23, 2026 01:39
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds merge-tree compaction support for key-table writes in the C++ framework by wiring a real CompactManager into MergeTreeWriter, introducing new compaction/lookup-related options, and adding unit/integration tests around the new behavior.

Changes:

  • Implement merge-tree compaction orchestration (MergeTreeCompactManager + tasks) and integrate it into KeyValueFileStoreWrite/MergeTreeWriter.
  • Extend compaction configuration via new CoreOptions fields (sorted-run triggers, levels, lookup compact mode/interval, force up-level-0).
  • Add/adjust tests (unit + integration) and update interfaces to propagate Status/Result for compaction-related operations.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/inte/key_value_compaction_inte_test.cpp New integration test exercising key-table compaction under injected IO failures.
test/inte/CMakeLists.txt Registers the new integration test target.
src/paimon/core/options/lookup_strategy_test.cpp Unit tests for LookupStrategy::From behavior and combinations.
src/paimon/core/options/lookup_strategy.h Introduces LookupStrategy helper for deriving need_lookup.
src/paimon/core/options/lookup_compact_mode.h Adds LookupCompactMode enum used by lookup compaction strategy selection.
src/paimon/core/operation/key_value_file_store_write.h Extends writer to accept a comparator suitable for compaction and exposes compaction factory helpers.
src/paimon/core/operation/key_value_file_store_write.cpp Creates Levels, compaction strategy, manager, and rewriter; wires compaction manager into MergeTreeWriter.
src/paimon/core/operation/file_store_write.cpp Builds separate comparators (use_view=true/false) and DV maintainer factory; passes into key-value write path.
src/paimon/core/operation/abstract_file_store_write.cpp Propagates Result from CompactDeletionFile::GetOrCompute() into prepare-commit flow.
src/paimon/core/mergetree/merge_tree_writer_test.cpp Updates tests for new MergeTreeWriter ctor requiring a CompactManager (uses NoopCompactManager).
src/paimon/core/mergetree/merge_tree_writer.h Adds Compact/Sync/CompactNotCompleted implementations and compaction state tracking members.
src/paimon/core/mergetree/merge_tree_writer.cpp Implements compaction wiring: sync results, trigger compaction, track compact before/after, handle deletion files.
src/paimon/core/mergetree/lookup_levels_test.cpp Updates test construction to pass NoopCompactManager into MergeTreeWriter.
src/paimon/core/mergetree/compact/merge_tree_compact_task.h New compaction task for merge-tree compaction (rewrite/upgrade decisions + optional DV deletion file).
src/paimon/core/mergetree/compact/merge_tree_compact_manager_test.cpp New unit tests validating MergeTreeCompactManager level/output behavior and lookup compaction “not completed” semantics.
src/paimon/core/mergetree/compact/merge_tree_compact_manager.h New MergeTreeCompactManager interface/implementation declaration.
src/paimon/core/mergetree/compact/merge_tree_compact_manager.cpp Implements compaction triggering, task submission, applying results to Levels, and metrics reporting.
src/paimon/core/mergetree/compact/interval_partition.h Removes redundant forward declarations (relies on includes).
src/paimon/core/mergetree/compact/file_rewrite_compact_task.h New task for file-rewrite style compaction units.
src/paimon/core/core_options_test.cpp Adds coverage for new compaction/lookup options parsing/defaults and lookup strategy derivation.
src/paimon/core/core_options.h Adds new getters for compaction knobs and lookup strategy/mode/levels.
src/paimon/core/core_options.cpp Parses/stores new options, computes derived defaults, and centralizes NeedLookup() via LookupStrategy.
src/paimon/core/compact/noop_compact_manager.h Updates AddNewFile signature to return Status.
src/paimon/core/compact/compact_manager.h Changes AddNewFile to return Status for error propagation.
src/paimon/core/compact/compact_deletion_file_test.cpp Adapts tests to Result<optional<...>> API and adds tests for lazy generation behavior.
src/paimon/core/compact/compact_deletion_file.h Adds lazy deletion-file generation and changes GetOrCompute() to return Result<optional<...>>.
src/paimon/core/append/bucketed_append_compact_manager.h Updates AddNewFile signature to return Status.
src/paimon/core/append/append_only_writer.cpp Propagates Status from CompactManager::AddNewFile.
src/paimon/common/defs.cpp Adds option key constants for new compaction/lookup configuration options.
src/paimon/CMakeLists.txt Adds new sources and unit tests to the build.
include/paimon/defs.h Documents new compaction/lookup configuration keys.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

auto result = std::make_shared<CompactResult>();
for (const auto& file : files_) {
PAIMON_RETURN_NOT_OK(RewriteFile(file, result.get()));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will RewriteFile be called by other functions in the future?
I'm trying to understand the purpose of the to_update parameter — currently, it seems like we could just return rewriter_->Rewrite() directly without merging.

std::make_shared<ReducerMergeFunctionWrapper>(std::move(mfunc));

auto cancellation_controller = std::make_shared<CancellationController>();
auto writer = std::make_shared<MergeTreeWriter>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems no need for line 94 in NewFiles?

const std::shared_ptr<BucketedDvMaintainer>& dv_maintainer,
bool lazy_gen_deletion_file, bool need_lookup,
bool force_rewrite_all_files, bool force_keep_delete,
const std::shared_ptr<CancellationController>& cancellation_controller);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could executor be placed at the end of the parameter list, similar to memory_pool?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we consolidate these parameters into CoreOptions and destructure them in the MergeTreeCompactManager constructor?
Alternatively, should MergeTreeCompactManager just hold a reference to CoreOptions directly?

The current parameter list is getting quite long. TT

namespace paimon {

std::string RunsToString(const std::vector<LevelSortedRun>& runs) {
std::stringstream ss;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See VectorToString in StringUtils.

}

Status MergeTreeCompactTask::Rewrite(std::vector<std::vector<SortedRun>>& candidate,
CompactResult* to_update) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use std::vector<std::vector>* for out param.


private:
Status DoClose() {
// cancel compaction so that it does not block job cancelling
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider put DoClose to .cpp?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now where call DoClose as ~MergeTreeWriter() is deleted

(void)key_comparator;
(void)user_defined_seq_comparator;
(void)levels;
LookupStrategy lookup_strategy = options_.GetLookupStrategy();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a compile error for (void)key_comparator;?

const bool deletion_vector;

private:
LookupStrategy(bool is_first_row, bool produce_changelog, bool deletion_vector,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, please distinguish the names of function parameters from the names of class members(e.g. please change is_first_row to _is_first_row)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants