feat(compaction): support compaction for key table in framework#195
feat(compaction): support compaction for key table in framework#195lucasfang wants to merge 11 commits intoalibaba:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds merge-tree compaction support for key-table writes in the C++ framework by wiring a real CompactManager into MergeTreeWriter, introducing new compaction/lookup-related options, and adding unit/integration tests around the new behavior.
Changes:
- Implement merge-tree compaction orchestration (
MergeTreeCompactManager+ tasks) and integrate it intoKeyValueFileStoreWrite/MergeTreeWriter. - Extend compaction configuration via new
CoreOptionsfields (sorted-run triggers, levels, lookup compact mode/interval, force up-level-0). - Add/adjust tests (unit + integration) and update interfaces to propagate
Status/Resultfor compaction-related operations.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/inte/key_value_compaction_inte_test.cpp | New integration test exercising key-table compaction under injected IO failures. |
| test/inte/CMakeLists.txt | Registers the new integration test target. |
| src/paimon/core/options/lookup_strategy_test.cpp | Unit tests for LookupStrategy::From behavior and combinations. |
| src/paimon/core/options/lookup_strategy.h | Introduces LookupStrategy helper for deriving need_lookup. |
| src/paimon/core/options/lookup_compact_mode.h | Adds LookupCompactMode enum used by lookup compaction strategy selection. |
| src/paimon/core/operation/key_value_file_store_write.h | Extends writer to accept a comparator suitable for compaction and exposes compaction factory helpers. |
| src/paimon/core/operation/key_value_file_store_write.cpp | Creates Levels, compaction strategy, manager, and rewriter; wires compaction manager into MergeTreeWriter. |
| src/paimon/core/operation/file_store_write.cpp | Builds separate comparators (use_view=true/false) and DV maintainer factory; passes into key-value write path. |
| src/paimon/core/operation/abstract_file_store_write.cpp | Propagates Result from CompactDeletionFile::GetOrCompute() into prepare-commit flow. |
| src/paimon/core/mergetree/merge_tree_writer_test.cpp | Updates tests for new MergeTreeWriter ctor requiring a CompactManager (uses NoopCompactManager). |
| src/paimon/core/mergetree/merge_tree_writer.h | Adds Compact/Sync/CompactNotCompleted implementations and compaction state tracking members. |
| src/paimon/core/mergetree/merge_tree_writer.cpp | Implements compaction wiring: sync results, trigger compaction, track compact before/after, handle deletion files. |
| src/paimon/core/mergetree/lookup_levels_test.cpp | Updates test construction to pass NoopCompactManager into MergeTreeWriter. |
| src/paimon/core/mergetree/compact/merge_tree_compact_task.h | New compaction task for merge-tree compaction (rewrite/upgrade decisions + optional DV deletion file). |
| src/paimon/core/mergetree/compact/merge_tree_compact_manager_test.cpp | New unit tests validating MergeTreeCompactManager level/output behavior and lookup compaction “not completed” semantics. |
| src/paimon/core/mergetree/compact/merge_tree_compact_manager.h | New MergeTreeCompactManager interface/implementation declaration. |
| src/paimon/core/mergetree/compact/merge_tree_compact_manager.cpp | Implements compaction triggering, task submission, applying results to Levels, and metrics reporting. |
| src/paimon/core/mergetree/compact/interval_partition.h | Removes redundant forward declarations (relies on includes). |
| src/paimon/core/mergetree/compact/file_rewrite_compact_task.h | New task for file-rewrite style compaction units. |
| src/paimon/core/core_options_test.cpp | Adds coverage for new compaction/lookup options parsing/defaults and lookup strategy derivation. |
| src/paimon/core/core_options.h | Adds new getters for compaction knobs and lookup strategy/mode/levels. |
| src/paimon/core/core_options.cpp | Parses/stores new options, computes derived defaults, and centralizes NeedLookup() via LookupStrategy. |
| src/paimon/core/compact/noop_compact_manager.h | Updates AddNewFile signature to return Status. |
| src/paimon/core/compact/compact_manager.h | Changes AddNewFile to return Status for error propagation. |
| src/paimon/core/compact/compact_deletion_file_test.cpp | Adapts tests to Result<optional<...>> API and adds tests for lazy generation behavior. |
| src/paimon/core/compact/compact_deletion_file.h | Adds lazy deletion-file generation and changes GetOrCompute() to return Result<optional<...>>. |
| src/paimon/core/append/bucketed_append_compact_manager.h | Updates AddNewFile signature to return Status. |
| src/paimon/core/append/append_only_writer.cpp | Propagates Status from CompactManager::AddNewFile. |
| src/paimon/common/defs.cpp | Adds option key constants for new compaction/lookup configuration options. |
| src/paimon/CMakeLists.txt | Adds new sources and unit tests to the build. |
| include/paimon/defs.h | Documents new compaction/lookup configuration keys. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
61ec1d8 to
fad0bb6
Compare
e091189 to
31cb7f9
Compare
7ffe066 to
5a90235
Compare
| auto result = std::make_shared<CompactResult>(); | ||
| for (const auto& file : files_) { | ||
| PAIMON_RETURN_NOT_OK(RewriteFile(file, result.get())); | ||
| } |
There was a problem hiding this comment.
Will RewriteFile be called by other functions in the future?
I'm trying to understand the purpose of the to_update parameter — currently, it seems like we could just return rewriter_->Rewrite() directly without merging.
| std::make_shared<ReducerMergeFunctionWrapper>(std::move(mfunc)); | ||
|
|
||
| auto cancellation_controller = std::make_shared<CancellationController>(); | ||
| auto writer = std::make_shared<MergeTreeWriter>( |
There was a problem hiding this comment.
Seems no need for line 94 in NewFiles?
| const std::shared_ptr<BucketedDvMaintainer>& dv_maintainer, | ||
| bool lazy_gen_deletion_file, bool need_lookup, | ||
| bool force_rewrite_all_files, bool force_keep_delete, | ||
| const std::shared_ptr<CancellationController>& cancellation_controller); |
There was a problem hiding this comment.
Could executor be placed at the end of the parameter list, similar to memory_pool?
There was a problem hiding this comment.
Could we consolidate these parameters into CoreOptions and destructure them in the MergeTreeCompactManager constructor?
Alternatively, should MergeTreeCompactManager just hold a reference to CoreOptions directly?
The current parameter list is getting quite long. TT
| namespace paimon { | ||
|
|
||
| std::string RunsToString(const std::vector<LevelSortedRun>& runs) { | ||
| std::stringstream ss; |
There was a problem hiding this comment.
See VectorToString in StringUtils.
| } | ||
|
|
||
| Status MergeTreeCompactTask::Rewrite(std::vector<std::vector<SortedRun>>& candidate, | ||
| CompactResult* to_update) { |
There was a problem hiding this comment.
Please use std::vector<std::vector>* for out param.
|
|
||
| private: | ||
| Status DoClose() { | ||
| // cancel compaction so that it does not block job cancelling |
There was a problem hiding this comment.
Consider put DoClose to .cpp?
There was a problem hiding this comment.
Now where call DoClose as ~MergeTreeWriter() is deleted
| (void)key_comparator; | ||
| (void)user_defined_seq_comparator; | ||
| (void)levels; | ||
| LookupStrategy lookup_strategy = options_.GetLookupStrategy(); |
There was a problem hiding this comment.
Is there a compile error for (void)key_comparator;?
| const bool deletion_vector; | ||
|
|
||
| private: | ||
| LookupStrategy(bool is_first_row, bool produce_changelog, bool deletion_vector, |
There was a problem hiding this comment.
For readability, please distinguish the names of function parameters from the names of class members(e.g. please change is_first_row to _is_first_row)
Purpose
TODO: support lookup mode framework.
Tests
UT
IT
API and Format
Add options for pk compaction.
compaction.max-size-amplification-percent
compaction.size-ratio
num-sorted-run.compaction-trigger
num-sorted-run.stop-trigger
num-levels
lookup-compact
compaction.force-up-level-0
lookup-compact.max-interval
Generative AI tooling
Generated-by: GitHub Copilot (GPT-5.3-Codex)