feat(compaction): support compaction for append table #169
lucasfang merged 21 commits into alibaba:main
Pull request overview
This PR adds append-table compaction support by introducing an async compaction manager for append-only writers, wiring compaction increments into commit flow, and adding restore helpers and integration/unit tests.
Changes:
- Add append-table compaction infrastructure (compact manager/task/result APIs + bucketed append compaction manager).
- Extend writer APIs to expose compaction triggering/progress and propagate compact-deletion-file through commit increments.
- Add/adjust restore utilities and tests (unit + integration) to validate compaction behavior.
Reviewed changes
Copilot reviewed 42 out of 42 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| test/inte/compaction_inte_test.cpp | New integration test that triggers append-table compaction and verifies commit increments. |
| test/inte/CMakeLists.txt | Registers the new integration test target. |
| src/paimon/core/utils/commit_increment.h | Extends commit increment to carry a compact-deletion-file handle. |
| src/paimon/core/utils/batch_writer.h | Extends BatchWriter interface with compaction and sync methods. |
| src/paimon/core/postpone/postpone_bucket_writer.h | Implements new BatchWriter virtuals as not-implemented stubs for postpone writer. |
| src/paimon/core/postpone/postpone_bucket_writer.cpp | Updates CommitIncrement construction for new signature. |
| src/paimon/core/operation/write_restore.h | Introduces WriteRestore abstraction for restoring files for writers. |
| src/paimon/core/operation/write_restore.cpp | Implements helper to extract data files/total buckets from manifest entries. |
| src/paimon/core/operation/restore_files.h | Adds RestoreFiles container for restored snapshot/files/index metadata. |
| src/paimon/core/operation/raw_file_split_read.h | Adds overload to create a reader directly from partition/bucket/files. |
| src/paimon/core/operation/raw_file_split_read.cpp | Refactors reader creation to reuse the new overload. |
| src/paimon/core/operation/key_value_file_store_write.cpp | Switches restore scanning to return RestoreFiles and uses its data files/total buckets. |
| src/paimon/core/operation/file_system_write_restore.h | Adds filesystem-based WriteRestore implementation for restoring from latest snapshot scan plan. |
| src/paimon/core/operation/file_store_commit_impl.cpp | Adds commit path for append-table compaction snapshots (CommitKind::Compact). |
| src/paimon/core/operation/append_only_file_store_write_test.cpp | Updates restore scanning expectations for the new RestoreFiles return type. |
| src/paimon/core/operation/append_only_file_store_write.h | Adds append-table compaction helpers and wiring for compaction rewrite/reading. |
| src/paimon/core/operation/append_only_file_store_write.cpp | Implements append compaction rewrite path and connects BucketedAppendCompactManager into append writer creation. |
| src/paimon/core/operation/abstract_split_read.h | Adds generalized deletion-file-map creation overload. |
| src/paimon/core/operation/abstract_split_read.cpp | Implements generalized deletion-file-map creation. |
| src/paimon/core/operation/abstract_file_store_write.h | Adds Compact(...) to FileStoreWrite impl and changes restore scan API to return RestoreFiles. |
| src/paimon/core/operation/abstract_file_store_write.cpp | Implements Compact(...), introduces a compaction executor, and refactors restore scanning via FileSystemWriteRestore. |
| src/paimon/core/mergetree/merge_tree_writer.h | Implements new BatchWriter virtuals as not-implemented stubs for merge-tree writer. |
| src/paimon/core/mergetree/merge_tree_writer.cpp | Updates CommitIncrement construction for new signature. |
| src/paimon/core/deletionvectors/deletion_file_writer.cpp | Adjusts external-path handling when building IndexFileMeta for deletion vectors. |
| src/paimon/core/deletionvectors/bucketed_dv_maintainer.h | Adds DV index maintainer for per-file deletion vector tracking and index rewriting. |
| src/paimon/core/compact/noop_compact_manager.h | Adds a no-op compaction manager implementation. |
| src/paimon/core/compact/compact_task.h | Introduces a compact task abstraction with an Execute() wrapper. |
| src/paimon/core/compact/compact_result.h | Adds missing include needed by compact result types. |
| src/paimon/core/compact/compact_manager.h | Introduces compaction manager interface used by writers. |
| src/paimon/core/compact/compact_future_manager.h | Provides a future-based async compaction manager helper. |
| src/paimon/core/compact/compact_deletion_file.h | Adds compact deletion-file generation/cleanup abstraction for DV index output. |
| src/paimon/core/append/bucketed_append_compact_manager.h | Introduces append-table bucketed compaction manager (async) implementing CompactManager. |
| src/paimon/core/append/bucketed_append_compact_manager.cpp | Implements append-table compaction scheduling and result handling. |
| src/paimon/core/append/bucketed_append_compact_manager_test.cpp | Adds unit tests for comparator/overlap helpers. |
| src/paimon/core/append/append_only_writer_test.cpp | Updates writer construction to pass a compaction manager. |
| src/paimon/core/append/append_only_writer.h | Wires compaction controls into append writer and tracks compaction before/after results. |
| src/paimon/core/append/append_only_writer.cpp | Implements flush/sync logic interacting with compaction manager and produces compaction increments. |
| src/paimon/common/reader/reader_utils.h | Adds helper to remove a field from a StructArray. |
| src/paimon/common/reader/reader_utils.cpp | Implements struct-field removal utility. |
| src/paimon/common/executor/executor.cpp | Adds ShutdownNow() support and improves shutdown behavior. |
| src/paimon/CMakeLists.txt | Adds new core sources and unit test source to build. |
| include/paimon/file_store_write.h | Extends public write API with Compact(partition, bucket, full_compaction). |
| include/paimon/executor.h | Extends public executor API with ShutdownNow(). |
Comments suppressed due to low confidence (1)
src/paimon/core/append/append_only_writer.h:80
`AppendOnlyWriter::IsCompacting()` always returns false, but this writer now triggers async compaction via `compact_manager_`. `AbstractFileStoreWrite::PrepareCommit` relies on `IsCompacting()` to decide when it is safe to close idle writers; returning false can cause a writer to be closed while compaction is still running or results are pending, potentially losing compaction outputs or leaving orphan files. Consider implementing this using `compact_manager_->CompactNotCompleted()` (and/or tracking pending compaction results).

    bool IsCompacting() const override {
        return false;
    }
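The suggestion in this comment can be sketched roughly as follows. This is a minimal, thread-free illustration and not the PR's code: `SketchCompactManager`, `SketchAppendWriter`, `TriggerCompaction`, `OnTaskFinished`, `TakeResult`, and `CanCloseIdleWriter` are hypothetical names, and only `IsCompacting()` and `CompactNotCompleted()` come from the comment itself. The point it shows is that the writer reports "compacting" both while a task runs and while a finished result has not yet been collected.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the compaction manager. Only the name
// CompactNotCompleted() is taken from the review comment; the rest is assumed.
class SketchCompactManager {
 public:
    // Records that a compaction task has been handed to an executor.
    void TriggerCompaction() { task_running_ = true; }

    // Called when the task finishes; its output waits to be collected.
    void OnTaskFinished(int32_t files_rewritten) {
        task_running_ = false;
        pending_result_ = files_rewritten;
    }

    // Hands the finished result to the writer, if there is one.
    std::optional<int32_t> TakeResult() {
        std::optional<int32_t> result = pending_result_;
        pending_result_.reset();
        return result;
    }

    // True while a task is running or its result has not been collected.
    bool CompactNotCompleted() const {
        return task_running_ || pending_result_.has_value();
    }

 private:
    bool task_running_ = false;
    std::optional<int32_t> pending_result_;
};

// Hypothetical writer that delegates its compaction state to the manager
// instead of returning a constant false.
class SketchAppendWriter {
 public:
    explicit SketchAppendWriter(SketchCompactManager* manager)
        : compact_manager_(manager) {}

    bool IsCompacting() const { return compact_manager_->CompactNotCompleted(); }

 private:
    SketchCompactManager* compact_manager_;  // not owned
};

// An idle writer is only safe to close when nothing is running or pending.
inline bool CanCloseIdleWriter(const SketchAppendWriter& writer) {
    return !writer.IsCompacting();
}
```

With this shape, a caller that checks `CanCloseIdleWriter` keeps the writer alive until the compaction result has been taken, which is the failure mode the comment warns about.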
Pull request overview
Copilot reviewed 48 out of 48 changed files in this pull request and generated 15 comments.
Comments suppressed due to low confidence (1)
src/paimon/core/append/append_only_writer.h:80
`AppendOnlyWriter::IsCompacting()` still always returns false, but this writer now runs asynchronous compaction via `compact_manager_`. Callers (e.g., `AbstractFileStoreWrite::PrepareCommit`) use `IsCompacting()` to decide whether it's safe to close/evict an idle writer; always returning false can cause the writer to be closed while a compaction is still running and lose/leave behind compaction results. Please report the real compaction state (e.g., based on `compact_manager_->CompactNotCompleted()` / pending results).

    Status Sync() override;
    Status Close() override;
    bool IsCompacting() const override {
        return false;
    }
Pull request overview
Copilot reviewed 50 out of 50 changed files in this pull request and generated 4 comments.
Purpose
Linked issue: #93
Support compaction for append tables (without deletion vectors) by introducing an async compaction manager for append-only writers and wiring compaction increments into the commit flow.
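As a rough illustration of what a compaction increment carries into a commit, the sketch below models it as a "before" set of input files and an "after" set of output files, and applies it to a bucket's live file list. `SketchCompactIncrement` and `ApplyCompactIncrement` are hypothetical names using plain file-name strings; the PR's actual types hold file metadata and are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified compaction increment: file names only.
struct SketchCompactIncrement {
    std::vector<std::string> compact_before;  // inputs replaced by compaction
    std::vector<std::string> compact_after;   // outputs produced by compaction
};

// Returns the live file list after the increment is committed:
// every "before" file is dropped and every "after" file is appended.
inline std::vector<std::string> ApplyCompactIncrement(
    std::vector<std::string> live_files, const SketchCompactIncrement& inc) {
    auto replaced = [&inc](const std::string& file) {
        return std::find(inc.compact_before.begin(), inc.compact_before.end(),
                         file) != inc.compact_before.end();
    };
    live_files.erase(
        std::remove_if(live_files.begin(), live_files.end(), replaced),
        live_files.end());
    live_files.insert(live_files.end(), inc.compact_after.begin(),
                      inc.compact_after.end());
    return live_files;
}
```

For example, applying an increment that rewrites two small files into one leaves the untouched files in place and adds the single merged file at the end of the list.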
Tests
MetricsImplTest
DefaultExecutorTest
BucketedAppendCompactManagerTest
BucketedDvMaintainerTest
CompactionMetricsTest
WriteRestoreTest
ReaderUtilsTest
CompactionInteTest
API and Format
Add:
virtual Status FileStoreWrite::Compact(const std::map<std::string, std::string>& partition, int32_t bucket,
                                       bool full_compaction) = 0;
virtual void Executor::ShutdownNow() = 0;
virtual void Metrics::SetGauge(const std::string& metric_name, double metric_value) = 0;
virtual Result<double> Metrics::GetGauge(const std::string& metric_name) const = 0;
virtual std::map<std::string, double> Metrics::GetAllGauges() const = 0;
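To make the intended shape of the three gauge methods concrete, here is a small in-memory sketch. It is not the project's `Metrics` implementation: `SketchGauges` is a hypothetical class, `std::optional<double>` stands in for the project's `Result` type (whose definition is not shown in this PR description), and the metric name used in the usage note is made up.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory gauge store mirroring the three gauge methods listed
// above. std::optional is an assumed substitute for the project's Result type.
class SketchGauges {
 public:
    // Creates the gauge or overwrites its previous value.
    void SetGauge(const std::string& metric_name, double metric_value) {
        gauges_[metric_name] = metric_value;
    }

    // Returns the gauge value, or std::nullopt if it was never set.
    std::optional<double> GetGauge(const std::string& metric_name) const {
        auto it = gauges_.find(metric_name);
        if (it == gauges_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Returns a copy of all gauges, so callers cannot mutate internal state.
    std::map<std::string, double> GetAllGauges() const { return gauges_; }

 private:
    std::map<std::string, double> gauges_;
};
```

A caller might set a gauge such as `"sketch.compaction.busy"` to `0.5`, read it back with `GetGauge`, and treat a missing gauge as "not reported" rather than as zero.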
Documentation