feat(compaction): support compaction for append table#169

Merged
lucasfang merged 21 commits into alibaba:main from lucasfang:compaction_dev on Mar 11, 2026

@lucasfang (Collaborator) commented Mar 6, 2026

Purpose

Linked issue: #93
Support compaction for append tables (without DV, i.e. deletion vectors) by introducing an async compaction manager for append-only writers and wiring compaction increments into the commit flow.

Tests

MetricsImplTest
DefaultExecutorTest
BucketedAppendCompactManagerTest
BucketedDvMaintainerTest
CompactionMetricsTest
WriteRestoreTest
ReaderUtilsTest
CompactionInteTest

API and Format

Add:
virtual Status FileStoreWrite::Compact(const std::map<std::string, std::string>& partition, int32_t bucket, bool full_compaction) = 0;
virtual void Executor::ShutdownNow() = 0;
virtual void Metrics::SetGauge(const std::string& metric_name, double metric_value) = 0;
virtual Result<double> Metrics::GetGauge(const std::string& metric_name) const = 0;
virtual std::map<std::string, double> Metrics::GetAllGauges() const = 0;
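
To illustrate the intended semantics of the three gauge methods, here is a minimal in-memory sketch. `GaugeStore` is a hypothetical stand-in, not the project's `Metrics` implementation, and it uses `std::optional` in place of the project's `Result` type so the snippet stays self-contained. Thread safety is left out on purpose.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the gauge part of the Metrics interface.
// std::optional replaces the project's Result type (an assumption of this sketch).
class GaugeStore {
 public:
    // A gauge reports the latest reading, so a second call overwrites the first.
    void SetGauge(const std::string& metric_name, double metric_value) {
        gauges_[metric_name] = metric_value;
    }

    // Returns std::nullopt when the gauge was never set.
    std::optional<double> GetGauge(const std::string& metric_name) const {
        auto it = gauges_.find(metric_name);
        if (it == gauges_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Returns a copy so callers never hold a reference to internal state.
    std::map<std::string, double> GetAllGauges() const {
        return gauges_;
    }

 private:
    std::map<std::string, double> gauges_;
};
```

A real implementation would probably guard the map with a mutex, since compaction tasks and the writer thread may report gauges concurrently.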

Documentation

Copilot AI review requested due to automatic review settings March 6, 2026 06:51

Copilot AI left a comment

Pull request overview

This PR adds append-table compaction support by introducing an async compaction manager for append-only writers, wiring compaction increments into commit flow, and adding restore helpers and integration/unit tests.

Changes:

  • Add append-table compaction infrastructure (compact manager/task/result APIs + bucketed append compaction manager).
  • Extend writer APIs to expose compaction triggering/progress and propagate compact-deletion-file through commit increments.
  • Add/adjust restore utilities and tests (unit + integration) to validate compaction behavior.
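
As a rough illustration of how a compaction increment and a compact-deletion-file handle could travel with a commit, the sketch below models a commit increment and derives which snapshot kinds a commit would need. All names (`CommitIncrementSketch`, `CommitKindSketch`, `PlanSnapshots`) are hypothetical, and the rule of one append snapshot plus one compact snapshot is an assumption of this sketch rather than something confirmed by this PR.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class CommitKindSketch { Append, Compact };

// Hypothetical shape of what a writer hands to commit; not the project's CommitIncrement.
struct CommitIncrementSketch {
    std::vector<std::string> new_files;       // files written since the last commit
    std::vector<std::string> compact_before;  // inputs replaced by compaction
    std::vector<std::string> compact_after;   // outputs produced by compaction
    std::optional<std::string> compact_deletion_file;  // DV index output, if any
};

// Decides which snapshots a commit would need under the assumption stated above:
// one for appended files and one for compaction changes.
inline std::vector<CommitKindSketch> PlanSnapshots(const CommitIncrementSketch& inc) {
    std::vector<CommitKindSketch> kinds;
    if (!inc.new_files.empty()) {
        kinds.push_back(CommitKindSketch::Append);
    }
    const bool has_compaction = !inc.compact_before.empty() || !inc.compact_after.empty() ||
                                inc.compact_deletion_file.has_value();
    if (has_compaction) {
        kinds.push_back(CommitKindSketch::Compact);
    }
    return kinds;
}
```

An increment with no new files and no compaction changes yields an empty plan, so the commit can be skipped.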

Reviewed changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| test/inte/compaction_inte_test.cpp | New integration test that triggers append-table compaction and verifies commit increments. |
| test/inte/CMakeLists.txt | Registers the new integration test target. |
| src/paimon/core/utils/commit_increment.h | Extends commit increment to carry a compact-deletion-file handle. |
| src/paimon/core/utils/batch_writer.h | Extends BatchWriter interface with compaction and sync methods. |
| src/paimon/core/postpone/postpone_bucket_writer.h | Implements new BatchWriter virtuals as not-implemented stubs for postpone writer. |
| src/paimon/core/postpone/postpone_bucket_writer.cpp | Updates CommitIncrement construction for new signature. |
| src/paimon/core/operation/write_restore.h | Introduces WriteRestore abstraction for restoring files for writers. |
| src/paimon/core/operation/write_restore.cpp | Implements helper to extract data files/total buckets from manifest entries. |
| src/paimon/core/operation/restore_files.h | Adds RestoreFiles container for restored snapshot/files/index metadata. |
| src/paimon/core/operation/raw_file_split_read.h | Adds overload to create a reader directly from partition/bucket/files. |
| src/paimon/core/operation/raw_file_split_read.cpp | Refactors reader creation to reuse the new overload. |
| src/paimon/core/operation/key_value_file_store_write.cpp | Switches restore scanning to return RestoreFiles and uses its data files/total buckets. |
| src/paimon/core/operation/file_system_write_restore.h | Adds filesystem-based WriteRestore implementation for restoring from latest snapshot scan plan. |
| src/paimon/core/operation/file_store_commit_impl.cpp | Adds commit path for append-table compaction snapshots (CommitKind::Compact). |
| src/paimon/core/operation/append_only_file_store_write_test.cpp | Updates restore scanning expectations for the new RestoreFiles return type. |
| src/paimon/core/operation/append_only_file_store_write.h | Adds append-table compaction helpers and wiring for compaction rewrite/reading. |
| src/paimon/core/operation/append_only_file_store_write.cpp | Implements append compaction rewrite path and connects BucketedAppendCompactManager into append writer creation. |
| src/paimon/core/operation/abstract_split_read.h | Adds generalized deletion-file-map creation overload. |
| src/paimon/core/operation/abstract_split_read.cpp | Implements generalized deletion-file-map creation. |
| src/paimon/core/operation/abstract_file_store_write.h | Adds Compact(...) to FileStoreWrite impl and changes restore scan API to return RestoreFiles. |
| src/paimon/core/operation/abstract_file_store_write.cpp | Implements Compact(...), introduces a compaction executor, and refactors restore scanning via FileSystemWriteRestore. |
| src/paimon/core/mergetree/merge_tree_writer.h | Implements new BatchWriter virtuals as not-implemented stubs for merge-tree writer. |
| src/paimon/core/mergetree/merge_tree_writer.cpp | Updates CommitIncrement construction for new signature. |
| src/paimon/core/deletionvectors/deletion_file_writer.cpp | Adjusts external-path handling when building IndexFileMeta for deletion vectors. |
| src/paimon/core/deletionvectors/bucketed_dv_maintainer.h | Adds DV index maintainer for per-file deletion vector tracking and index rewriting. |
| src/paimon/core/compact/noop_compact_manager.h | Adds a no-op compaction manager implementation. |
| src/paimon/core/compact/compact_task.h | Introduces a compact task abstraction with an Execute() wrapper. |
| src/paimon/core/compact/compact_result.h | Adds missing include needed by compact result types. |
| src/paimon/core/compact/compact_manager.h | Introduces compaction manager interface used by writers. |
| src/paimon/core/compact/compact_future_manager.h | Provides a future-based async compaction manager helper. |
| src/paimon/core/compact/compact_deletion_file.h | Adds compact deletion-file generation/cleanup abstraction for DV index output. |
| src/paimon/core/append/bucketed_append_compact_manager.h | Introduces append-table bucketed compaction manager (async) implementing CompactManager. |
| src/paimon/core/append/bucketed_append_compact_manager.cpp | Implements append-table compaction scheduling and result handling. |
| src/paimon/core/append/bucketed_append_compact_manager_test.cpp | Adds unit tests for comparator/overlap helpers. |
| src/paimon/core/append/append_only_writer_test.cpp | Updates writer construction to pass a compaction manager. |
| src/paimon/core/append/append_only_writer.h | Wires compaction controls into append writer and tracks compaction before/after results. |
| src/paimon/core/append/append_only_writer.cpp | Implements flush/sync logic interacting with compaction manager and produces compaction increments. |
| src/paimon/common/reader/reader_utils.h | Adds helper to remove a field from a StructArray. |
| src/paimon/common/reader/reader_utils.cpp | Implements struct-field removal utility. |
| src/paimon/common/executor/executor.cpp | Adds ShutdownNow() support and improves shutdown behavior. |
| src/paimon/CMakeLists.txt | Adds new core sources and unit test source to build. |
| include/paimon/file_store_write.h | Extends public write API with Compact(partition, bucket, full_compaction). |
| include/paimon/executor.h | Extends public executor API with ShutdownNow(). |

Comments suppressed due to low confidence (1)

src/paimon/core/append/append_only_writer.h:80

  • AppendOnlyWriter::IsCompacting() always returns false, but this writer now triggers async compaction via compact_manager_. AbstractFileStoreWrite::PrepareCommit relies on IsCompacting() to decide when it is safe to close idle writers; returning false can cause a writer to be closed while compaction is still running or results are pending, potentially losing compaction outputs or leaving orphan files. Consider implementing this using compact_manager_->CompactNotCompleted() (and/or tracking pending compaction results).
    bool IsCompacting() const override {
        return false;
    }
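
One way to address this comment is to have `IsCompacting()` delegate to the compaction manager, as sketched below. The classes here are hypothetical stand-ins: only `CompactNotCompleted()` is taken from the review text, and its exact signature in the project is assumed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the compaction manager; only the query used here is modeled.
class CompactManagerSketch {
 public:
    virtual ~CompactManagerSketch() = default;
    // True while a compaction is running or its result has not been collected yet.
    virtual bool CompactNotCompleted() const = 0;
};

// Test double whose state is set by hand instead of by a background task.
class ManualCompactManager : public CompactManagerSketch {
 public:
    void SetPending(bool pending) { pending_ = pending; }
    bool CompactNotCompleted() const override { return pending_; }

 private:
    bool pending_ = false;
};

// Hypothetical writer that reports the manager's state instead of a hard-coded false.
class AppendOnlyWriterSketch {
 public:
    explicit AppendOnlyWriterSketch(std::shared_ptr<CompactManagerSketch> compact_manager)
        : compact_manager_(std::move(compact_manager)) {}

    bool IsCompacting() const {
        // A writer without a manager can never be compacting.
        return compact_manager_ != nullptr && compact_manager_->CompactNotCompleted();
    }

 private:
    std::shared_ptr<CompactManagerSketch> compact_manager_;
};
```

The null check keeps the sketch safe for writers built without a manager; a project that always injects a no-op manager would not need it.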

Copilot AI left a comment

Pull request overview

Copilot reviewed 48 out of 48 changed files in this pull request and generated 15 comments.

Comments suppressed due to low confidence (1)

src/paimon/core/append/append_only_writer.h:80

  • AppendOnlyWriter::IsCompacting() still always returns false, but this writer now runs asynchronous compaction via compact_manager_. Callers (e.g., AbstractFileStoreWrite::PrepareCommit) use IsCompacting() to decide whether it's safe to close/evict an idle writer; always returning false can cause the writer to be closed while a compaction is still running and lose/leave behind compaction results. Please report the real compaction state (e.g., based on compact_manager_->CompactNotCompleted() / pending results).
    Status Sync() override;
    Status Close() override;
    bool IsCompacting() const override {
        return false;
    }
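
The caller-side risk described here can be sketched as an eviction guard: an idle writer is closed only when it also reports that no compaction is in flight. `WriterStateSketch` and `EvictIdleWriters` are hypothetical names for illustration and do not mirror the actual `PrepareCommit` code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-bucket writer state; field names are illustrative only.
struct WriterStateSketch {
    bool has_pending_data = false;  // rows written since the last commit
    bool is_compacting = false;     // what IsCompacting() would report
};

// Removes writers that are idle and not compacting, and returns how many were closed.
// A writer with a running compaction stays registered so its result can still be collected.
inline int EvictIdleWriters(std::map<std::string, WriterStateSketch>* writers) {
    int closed = 0;
    for (auto it = writers->begin(); it != writers->end();) {
        const WriterStateSketch& state = it->second;
        if (!state.has_pending_data && !state.is_compacting) {
            it = writers->erase(it);  // erase returns the next valid iterator
            ++closed;
        } else {
            ++it;
        }
    }
    return closed;
}
```

If `is_compacting` were always false, the writer for a bucket with an in-flight compaction would be evicted here, which is the failure mode the comment warns about.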

Copilot AI left a comment

Pull request overview

Copilot reviewed 50 out of 50 changed files in this pull request and generated 4 comments.


Collaborator

@lxy-9602 lxy-9602 left a comment

+1

@lucasfang lucasfang merged commit 9b24805 into alibaba:main Mar 11, 2026
8 checks passed

4 participants