
@zhyass commented Nov 28, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR introduces Git-like branching and tagging for Databend's Fuse tables, inspired by Apache Iceberg. Users can create independent branches for development/testing and read-only tags for marking important snapshots.

1. Core Concepts

Branch

  • Writable: Supports INSERT, UPDATE, DELETE, and other write operations (not implemented in this PR)
  • Independent Evolution: Each branch maintains its own snapshot chain

Tag

  • Read-only: Does not support write operations; used only to mark important snapshots

2. SQL Syntax

2.1 Creating Branches or Tags

ALTER TABLE <database>.<table> CREATE { BRANCH | TAG } <name>
[AT (
    SNAPSHOT => '<snapshot_id>' |
    TIMESTAMP => <timestamp> |
    STREAM => <stream_name> |
    OFFSET => <time_interval> |
    BRANCH => <branch_name> |
    TAG => <tag_name>
)]
[RETAIN <n> { DAYS | SECONDS }];

Parameters:

  • BRANCH | TAG: Specify whether to create a branch or tag
  • AT: Specify the point in time to base the creation on (optional, defaults to current snapshot)
    • SNAPSHOT: Based on a specific snapshot ID
    • TIMESTAMP: Based on a specific timestamp
    • STREAM: Based on the current position of a Stream
    • OFFSET: Based on a relative time offset
    • BRANCH: Based on the current state of another branch
    • TAG: Based on a tag
  • RETAIN: Set the retention period for the branch or tag (optional; by default the reference does not expire)
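To make the optional clauses concrete, here is a small Rust sketch of how the `AT` and `RETAIN` options could be modeled. `RefSource`, `Retain`, `as_of` and `expire_at` are hypothetical names, not types from this PR. `OFFSET` is assumed here to mean "this long before now", and times are expressed with `std::time::SystemTime` rather than the `DateTime<Utc>` used by `SnapshotRef`.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical model of the optional `AT (...)` clause.
#[derive(Debug, Clone, PartialEq, Default)]
pub enum RefSource {
    /// `AT` omitted: base the new reference on the current snapshot.
    #[default]
    CurrentSnapshot,
    Snapshot(String),      // SNAPSHOT => '<snapshot_id>'
    Timestamp(SystemTime), // TIMESTAMP => <timestamp>
    Stream(String),        // STREAM => <stream_name>
    Offset(Duration),      // OFFSET => <time_interval> (assumed: this long before now)
    Branch(String),        // BRANCH => <branch_name>
    Tag(String),           // TAG => <tag_name>
}

impl RefSource {
    /// For time-based sources, the point in time whose snapshot should be used.
    /// Other sources are resolved by id or by name, so they return `None`.
    pub fn as_of(&self, now: SystemTime) -> Option<SystemTime> {
        match self {
            RefSource::Timestamp(t) => Some(*t),
            RefSource::Offset(d) => now.checked_sub(*d),
            _ => None,
        }
    }
}

/// Hypothetical model of `RETAIN <n> { DAYS | SECONDS }`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Retain {
    Days(u64),
    Seconds(u64),
}

impl Retain {
    pub fn as_duration(self) -> Duration {
        match self {
            Retain::Days(n) => Duration::from_secs(n.saturating_mul(24 * 60 * 60)),
            Retain::Seconds(n) => Duration::from_secs(n),
        }
    }
}

/// Turns an optional RETAIN clause into an expiration time measured from `now`.
/// `None` means the reference never expires.
pub fn expire_at(now: SystemTime, retain: Option<Retain>) -> Option<SystemTime> {
    retain.map(|r| now + r.as_duration())
}
```

An omitted `RETAIN` maps to `None`, which lines up with the `Option` type of `SnapshotRef::expire_at`.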

Examples:

-- Create a development branch based on current state
ALTER TABLE sales.orders CREATE BRANCH dev;

-- Create a test branch based on the table state as of 2024-11-27 00:00:00, retained for 7 days
ALTER TABLE sales.orders CREATE BRANCH test 
AT (TIMESTAMP => '2024-11-27 00:00:00') 
RETAIN 7 DAYS;

-- Create a tag for current state
ALTER TABLE sales.orders CREATE TAG v1;

-- Create a tag based on a specific snapshot
ALTER TABLE sales.orders CREATE TAG backup_before_migration
AT (SNAPSHOT => '9828b23f74664ff3806f44bbc1925ea5');

2.2 Dropping Branches or Tags

ALTER TABLE <database>.<table> DROP { BRANCH | TAG } <name>;

Note: Drop operations are irreversible. Use with caution.

Examples:

-- Drop development branch
ALTER TABLE sales.orders DROP BRANCH dev;

-- Drop tag
ALTER TABLE sales.orders DROP TAG v1;

2.3 Querying Branch Data

-- Query data from a specific branch (similar to Git's remote/branch syntax)
SELECT * FROM <database>.<table>/<branch_name>;

-- Query development branch
SELECT * FROM sales.orders/dev;
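The `<table>/<branch>` form could be split as in the following sketch. `parse_table_ref` and `TableRef` are hypothetical and use plain string splitting; a real SQL parser would also need to handle quoted identifiers, and this is not the parser used in the PR.

```rust
/// A table reference as written in FROM, e.g. `sales.orders/dev`.
#[derive(Debug, PartialEq)]
pub struct TableRef<'a> {
    pub database: Option<&'a str>,
    pub table: &'a str,
    pub branch: Option<&'a str>,
}

/// Splits `[db.]table[/branch]`. Hypothetical helper: ignores quoting and
/// escaping, and splits the database from the table at the first '.'.
pub fn parse_table_ref(s: &str) -> Result<TableRef<'_>, String> {
    // The branch name follows a single '/' and must be non-empty.
    let (path, branch) = match s.split_once('/') {
        Some((p, b)) if !b.is_empty() && !b.contains('/') => (p, Some(b)),
        Some(_) => return Err(format!("invalid branch in `{s}`")),
        None => (s, None),
    };
    let (database, table) = match path.split_once('.') {
        Some((d, t)) => (Some(d), t),
        None => (None, path),
    };
    if table.is_empty() || database == Some("") {
        return Err(format!("invalid table name in `{s}`"));
    }
    Ok(TableRef { database, table, branch })
}
```

Without a `/branch` suffix, `branch` is `None` and the query reads the main branch.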

3. Data Structure Design

use std::collections::BTreeMap;
use chrono::{DateTime, Utc};

pub struct TableMeta {
    // ... other fields ...

    /// All branch and tag references, keyed by reference name
    pub refs: BTreeMap<String, SnapshotRef>,
}

pub struct SnapshotRef {
    /// The unique id of the reference.
    pub id: u64,
    /// After this timestamp, the reference becomes inactive.
    pub expire_at: Option<DateTime<Utc>>,
    /// The type of the reference.
    pub typ: SnapshotRefType,
    /// The location of the snapshot that this reference points to.
    pub loc: String,
}

pub enum SnapshotRefType {
    Branch = 0,
    Tag = 1,
}
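A sketch of how `refs` might be maintained by `CREATE` and `DROP`. It mirrors the structs above but simplifies `expire_at` to Unix seconds so the example needs only the standard library. `create_ref`, `drop_ref`, `is_expired` and `is_writable` are hypothetical helpers, not the PR's API.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SnapshotRefType {
    Branch = 0,
    Tag = 1,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotRef {
    pub id: u64,
    /// Simplified to Unix seconds here (the PR uses Option<DateTime<Utc>>).
    pub expire_at: Option<u64>,
    pub typ: SnapshotRefType,
    pub loc: String,
}

impl SnapshotRef {
    /// Assumption: a reference counts as expired once `now` reaches `expire_at`.
    pub fn is_expired(&self, now: u64) -> bool {
        matches!(self.expire_at, Some(t) if now >= t)
    }

    /// Only branches accept writes; tags are read-only.
    pub fn is_writable(&self) -> bool {
        self.typ == SnapshotRefType::Branch
    }
}

/// Hypothetical helper: register a new reference under `name`.
pub fn create_ref(
    refs: &mut BTreeMap<String, SnapshotRef>,
    name: &str,
    r: SnapshotRef,
) -> Result<(), String> {
    if refs.contains_key(name) {
        return Err(format!("reference `{name}` already exists"));
    }
    refs.insert(name.to_string(), r);
    Ok(())
}

/// Hypothetical helper: DROP BRANCH / DROP TAG must name a reference of the matching type.
pub fn drop_ref(
    refs: &mut BTreeMap<String, SnapshotRef>,
    name: &str,
    typ: SnapshotRefType,
) -> Result<SnapshotRef, String> {
    // Copy the type out first so the map is not borrowed while we remove from it.
    let found = refs.get(name).map(|r| r.typ);
    match found {
        Some(t) if t == typ => Ok(refs.remove(name).expect("checked above")),
        Some(t) => Err(format!("`{name}` is a {t:?}, not a {typ:?}")),
        None => Err(format!("reference `{name}` not found")),
    }
}
```

In this sketch, checking the reference type on drop keeps `DROP TAG dev` from silently deleting a branch named `dev`.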

4. Storage Layout

<table_prefix>/
├── _ss/                # Main branch snapshot directory
│   ├── <snapshot_file>
│   └── ...
├── _refs/              # Branch and tag directory
│   ├── <id1>/          # dev branch
│   │   ├── <snapshot_file>
│   │   └── ...
│   ├── <id2>/          # test branch
│   │   └── ...
│   └── <id3>/          # v1 tag
│       └── <snapshot_file>
├── _sg/                # Segment files (shared)
└── _b/                 # Block files (shared)

  • Main-branch snapshots are stored in the _ss/ directory.
  • Branch and tag snapshots are stored in _refs/<ref_id>/ directories.
  • Segment and block files are shared across all branches and tags, which saves storage space.
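The layout comes down to a simple path rule. The sketch below assumes plain string prefixes; `snapshot_dir` and `shared_data_dirs` are hypothetical helpers, not functions from the PR.

```rust
/// Hypothetical helper: directory holding a reference's snapshot files.
/// `ref_id == None` means the main branch.
pub fn snapshot_dir(table_prefix: &str, ref_id: Option<u64>) -> String {
    let prefix = table_prefix.trim_end_matches('/');
    match ref_id {
        None => format!("{prefix}/_ss/"),
        Some(id) => format!("{prefix}/_refs/{id}/"),
    }
}

/// Segment and block directories do not depend on the reference:
/// every branch and tag points into the same `_sg/` and `_b/` trees.
pub fn shared_data_dirs(table_prefix: &str) -> (String, String) {
    let prefix = table_prefix.trim_end_matches('/');
    (format!("{prefix}/_sg/"), format!("{prefix}/_b/"))
}
```

Because only the snapshot directory varies per reference, creating a branch or tag writes new snapshot metadata but does not copy any segment or block files.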

5. Vacuum and GC Integration

NOTE: Once a branch or tag has expired (its expire_at has passed), it is cleaned up during vacuum and purge.
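Expired references could be separated out at the start of vacuum roughly as follows. `RefEntry` and `remove_expired` are hypothetical; `expire_at` is simplified to Unix seconds, and treating `now >= expire_at` as expired is an assumption.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a reference entry in `TableMeta::refs`.
#[derive(Debug, Clone, PartialEq)]
pub struct RefEntry {
    pub id: u64,
    pub expire_at: Option<u64>,
}

/// Removes expired references from `refs` and returns them (in name order).
/// References without `expire_at` never expire.
pub fn remove_expired(refs: &mut BTreeMap<String, RefEntry>, now: u64) -> Vec<(String, RefEntry)> {
    let (expired, live): (BTreeMap<String, RefEntry>, BTreeMap<String, RefEntry>) =
        std::mem::take(refs)
            .into_iter()
            .partition(|(_, r)| matches!(r.expire_at, Some(t) if now >= t));
    *refs = live;
    expired.into_iter().collect()
}
```

The returned entries identify which `_refs/<ref_id>/` directories would become candidates for purging.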

5.1 Tag Processing

Tags are read-only and point to a single snapshot.

  1. Read head snapshot: Read the tag's head snapshot
  2. Act as GC Root: Tag snapshot serves as one of the GC roots, protecting its segments and blocks from cleanup
  3. No cleanup needed: Tag itself doesn't need snapshot cleanup (read-only)
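The GC-root role of a tag can be sketched as a set of protected segments. `Snapshot`, `protected_segments` and `unreferenced` are hypothetical stand-ins that track only segment locations (blocks would be handled the same way); they do not reflect Databend's actual vacuum code.

```rust
use std::collections::HashSet;

/// Minimal stand-in for a snapshot: just the segment locations it references.
pub struct Snapshot {
    pub segments: Vec<String>,
}

/// Every root snapshot (for example, each tag's head) protects the segments it references.
/// Tags have no older snapshots to clean up, so only their head snapshot is read.
pub fn protected_segments<'a>(roots: impl IntoIterator<Item = &'a Snapshot>) -> HashSet<&'a str> {
    roots
        .into_iter()
        .flat_map(|s| s.segments.iter().map(String::as_str))
        .collect()
}

/// Segments that no root references; only these are cleanup candidates.
pub fn unreferenced<'a>(all_segments: &'a [String], protected: &HashSet<&str>) -> Vec<&'a str> {
    all_segments
        .iter()
        .map(String::as_str)
        .filter(|s| !protected.contains(*s))
        .collect()
}
```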

5.2 Branch Processing

Branches maintain their own snapshot chains and therefore need snapshot cleanup.

  1. Select GC root:
    • The RetentionPolicy applies ByTimePeriod first.
    • ByNumOfSnapshotsToKeep is used only when no snapshots have expired by time.
    • Rationale: for long-inactive branches, count-based retention may yield a very old GC root timestamp, which prevents effective cleanup.
  2. Collect snapshots_to_gc: Get the list of snapshots to be cleaned up.
  3. Protect segments/blocks: Even if no gc_root can be obtained, use the earliest snapshot as the gc_root so that the branch's segments and blocks are protected.
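The selection order can be sketched as follows, under these assumptions: the branch's snapshot chain is ordered newest first with Unix-second timestamps; the time-based root is the newest snapshot at or before `now - period`; and every snapshot older than the root is collected. `BranchRetention`, `GcPlan` and `plan_branch_gc` are hypothetical names, not the PR's `RetentionPolicy` type.

```rust
/// Minimal snapshot metadata: id and creation time (Unix seconds).
#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotInfo {
    pub id: u64,
    pub timestamp: u64,
}

/// Hypothetical retention settings for one branch.
pub struct BranchRetention {
    pub by_time_period_secs: u64,
    pub by_num_snapshots_to_keep: Option<usize>,
}

pub struct GcPlan<'a> {
    pub gc_root: &'a SnapshotInfo,
    pub snapshots_to_gc: Vec<&'a SnapshotInfo>,
}

/// `chain` is the branch's snapshot chain, newest first (head at index 0).
pub fn plan_branch_gc<'a>(
    chain: &'a [SnapshotInfo],
    now: u64,
    policy: &BranchRetention,
) -> Option<GcPlan<'a>> {
    if chain.is_empty() {
        return None;
    }
    let cutoff = now.saturating_sub(policy.by_time_period_secs);
    let root_idx = match chain.iter().position(|s| s.timestamp <= cutoff) {
        // 1a. ByTimePeriod: newest snapshot at or before the cutoff.
        Some(i) => i,
        // 1b. Nothing expired by time: fall back to ByNumOfSnapshotsToKeep.
        None => match policy.by_num_snapshots_to_keep {
            Some(n) if n > 0 && n < chain.len() => n - 1,
            // 3. No usable root: use the earliest snapshot, so nothing is collected.
            _ => chain.len() - 1,
        },
    };
    Some(GcPlan {
        gc_root: &chain[root_idx],
        // 2. Everything older than the root is collected.
        snapshots_to_gc: chain[root_idx + 1..].iter().collect(),
    })
}
```

For a long-inactive branch whose head is already older than the cutoff, the head itself becomes the root and all older snapshots are collected, which is the case the rationale above describes; a count-based root would instead stay pinned to old snapshots.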

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@zhyass requested a review from drmingdrmer as a code owner November 28, 2025 16:39
@zhyass marked this pull request as draft November 28, 2025 16:39
@github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) Nov 28, 2025
@zhyass changed the title from "feat: table branching and tagging" to "feat: initial support for table branching and tagging" Nov 28, 2025
@zhyass marked this pull request as ready for review December 1, 2025 01:48
@zhyass requested review from drmingdrmer and removed request for drmingdrmer December 2, 2025 06:21

@drmingdrmer left a comment


@drmingdrmer reviewed 35 of 143 files at r1.
Reviewable status: 35 of 143 files reviewed, all discussions resolved (waiting on @dantengsky and @SkyFan2002)
