| 1 | +--- |
| 2 | +title: How Databend Copy-Free Data Sharing Works |
| 3 | +--- |
| 4 | + |
| 5 | +## What is Data Sharing? |
| 6 | + |
| 7 | +Different teams need different parts of the same data. Traditional solutions copy that data for each team, which is expensive and hard to keep in sync. |
| 8 | + |
| 9 | +Databend's **[ATTACH TABLE](/sql/sql-commands/ddl/table/attach-table)** solves this elegantly: create multiple "views" of the same data without copying it. This leverages Databend's **true compute-storage separation** - whether using cloud storage or on-premise object storage: **store once, access everywhere**. |
| 10 | + |
| 11 | +Think of ATTACH TABLE as a file shortcut: it points to the original file without duplicating it. |
| 12 | + |
| 13 | +``` |
| 14 | + Object Storage (S3, MinIO, Azure, etc.) |
| 15 | + ┌─────────────┐ |
| 16 | + │ Your Data │ |
| 17 | + └──────┬──────┘ |
| 18 | + │ |
| 19 | + ┌───────────────────────┼───────────────────────┐ |
| 20 | + │ │ │ |
| 21 | + ▼ ▼ ▼ |
| 22 | +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ |
| 23 | +│ Marketing │ │ Finance │ │ Sales │ |
| 24 | +│ Team View │ │ Team View │ │ Team View │ |
| 25 | +└─────────────┘ └─────────────┘ └─────────────┘ |
| 26 | +``` |
| 27 | + |
| 28 | +## How to Use ATTACH TABLE |
| 29 | + |
| 30 | +**Step 1: Find your data location** |
| 31 | +```sql |
| 32 | +SELECT snapshot_location FROM FUSE_SNAPSHOT('default', 'company_sales'); |
| 33 | +-- Result: 1/23351/_ss/... → Data at s3://your-bucket/1/23351/ |
| 34 | +``` |
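The mapping from `snapshot_location` to the table's data location in Step 1 can be sketched in a few lines of Python. This is only an illustration of the string manipulation, assuming the snapshot path always contains an `_ss/` segment as in the result above; the helper name `table_prefix` and the file name `snapshot_file` are hypothetical and not part of Databend.

```python
def table_prefix(snapshot_location: str, bucket_uri: str) -> str:
    """Return the table data prefix for a path like '1/23351/_ss/<file>'.

    Assumption: everything before the '/_ss/' segment is the table prefix.
    """
    marker = "/_ss/"
    idx = snapshot_location.find(marker)
    if idx == -1:
        raise ValueError(f"no '_ss' segment in {snapshot_location!r}")
    prefix = snapshot_location[:idx]
    # Normalize the bucket URI so exactly one slash joins the two parts.
    return f"{bucket_uri.rstrip('/')}/{prefix}/"


# 'snapshot_file' is a placeholder for the real snapshot file name.
print(table_prefix("1/23351/_ss/snapshot_file", "s3://your-bucket"))
# s3://your-bucket/1/23351/
```

The returned URI is the value to pass to `ATTACH TABLE` in the next step.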
| 35 | + |
| 36 | +**Step 2: Create team-specific views** |
| 37 | +```sql |
| 38 | +-- Marketing: Customer behavior analysis |
| 39 | +ATTACH TABLE marketing_view (customer_id, product, amount, order_date) |
| 40 | +'s3://your-bucket/1/23351/' CONNECTION = (AWS_KEY_ID = 'xxx', AWS_SECRET_KEY = 'yyy'); |
| 41 | + |
| 42 | +-- Finance: Revenue tracking |
| 43 | +ATTACH TABLE finance_view (order_id, amount, profit, order_date) |
| 44 | +'s3://your-bucket/1/23351/' CONNECTION = (AWS_KEY_ID = 'xxx', AWS_SECRET_KEY = 'yyy'); |
| 45 | + |
| 46 | +-- HR: Employee info without salaries (assumes a separate employees source table; the location shown is illustrative) |
| 47 | +ATTACH TABLE hr_employees (employee_id, name, department) |
| 48 | +'s3://data/1/23351/' CONNECTION = (...); |
| 49 | + |
| 50 | +-- Development: Production structure without sensitive data (assumes a separate customers source table; the location shown is illustrative) |
| 51 | +ATTACH TABLE dev_customers (customer_id, country, created_date) |
| 52 | +'s3://data/1/23351/' CONNECTION = (...); |
| 53 | +``` |
| 54 | + |
| 55 | +**Step 3: Query independently** |
| 56 | +```sql |
| 57 | +-- Marketing analyzes trends |
| 58 | +SELECT product, COUNT(*) FROM marketing_view GROUP BY product; |
| 59 | + |
| 60 | +-- Finance tracks profit |
| 61 | +SELECT order_date, SUM(profit) FROM finance_view GROUP BY order_date; |
| 62 | +``` |
| 63 | + |
| 64 | +## Key Benefits |
| 65 | + |
| 66 | +**Real-Time Updates**: When source data changes, all attached tables see it instantly |
| 67 | +```sql |
| 68 | +INSERT INTO company_sales VALUES (1001, 501, 'Laptop', 1299.99, 299.99, 'user@example.com', '2024-01-20'); |
| 69 | +SELECT COUNT(*) FROM marketing_view WHERE order_date = '2024-01-20'; -- Returns: 1 |
| 70 | +``` |
| 71 | + |
| 72 | +**Column-Level Security**: Teams only see what they need - Marketing can't see profit, Finance can't see customer emails |
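As a rough mental model of column-level visibility, each attached table can be thought of as a projection of the source row onto its declared column list. The Python sketch below is not how Databend implements it; the `project` helper is hypothetical, and the `email` column name is an assumption inferred from the mention of customer emails.

```python
# One source row; 'email' is an assumed column name for illustration.
SOURCE_ROW = {
    "order_id": 1001,
    "customer_id": 501,
    "product": "Laptop",
    "amount": 1299.99,
    "profit": 299.99,
    "email": "user@example.com",
    "order_date": "2024-01-20",
}

# Column lists taken from the ATTACH TABLE statements above.
MARKETING_COLUMNS = ("customer_id", "product", "amount", "order_date")
FINANCE_COLUMNS = ("order_id", "amount", "profit", "order_date")


def project(row: dict, columns: tuple) -> dict:
    """Keep only the columns an attached table declares."""
    return {name: row[name] for name in columns}


marketing_row = project(SOURCE_ROW, MARKETING_COLUMNS)
finance_row = project(SOURCE_ROW, FINANCE_COLUMNS)
print(sorted(marketing_row))
print(sorted(finance_row))
```

Neither projected row carries a column outside its list, which is the property the column list is meant to provide.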
| 73 | + |
| 74 | +**Strong Consistency**: Never read partial updates, always see complete snapshots - perfect for financial reporting and compliance |
| 75 | + |
| 76 | +**Full Performance**: All indexes work automatically, same speed as regular tables |
| 77 | + |
| 78 | +## Why This Matters |
| 79 | + |
| 80 | +| Traditional Approach | Databend ATTACH TABLE | |
| 81 | +|---------------------|----------------------| |
| 82 | +| Multiple data copies | Single copy shared by all | |
| 83 | +| ETL delays, sync issues | Real-time, always current | |
| 84 | +| Complex maintenance | Zero maintenance | |
| 85 | +| More copies = more security risk | Fine-grained column access | |
| 86 | +| Slower due to data movement | Full optimization on original data | |
| 87 | + |
| 88 | +## How It Works Under the Hood |
| 89 | + |
| 90 | +``` |
| 91 | +Query: SELECT product, SUM(amount) FROM marketing_view GROUP BY product |
| 92 | + |
| 93 | +┌─────────────────────────────────────────────────────────────────┐ |
| 94 | +│ Query Execution Flow │ |
| 95 | +└─────────────────────────────────────────────────────────────────┘ |
| 96 | + |
| 97 | + User Query |
| 98 | + │ |
| 99 | + ▼ |
| 100 | +┌───────────────────┐ ┌─────────────────────────────────────┐ |
| 101 | +│ 1. Read Snapshot │───►│ s3://bucket/1/23351/_ss/ │ |
| 102 | +│ Metadata │ │ Get current table state │ |
| 103 | +└───────────────────┘ └─────────────────────────────────────┘ |
| 104 | + │ |
| 105 | + ▼ |
| 106 | +┌───────────────────┐ ┌─────────────────────────────────────┐ |
| 107 | +│ 2. Apply Column │───►│ Filter: customer_id, product, │ |
| 108 | +│ Filter │ │ amount, order_date │ |
| 109 | +└───────────────────┘ └─────────────────────────────────────┘ |
| 110 | + │ |
| 111 | + ▼ |
| 112 | +┌───────────────────┐ ┌─────────────────────────────────────┐ |
| 113 | +│ 3. Check Stats & │───►│ • Segment min/max values │ |
| 114 | +│ Indexes │ │ • Bloom filters │ |
| 115 | +└───────────────────┘ │ • Aggregate indexes │ |
| 116 | + │ └─────────────────────────────────────┘ |
| 117 | + ▼ |
| 118 | +┌───────────────────┐ ┌─────────────────────────────────────┐ |
| 119 | +│ 4. Smart Data │───►│ Skip irrelevant blocks │ |
| 120 | +│ Fetching │ │ Download only needed data from _b/ │ |
| 121 | +└───────────────────┘ └─────────────────────────────────────┘ |
| 122 | + │ |
| 123 | + ▼ |
| 124 | +┌───────────────────┐ ┌─────────────────────────────────────┐ |
| 125 | +│ 5. Local │───►│ Full optimization & parallelism │ |
| 126 | +│ Execution │ │ Process with all available indexes │ |
| 127 | +└───────────────────┘ └─────────────────────────────────────┘ |
| 128 | + │ |
| 129 | + ▼ |
| 130 | + Results: Product sales summary |
| 131 | +``` |
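Steps 3 and 4 of the flow above (check min/max statistics, then skip irrelevant blocks) can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Databend's implementation: the `BlockStats` type, the block paths, and the date ranges are all hypothetical, and real pruning also uses Bloom filters and other indexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockStats:
    path: str      # hypothetical block file under _b/
    min_date: str  # smallest order_date in the block (ISO 8601 string)
    max_date: str  # largest order_date in the block (ISO 8601 string)


def blocks_to_read(blocks: list, wanted_date: str) -> list:
    """Return paths of blocks whose [min, max] range can contain wanted_date.

    ISO 8601 date strings sort lexicographically in date order,
    so plain string comparison is enough here.
    """
    return [b.path for b in blocks if b.min_date <= wanted_date <= b.max_date]


blocks = [
    BlockStats("_b/block_a", "2024-01-01", "2024-01-15"),
    BlockStats("_b/block_b", "2024-01-16", "2024-01-31"),
    BlockStats("_b/block_c", "2024-02-01", "2024-02-29"),
]

# Only block_b can hold rows for 2024-01-20; the other two are skipped.
print(blocks_to_read(blocks, "2024-01-20"))  # ['_b/block_b']
```

Because the decision uses only metadata, blocks that cannot match are never downloaded from object storage.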
| 132 | + |
| 133 | +Multiple Databend clusters can execute this flow simultaneously without coordination - true compute-storage separation in action. |
| 134 | + |
| 135 | +ATTACH TABLE represents a fundamental shift: **from copying data for each use case to one copy with many views**. Whether in cloud or on-premise environments, Databend's architecture enables powerful, efficient data sharing while maintaining enterprise-grade consistency and security. |