Commit b85ee5e

add how databend Copy-Free data sharing works (#2396)
1 parent 262a0d8 commit b85ee5e

1 file changed

+135
-0
lines changed

@@ -0,0 +1,135 @@
---
title: How Databend Copy-Free Data Sharing Works
---

## What is Data Sharing?

Different teams need different parts of the same data. Traditional solutions copy data multiple times - expensive and hard to maintain.

Databend's **[ATTACH TABLE](/sql/sql-commands/ddl/table/attach-table)** solves this elegantly: create multiple "views" of the same data without copying it. This leverages Databend's **true compute-storage separation** - whether using cloud storage or on-premise object storage: **store once, access everywhere**.

Think of ATTACH TABLE like computer shortcuts - they point to the original file without duplicating it.

```
            Object Storage (S3, MinIO, Azure, etc.)
                        ┌─────────────┐
                        │  Your Data  │
                        └──────┬──────┘
                               │
       ┌───────────────────────┼───────────────────────┐
       │                       │                       │
       ▼                       ▼                       ▼
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│  Marketing  │         │   Finance   │         │    Sales    │
│  Team View  │         │  Team View  │         │  Team View  │
└─────────────┘         └─────────────┘         └─────────────┘
```

## How to Use ATTACH TABLE

**Step 1: Find your data location**
```sql
SELECT snapshot_location FROM FUSE_SNAPSHOT('default', 'company_sales');
-- Result: 1/23351/_ss/... → Data at s3://your-bucket/1/23351/
```

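The step from the query result to the storage path can be sketched in a few lines of Python. This is only an illustration and assumes the `snapshot_location` value always has the form `<table prefix>/_ss/<snapshot file>`, as in the result above. The helper name `table_prefix` and the file name `snapshot-file` are hypothetical.

```python
def table_prefix(bucket_url: str, snapshot_location: str) -> str:
    """Return the storage prefix that holds a table's data.

    Assumes snapshot_location looks like '<prefix>/_ss/<file>'.
    """
    prefix, sep, _ = snapshot_location.partition("/_ss/")
    if not sep:
        raise ValueError(f"no '/_ss/' segment in {snapshot_location!r}")
    # Normalize slashes so the result always ends with exactly one '/'.
    return f"{bucket_url.rstrip('/')}/{prefix.strip('/')}/"


# 'snapshot-file' is a made-up file name used only for this example.
print(table_prefix("s3://your-bucket", "1/23351/_ss/snapshot-file"))
# prints: s3://your-bucket/1/23351/
```

The returned prefix is the value to pass as the location string in `ATTACH TABLE`.
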
**Step 2: Create team-specific views**
```sql
-- Marketing: Customer behavior analysis
ATTACH TABLE marketing_view (customer_id, product, amount, order_date)
's3://your-bucket/1/23351/' CONNECTION = (AWS_KEY_ID = 'xxx', AWS_SECRET_KEY = 'yyy');

-- Finance: Revenue tracking
ATTACH TABLE finance_view (order_id, amount, profit, order_date)
's3://your-bucket/1/23351/' CONNECTION = (AWS_KEY_ID = 'xxx', AWS_SECRET_KEY = 'yyy');

-- The same pattern applies to other source tables. The paths below are
-- placeholders: use the location of the table that holds these columns.

-- HR: Employee info without salaries
ATTACH TABLE hr_employees (employee_id, name, department)
's3://data/1/23351/' CONNECTION = (...);

-- Development: Production structure without sensitive data
ATTACH TABLE dev_customers (customer_id, country, created_date)
's3://data/1/23351/' CONNECTION = (...);
```

**Step 3: Query independently**
```sql
-- Marketing analyzes trends
SELECT product, COUNT(*) FROM marketing_view GROUP BY product;

-- Finance tracks profit
SELECT order_date, SUM(profit) FROM finance_view GROUP BY order_date;
```

## Key Benefits

**Real-Time Updates**: When source data changes, all attached tables see it instantly
```sql
INSERT INTO company_sales VALUES (1001, 501, 'Laptop', 1299.99, 299.99, 'customer@example.com', '2024-01-20');
SELECT COUNT(*) FROM marketing_view WHERE order_date = '2024-01-20'; -- Returns: 1 (if this is the only order on that date)
```

**Column-Level Security**: Teams only see what they need - Marketing can't see profit, Finance can't see customer emails

**Strong Consistency**: Never read partial updates, always see complete snapshots - perfect for financial reporting and compliance

**Full Performance**: All indexes work automatically, same speed as regular tables

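The first three benefits can be illustrated with a toy Python model. This is a sketch for intuition only, not Databend's implementation. The classes `SourceTable` and `AttachedView` are hypothetical, and the column names of `company_sales` are assumed from the `INSERT` and `ATTACH TABLE` statements above. The idea is that rows are stored once, each view keeps only a column list, and a write swaps in a new complete snapshot.

```python
from dataclasses import dataclass


@dataclass
class SourceTable:
    """Toy stand-in for one table's data in object storage (a single copy)."""
    columns: tuple
    # The current snapshot is an immutable tuple of rows. A write builds a new
    # tuple and swaps the reference, so a reader never sees a partial write.
    snapshot: tuple = ()

    def insert(self, *rows):
        for row in rows:
            if len(row) != len(self.columns):
                raise ValueError("row width does not match table columns")
        self.snapshot = self.snapshot + tuple(rows)


@dataclass
class AttachedView:
    """Toy stand-in for an attached table: a column list over shared rows."""
    source: SourceTable
    columns: tuple

    def __post_init__(self):
        missing = [c for c in self.columns if c not in self.source.columns]
        if missing:
            raise ValueError(f"columns not in source table: {missing}")

    def select(self, *names):
        hidden = [n for n in names if n not in self.columns]
        if hidden:
            raise KeyError(f"unknown column(s) in this view: {hidden}")
        idx = [self.source.columns.index(n) for n in names]
        rows = self.source.snapshot  # read one complete snapshot
        return [tuple(row[i] for i in idx) for row in rows]


# Assumed schema; the email address is a placeholder value.
sales = SourceTable(columns=("order_id", "customer_id", "product", "amount",
                             "profit", "email", "order_date"))
marketing = AttachedView(sales, ("customer_id", "product", "amount", "order_date"))
finance = AttachedView(sales, ("order_id", "amount", "profit", "order_date"))

sales.insert((1001, 501, "Laptop", 1299.99, 299.99,
              "customer@example.com", "2024-01-20"))

print(marketing.select("product", "amount"))   # [('Laptop', 1299.99)]
print(finance.select("order_date", "profit"))  # [('2024-01-20', 299.99)]
```

In this model, `marketing.select("profit")` raises `KeyError`, because `profit` is not in the marketing view's column list, while both views see the inserted row without any copy being made.
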
## Why This Matters

| Traditional Approach | Databend ATTACH TABLE |
|---------------------|----------------------|
| Multiple data copies | Single copy shared by all |
| ETL delays, sync issues | Real-time, always current |
| Complex maintenance | Zero maintenance |
| More copies = more security risk | Fine-grained column access |
| Slower due to data movement | Full optimization on original data |

## How It Works Under the Hood

```
Query: SELECT product, SUM(amount) FROM marketing_view GROUP BY product

┌──────────────────────────────────────────────────────────────┐
│                     Query Execution Flow                     │
└──────────────────────────────────────────────────────────────┘

     User Query
          │
          ▼
┌───────────────────┐    ┌─────────────────────────────────────┐
│ 1. Read Snapshot  │───►│ s3://bucket/1/23351/_ss/            │
│    Metadata       │    │ Get current table state             │
└───────────────────┘    └─────────────────────────────────────┘
          │
          ▼
┌───────────────────┐    ┌─────────────────────────────────────┐
│ 2. Apply Column   │───►│ Filter: customer_id, product,       │
│    Filter         │    │         amount, order_date          │
└───────────────────┘    └─────────────────────────────────────┘
          │
          ▼
┌───────────────────┐    ┌─────────────────────────────────────┐
│ 3. Check Stats &  │───►│ • Segment min/max values            │
│    Indexes        │    │ • Bloom filters                     │
└───────────────────┘    │ • Aggregate indexes                 │
          │              └─────────────────────────────────────┘
          ▼
┌───────────────────┐    ┌─────────────────────────────────────┐
│ 4. Smart Data     │───►│ Skip irrelevant blocks              │
│    Fetching       │    │ Download only needed data from _b/  │
└───────────────────┘    └─────────────────────────────────────┘
          │
          ▼
┌───────────────────┐    ┌─────────────────────────────────────┐
│ 5. Local          │───►│ Full optimization & parallelism     │
│    Execution      │    │ Process with all available indexes  │
└───────────────────┘    └─────────────────────────────────────┘
          │
          ▼
Results: Product sales summary
```

Multiple Databend clusters can execute this flow simultaneously without coordination - true compute-storage separation in action.

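Steps 3 and 4 of the flow (check statistics, then fetch only the blocks that can match) can be sketched as min/max pruning. The Python below is a simplified illustration under stated assumptions: `BlockStats`, `blocks_to_fetch`, and the block file names are hypothetical, and real block statistics carry more than one column and also involve Bloom filters and aggregate indexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockStats:
    path: str      # hypothetical block file name under _b/
    min_date: str  # smallest order_date in the block (ISO format)
    max_date: str  # largest order_date in the block (ISO format)


def blocks_to_fetch(blocks, wanted_date):
    """Keep only blocks whose [min, max] range could contain wanted_date.

    ISO dates compare correctly as plain strings, so no parsing is needed.
    """
    return [b.path for b in blocks if b.min_date <= wanted_date <= b.max_date]


blocks = [
    BlockStats("_b/block-a", "2024-01-01", "2024-01-10"),
    BlockStats("_b/block-b", "2024-01-11", "2024-01-20"),
    BlockStats("_b/block-c", "2024-01-21", "2024-01-31"),
]

# Only one block can contain rows for 2024-01-20; the other two are skipped.
print(blocks_to_fetch(blocks, "2024-01-20"))  # ['_b/block-b']
```

Because the decision uses only metadata, a block that cannot match is never downloaded, which is what keeps queries on attached tables as selective as queries on the source table.
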
ATTACH TABLE represents a fundamental shift: **from copying data for each use case to one copy with many views**. Whether in cloud or on-premise environments, Databend's architecture enables powerful, efficient data sharing while maintaining enterprise-grade consistency and security.
