
Commit 1439c5e

b41sh and Chasen-Zhang authored
docs: refactor virtual column docs (#2074)
* docs: refactor virtual column docs
* fix broken link
* fix
* fix

---------

Co-authored-by: z <[email protected]>
1 parent 998af59 commit 1439c5e


11 files changed: +169 -340 lines changed


docs/cn/sql-reference/10-sql-commands/00-ddl/07-virtual-column/alter-virtual-column.md

Lines changed: 0 additions & 68 deletions
This file was deleted.

docs/cn/sql-reference/10-sql-commands/00-ddl/07-virtual-column/create-virtual-column.md

Lines changed: 0 additions & 49 deletions
This file was deleted.

docs/cn/sql-reference/10-sql-commands/00-ddl/07-virtual-column/drop-virtual-column.md

Lines changed: 0 additions & 28 deletions
This file was deleted.

docs/en/guides/55-performance/01-virtual-column.md

Lines changed: 53 additions & 33 deletions
@@ -7,33 +7,49 @@ import EEFeature from '@site/src/components/EEFeature';
77

88
<EEFeature featureName='VIRTUAL COLUMN'/>
99

10-
A virtual column is a construct formed by extracting nested fields within [Variant](/sql/sql-reference/data-types/variant) data and storing that data in separate storage files. Consider using virtual columns when you regularly query specific nested fields within Variant data to realize the following benefits:
10+
# Virtual Columns in Databend: Accelerating Queries on Semi-Structured Data
1111

12-
- **Accelerated Query Processing**: Virtual columns streamline the querying process by eliminating the need to traverse the entire nested structure to locate the desired data. Direct data retrieval from virtual columns parallels the process of accessing regular columns, resulting in a significant acceleration of query execution.
12+
Virtual columns in Databend automatically accelerate queries on semi-structured data, particularly data stored in the [Variant](/sql/sql-reference/data-types/variant) data type. Databend optimizes access to nested fields on its own, so queries run faster and consume fewer resources.
1313

14-
- **Reduced Memory Usage**: Variant data often includes numerous internal fields, and reading all of them can lead to substantial memory consumption. By transitioning to reading virtual columns, there is a notable reduction in memory usage, mitigating the risk of potential memory overflows.
14+
## Overview
15+
16+
When working with nested data structures within `VARIANT` columns, accessing specific data points can be a performance bottleneck. Databend's virtual columns address this by automatically identifying and optimizing nested fields. Instead of repeatedly traversing the entire nested structure, virtual columns enable direct data retrieval, similar to accessing regular columns.
17+
18+
Databend automatically detects nested fields within `VARIANT` columns during data ingestion. If a field meets a certain threshold for presence, it's materialized as a virtual column in the background, ensuring that data is readily available for optimized querying. This process is entirely automatic, requiring no manual configuration or intervention.
1519

1620
![Alt text](/img/sql/virtual-column.png)
1721

18-
## Managing Virtual Columns
22+
## Performance Benefits
23+
24+
* **Significant Query Acceleration:** Virtual columns dramatically reduce query execution time by enabling direct access to nested fields. This eliminates the overhead of traversing complex JSON structures for each query.
25+
* **Reduced Resource Consumption:** By materializing only the necessary nested fields, virtual columns minimize memory consumption during query processing. This leads to more efficient resource utilization and improved overall system performance.
26+
* **Automatic Optimization:** Databend automatically identifies and materializes fields as virtual columns. The query optimizer then automatically rewrites queries to utilize these virtual columns when accessing data within the `VARIANT` column.
27+
* **Transparent Operation:** Virtual columns are created and managed without user involvement. Queries benefit without any changes to the query syntax or the data loading process.
28+
29+
## How it Works
1930

20-
Databend provides a variety of commands to manage virtual columns. For details, see [VIRTUAL COLUMN](/sql/sql-commands/ddl/virtual-column/).
31+
1. **Data Ingestion:** When data containing `VARIANT` columns is ingested, Databend analyzes the structure of the JSON data.
32+
2. **Field Presence Check:** Databend checks if a nested field meets a certain threshold for presence.
33+
3. **Virtual Column Materialization:** If the field presence threshold is met, the system automatically materializes the field as a virtual column in the background.
34+
4. **Query Optimization:** When a query accesses a nested field within a `VARIANT` column, the query optimizer automatically rewrites the query to use the corresponding virtual column for faster data retrieval.
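
The rewrite in step 4 can be observed with `EXPLAIN`. The following is a minimal sketch, assuming a table `test(id int, val variant)` that already holds JSON rows (as in the usage example on this page) and that the experimental setting is enabled. If the fields have been materialized, the `TableScan` node is expected to list them on its `virtual columns:` line.

```sql
-- Assumption: test(id int, val variant) exists and has been populated.
SET enable_experimental_virtual_column = 1;

-- The query uses ordinary Variant path syntax; nothing virtual-column-specific
-- appears in the statement itself.
EXPLAIN
SELECT val['name'], val['tags'][0]
FROM test;
```
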
35+
36+
## Important Considerations
37+
38+
* **Overhead:** While virtual columns generally improve query performance, they do introduce some storage and maintenance overhead. Databend automatically balances the benefits of virtual columns against this overhead to ensure optimal performance.
39+
* **Experimental Feature:** Virtual columns are currently an experimental feature and are disabled by default. To enable them, set the `enable_experimental_virtual_column` setting to `1`.
40+
* **Automatic Refresh:** Virtual columns are refreshed automatically after data is inserted. To disable automatic generation of virtual column data, set `enable_refresh_virtual_column_after_write` to `0`. You can then refresh virtual columns asynchronously with the refresh virtual column command. For details, see [REFRESH VIRTUAL COLUMN](/sql/sql-commands/ddl/virtual-column/refresh-virtual-column).
41+
* **Show Virtual Columns:** To view information about virtual columns, use the [SHOW VIRTUAL COLUMNS](/sql/sql-commands/ddl/virtual-column/show-virtual-columns) command. To view virtual column metadata, use the [FUSE_VIRTUAL_COLUMN](/sql/sql-functions/system-functions/fuse_virtual_column) system function.
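
The settings and commands named in the considerations above can be combined roughly as follows. This is a sketch rather than a definitive recipe: the table name `test` is only illustrative, and the `REFRESH VIRTUAL COLUMN FOR <table>` form is taken from the earlier version of this page, so check the linked command reference for the current syntax.

```sql
-- Enable the experimental feature (disabled by default).
SET enable_experimental_virtual_column = 1;

-- Optional: stop generating virtual column data after each write.
SET enable_refresh_virtual_column_after_write = 0;

-- With automatic refresh off, refresh on demand (syntax assumed from the
-- previous revision of this page).
REFRESH VIRTUAL COLUMN FOR test;

-- Inspect the virtual columns that exist for the table.
SHOW VIRTUAL COLUMNS WHERE table = 'test';
```
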
2142

2243
## Usage Examples
2344

2445
This example demonstrates the practical use of virtual columns and their impact on query execution:
2546

2647
```sql
48+
SET enable_experimental_virtual_column=1;
49+
2750
-- Create a table named 'test' with columns 'id' and 'val' of type Variant.
2851
CREATE TABLE test(id int, val variant);
2952

30-
-- Create virtual columns for specific elements in the 'val' column.
31-
CREATE VIRTUAL COLUMN (
32-
val ['name'], -- Extract the 'name' field.
33-
val ['tags'] [0], -- Extract the first element in the 'tags' array.
34-
val ['pricings'] [0] ['type'] -- Extract the 'type' field from the first pricing in the 'pricings' array.
35-
) FOR test;
36-
3753
-- Insert sample records into the 'test' table with Variant data.
3854
INSERT INTO
3955
test
@@ -65,8 +81,7 @@ INSERT INTO test SELECT * FROM test;
6581
INSERT INTO test SELECT * FROM test;
6682
INSERT INTO test SELECT * FROM test;
6783

68-
-- Refresh the virtual columns
69-
REFRESH VIRTUAL COLUMN FOR test;
84+
-- Show the virtual columns
7085

7186
-- Explain the query execution plan for selecting specific fields from the table.
7287
EXPLAIN
@@ -79,16 +94,16 @@ FROM
7994

8095
-[ EXPLAIN ]-----------------------------------
8196
Exchange
82-
├── output columns: [test.val['name'] (#2), test.val['tags'][0] (#3), test.val['pricings'][0]['type'] (#4)]
97+
├── output columns: [test.val['name'] (#3), test.val['pricings'][0]['type'] (#5), test.val['tags'][0] (#8)]
8398
├── exchange type: Merge
8499
└── TableScan
85-
├── table: default.book_db.test
86-
├── output columns: [val['name'] (#2), val['tags'][0] (#3), val['pricings'][0]['type'] (#4)]
100+
├── table: default.default.test
101+
├── output columns: [val['name'] (#3), val['pricings'][0]['type'] (#5), val['tags'][0] (#8)]
87102
├── read rows: 160
88-
├── read size: 4.96 KiB
89-
├── partitions total: 16
90-
├── partitions scanned: 16
91-
├── pruning stats: [segments: <range pruning: 6 to 6>, blocks: <range pruning: 16 to 16>]
103+
├── read size: 1.69 KiB
104+
├── partitions total: 6
105+
├── partitions scanned: 6
106+
├── pruning stats: [segments: <range pruning: 6 to 6>, blocks: <range pruning: 6 to 6>]
92107
├── push downs: [filters: [], limit: NONE]
93108
├── virtual columns: [val['name'], val['pricings'][0]['type'], val['tags'][0]]
94109
└── estimated rows: 160.00
@@ -108,23 +123,28 @@ Exchange
108123
├── table: default.book_db.test
109124
├── output columns: [val['name'] (#2)]
110125
├── read rows: 160
111-
├── read size: 1.70 KiB
126+
├── read size: < 1 KiB
112127
├── partitions total: 16
113128
├── partitions scanned: 16
114129
├── pruning stats: [segments: <range pruning: 6 to 6>, blocks: <range pruning: 16 to 16>]
115130
├── push downs: [filters: [], limit: NONE]
116131
├── virtual columns: [val['name']]
117132
└── estimated rows: 160.00
118133

119-
-- Display all the virtual columns defined in the system.
120-
SHOW VIRTUAL COLUMNS;
121-
122-
┌─────────────────────────────────────────────────────────────────────────────┐
123-
│ database │ table │ virtual_columns │
124-
├──────────┼────────┼─────────────────────────────────────────────────────────┤
125-
│ default │ test │ val['name'], val['pricings'][0]['type'], val['tags'][0] │
126-
└─────────────────────────────────────────────────────────────────────────────┘
127-
128-
-- Drop the virtual columns associated with the 'test' table.
129-
DROP VIRTUAL COLUMN FOR test;
134+
-- Display all the auto-generated virtual columns.
135+
SHOW VIRTUAL COLUMNS WHERE table='test';
136+
137+
╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮
138+
│ database │ table │ source_column │ virtual_column_id │ virtual_column_name │ virtual_column_type │
139+
│ String │ String │ String │ UInt32 │ String │ String │
140+
├──────────┼────────┼───────────────┼───────────────────┼──────────────────────────┼─────────────────────┤
141+
│ default │ test │ val │ 3000000000 │ ['id'] │ UInt64 │
142+
│ default │ test │ val │ 3000000001 │ ['name'] │ String │
143+
│ default │ test │ val │ 3000000002 │ ['pricings'][0]['price'] │ String │
144+
│ default │ test │ val │ 3000000003 │ ['pricings'][0]['type'] │ String │
145+
│ default │ test │ val │ 3000000004 │ ['pricings'][1]['price'] │ String │
146+
│ default │ test │ val │ 3000000005 │ ['pricings'][1]['type'] │ String │
147+
│ default │ test │ val │ 3000000006 │ ['tags'][0] │ String │
148+
│ default │ test │ val │ 3000000007 │ ['tags'][1] │ String │
149+
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
130150
```

docs/en/sql-reference/00-sql-reference/31-system-tables/system-virtual-columns.md

Lines changed: 9 additions & 5 deletions
@@ -11,11 +11,15 @@ Contains information about the created virtual columns in the system.
1111
See also: [SHOW VIRTUAL COLUMNS](../../10-sql-commands/00-ddl/07-virtual-column/show-virtual-columns.md)
1212

1313
```sql
14+
SET enable_experimental_virtual_column=1;
15+
1416
SELECT * FROM system.virtual_columns;
1517

16-
┌───────────────────────────────────────────────────────────────────────────────────────────────┐
17-
│ database │ table │ virtual_columns │ created_on │ updated_on │
18-
├──────────┼────────┼─────────────────┼────────────────────────────┼────────────────────────────┤
19-
│ default │ test │ val['name'] │ 2023-12-25 21:24:26.127790 │ 2023-12-25 21:24:38.455268 │
20-
└───────────────────────────────────────────────────────────────────────────────────────────────┘
18+
╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
19+
│ database │ table │ source_column │ virtual_column_id │ virtual_column_name │ virtual_column_type │
20+
│ String │ String │ String │ UInt32 │ String │ String │
21+
├──────────┼────────┼───────────────┼───────────────────┼─────────────────────┼─────────────────────┤
22+
│ default │ test │ val │ 3000000000 │ ['id'] │ UInt64 │
23+
│ default │ test │ val │ 3000000001 │ ['name'] │ String │
24+
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
2125
```
