Commit 1ab89c7

Update aggregate-histogram.md
1 parent 4866a20 commit 1ab89c7

1 file changed: +48 -39 lines changed

docs/en/sql-reference/20-sql-functions/07-aggregate-functions/aggregate-histogram.md

Lines changed: 48 additions & 39 deletions
@@ -5,43 +5,35 @@ import FunctionDescription from '@site/src/components/FunctionDescription';
 
 <FunctionDescription description="Introduced or updated: v1.2.377"/>
 
-Computes the distribution of the data. It uses an "equal height" bucketing strategy to generate the histogram. The result of the function returns an empty or Json string.
+Generates a data distribution histogram using an "equal height" bucketing strategy.
 
 ## Syntax
 
 ```sql
 HISTOGRAM(<expr>)
-HISTOGRAM(<expr> [, max_num_buckets])
-```
-
-`max_num_buckets` means the maximum number of buckets that can be used, by default it is 128.
 
-For example:
-```sql
-select histogram(c_id) from histagg;
-┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
-│ histogram(c_id) │
-│ Nullable(String) │
-├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
-│ [{"lower":"1","upper":"1","ndv":1,"count":6,"pre_sum":0},{"lower":"2","upper":"2","ndv":1,"count":6,"pre_sum":6}] │
-└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+HISTOGRAM(<max_num_buckets>)(<expr>)
 ```
-:::
-
-## Arguments
 
-| Arguments | Description |
-|-------------------|--------------------------------------------------------------------------------------------|
-| `<expr>` | The data type of `<expr>` should be sortable. |
-| `max_num_buckets` | Optional constant positive integer, the maximum number of buckets that can be used. |
+| Parameter | Description |
+|-------------------|-------------------------------------------------------------------------------------|
+| `expr` | The data type of `expr` should be sortable. |
+| `max_num_buckets` | Optional positive integer specifying the maximum number of buckets. Default is 128. |
 
 ## Return Type
 
-the Nullable String type
+Returns either an empty string or a JSON object with the following structure:
 
-## Example
+- **buckets**: List of buckets with detailed information:
+- **lower**: Lower bound of the bucket.
+- **upper**: Upper bound of the bucket.
+- **count**: Number of elements in the bucket.
+- **pre_sum**: Cumulative count of elements up to the current bucket.
+- **ndv**: Number of distinct values in the bucket.
 
-**Create a Table and Insert Sample Data**
+## Examples
+
+This example shows how the HISTOGRAM function analyzes the distribution of `c_int` values in the `histagg` table, returning bucket boundaries, distinct value counts, element counts, and cumulative counts:
 
 ```sql
 CREATE TABLE histagg (
@@ -58,24 +50,17 @@ INSERT INTO histagg VALUES
 (2, 21, 22, 23),
 (2, 31, 32, 33),
 (2, 10, 20, 30);
-```
 
-**Query Demo 1**
-```sql
 SELECT HISTOGRAM(c_int) FROM histagg;
-```
 
-**Result**
-```sql
 ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 │ histogram(c_int) │
-│ Nullable(String) │
 ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
 │ [{"lower":"13","upper":"13","ndv":1,"count":1,"pre_sum":0},{"lower":"23","upper":"23","ndv":1,"count":1,"pre_sum":1},{"lower":"30","upper":"30","ndv":1,"count":2,"pre_sum":2},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │
 └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 ```
 
-Query result description:
+The result is returned as a JSON array:
 
 ```json
 [
@@ -110,11 +95,35 @@ Query result description:
 ]
 ```
 
-Fields description:
+This example shows how `HISTOGRAM(2)` groups c_int values into two buckets:
+
+```sql
+SELECT HISTOGRAM(2)(c_int) FROM histagg;
 
-- buckets:All buckets
-- lower:Upper bound of the bucket
-- upper:Lower bound of the bucket
-- count:The number of elements contained in the bucket
-- pre_sum:The total number of elements in the front bucket
-- ndv:The number of distinct values in the bucket
+┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+│ histogram(2)(c_int) │
+├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
+│ [{"lower":"13","upper":"30","ndv":3,"count":4,"pre_sum":0},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │
+└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+```
+
+The result is returned as a JSON array:
+
+```json
+[
+{
+"lower": "13",
+"upper": "30",
+"ndv": 3,
+"count": 4,
+"pre_sum": 0
+},
+{
+"lower": "33",
+"upper": "33",
+"ndv": 1,
+"count": 2,
+"pre_sum": 4
+}
+]
+```
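
The bucket fields documented in this change can be sanity-checked against the example output. The following is a minimal Python sketch, not part of the commit. It assumes the result string is copied verbatim from the `HISTOGRAM(2)(c_int)` example, and that `pre_sum` is the number of elements in all earlier buckets, which is what the sample values suggest. `check_pre_sums` is a hypothetical helper name introduced only for illustration.

```python
import json

# Result string copied from the HISTOGRAM(2)(c_int) example in the diff.
raw = (
    '[{"lower":"13","upper":"30","ndv":3,"count":4,"pre_sum":0},'
    '{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}]'
)


def check_pre_sums(histogram_json):
    """Parse a histogram result and verify each pre_sum.

    Assumption (inferred from the sample output, not from a spec):
    pre_sum equals the total count of all buckets before the current one.
    An empty string is treated as an empty histogram.
    Returns (buckets, total_count).
    """
    buckets = json.loads(histogram_json) if histogram_json else []
    running = 0
    for bucket in buckets:
        if bucket["pre_sum"] != running:
            raise ValueError(
                f"bucket {bucket['lower']}..{bucket['upper']}: "
                f"pre_sum={bucket['pre_sum']}, expected {running}"
            )
        running += bucket["count"]
    return buckets, running


buckets, total = check_pre_sums(raw)
print(len(buckets), total)  # prints: 2 6
```

The same check passes on the four-bucket result of `HISTOGRAM(c_int)`, whose counts 1, 1, 2, 2 give `pre_sum` values 0, 1, 2, 4.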
