Commit 7b52d39

technotes: SQL statistics
Tech notes on SQL Statistics. Part Of CRDB-35839 Release note: None
1 parent 94cbfe1 commit 7b52d39

Lines changed: 226 additions & 0 deletions
# SQL Statistics
Last Update: February 2024

Original author: maryliag

This document provides an overview of the collection of SQL Statistics, where they're
stored and how they're used on the Console UI.

Table of contents:

- [Overview](#overview)
- [Data Generation and Storage](#data-generation-and-storage)
- [Data Aggregation and Cardinality](#data-aggregation-and-cardinality)
- [Data Cleanup](#data-cleanup)
- [Data Access](#data-access)
- [Cluster Settings](#cluster-settings)

## Overview
The SQL statistics feature provides observability into statement and transaction
execution. It aims to help operators debug specific statements and transactions
that can degrade cluster performance. The statistics can be accessed in several ways,
the main one being the SQL Activity page in the Console UI. The following sections
detail how this data is generated, stored, cleaned up, and accessed.

## Data Generation and Storage
When a statement is executed, different levels of information are collected at
different steps. When the statement completes, the full data is populated into
a single object to be recorded. Initially this object is recorded in memory only
(`crdb_internal.cluster_statement_statistics`). A flush job is scheduled to run at a
frequency defined by `sql.stats.flush.interval` (default 10 minutes), and it persists
the in-memory data to a system table (`system.statement_statistics`).

After the flush job completes, another job runs to populate the top activity
table (`system.statement_activity`). This job pre-computes the data against a few
criteria (% of All Runtime, Contention, Execution Count, P99 Latency, SQL CPU Time and
Statement Time) for faster access. It also aggregates where possible; for example,
executions of the same fingerprint in the same hour that differ only by gateway
node are aggregated together.

The same principles apply to transaction statistics, also powered by the SQL Stats
subsystem. For transactions, replace `statement` with `transaction` in the table
and view names.

```mermaid
flowchart TD
exec[Statement executed]
memory[[crdb_internal.
cluster_statement_statistics]]
persisted[(system.
statement_statistics)]
combined[[crdb_internal.statement_statistics]]
activity[(system.
statement_activity)]
persisted_view[[crdb_internal.
statement_statistics_persisted]]
activity_view[[crdb_internal.statement_activity]]
console(["Console"])
exec --> memory
memory --> |flushed| persisted
memory --> combined
persisted --> combined
persisted --> |update job|activity
persisted --> persisted_view
activity --> activity_view
persisted_view --> console
activity_view --> console
```

Node-global size limits are placed on the group of structures storing the in-memory
data, based on memory usage and unique fingerprint count. The settings
`sql.metrics.max_mem_stmt_fingerprints` (default 100k) and
`sql.metrics.max_mem_txn_fingerprints` (default 100k) determine the maximum number of
unique in-memory fingerprints allowed for statements and transactions, respectively.
Reaching this size limit can trigger more frequent flushes, but such a flush is
aborted if the amount of time defined by `sql.stats.flush.minimum_interval` has not yet
passed since the previous flush.

Flushes can also be aborted if the sink system tables have reached a combined row
count greater than or equal to 1.5 * `sql.stats.persisted_rows.max` (default 1M). The
factor of 1.5 gives the tables room to continue being written to while
cleanup jobs run (see [Data Cleanup](#data-cleanup) below).
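
The two abort conditions described above can be sketched as a single predicate. This is a minimal illustration in Python, not the actual implementation: the `FlushSettings` class, the `should_abort_flush` function, and their parameter names are hypothetical, and only the setting names in the comments come from this document.

```python
from dataclasses import dataclass


@dataclass
class FlushSettings:
    # Hypothetical container; field names are illustrative only.
    minimum_interval_s: float = 0.0       # sql.stats.flush.minimum_interval
    persisted_rows_max: int = 1_000_000   # sql.stats.persisted_rows.max


def should_abort_flush(now_s: float, last_flush_s: float,
                       persisted_rows: int, settings: FlushSettings) -> bool:
    """Return True if a flush starting at now_s should be aborted."""
    # Condition 1: the minimum interval since the previous flush has not passed.
    too_soon = (now_s - last_flush_s) < settings.minimum_interval_s
    # Condition 2: the sink tables already hold 1.5x the configured maximum.
    sink_full = persisted_rows >= 1.5 * settings.persisted_rows_max
    return too_soon or sink_full
```

With the default `minimum_interval_s` of 0, only the row-count condition can abort a flush in this sketch.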

## Data Aggregation and Cardinality
When a statement is recorded in memory, it is aggregated with other
executions of the same fingerprint. When the flush happens, it is aggregated with
all other executions of that fingerprint for the current aggregation timestamp, which is
defined by `sql.stats.aggregation.interval` (default 1h). This means everything executed
at hour 1:XX is stored under hour 1:00.
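
As a rough illustration of how an execution time maps to its aggregation timestamp, the sketch below truncates a timestamp down to the start of its interval. The function name `aggregation_bucket` is hypothetical, and truncating relative to the Unix epoch is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone


def aggregation_bucket(ts: datetime,
                       interval: timedelta = timedelta(hours=1)) -> datetime:
    """Truncate ts down to the start of its aggregation interval."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Whole number of intervals elapsed since the epoch.
    whole_intervals = (ts - epoch) // interval
    return epoch + whole_intervals * interval


executed_at = datetime(2024, 2, 1, 1, 37, 12, tzinfo=timezone.utc)
print(aggregation_bucket(executed_at).isoformat())  # 2024-02-01T01:00:00+00:00
```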

The SQL Stats subsystem works to reduce cardinality by aggregating statistics together at the node-level, with aggregation keys for Statements consisting of:
- Aggregated timestamp
- Statement fingerprint ID
- Transaction fingerprint ID
- Plan hash
- App name
- Node ID

Transaction statistics use a smaller set of components in their aggregation keys, consisting of:
- Aggregated timestamp
- Transaction fingerprint ID
- App name
- Node ID
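
To make the effect of the statement aggregation key concrete, here is a small sketch that groups executions by that key. The `StatementKey` type, the `record` helper, and the two tracked metrics are hypothetical; the real subsystem tracks many more fields.

```python
from collections import defaultdict
from typing import NamedTuple


class StatementKey(NamedTuple):
    # Mirrors the statement aggregation key components listed above.
    aggregated_ts: str
    stmt_fingerprint_id: int
    txn_fingerprint_id: int
    plan_hash: int
    app_name: str
    node_id: int


def new_entry() -> dict:
    return {"count": 0, "total_latency_s": 0.0}


def record(cache: dict, key: StatementKey, latency_s: float) -> None:
    """Fold one execution into the entry for its aggregation key."""
    entry = cache[key]
    entry["count"] += 1
    entry["total_latency_s"] += latency_s


cache = defaultdict(new_entry)
key = StatementKey("2024-02-01T01:00:00Z", 101, 201, 301, "movr", 1)
record(cache, key, 0.25)
record(cache, key, 0.75)                      # same key: merged into one entry
record(cache, key._replace(node_id=2), 0.5)   # different node: separate entry
print(len(cache))  # 2
```

Every component added to the key multiplies the number of possible rows, which is why transaction statistics, with fewer components, have lower cardinality.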

A statement fingerprint ID is created by hashing the statement fingerprint (the query
with the constants redacted), its database, its failure status, and whether it was part
of an implicit transaction. A transaction fingerprint ID is the hash of a string constructed
from the individual statement fingerprint IDs that comprise the transaction.
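
The sketch below shows one way such IDs could be derived. It is only an illustration: the choice of 64-bit FNV-1a, the field order, and the byte encoding are assumptions made for this example, and the function names are hypothetical.

```python
from typing import Sequence

FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes, state: int = FNV_OFFSET_64) -> int:
    """64-bit FNV-1a; passing a prior state continues an existing hash."""
    for byte in data:
        state ^= byte
        state = (state * FNV_PRIME_64) & MASK_64
    return state


def statement_fingerprint_id(fingerprint: str, database: str,
                             failed: bool, implicit_txn: bool) -> int:
    # Assumed field order: fingerprint, database, then the two flags.
    state = fnv1a_64(fingerprint.encode("utf-8"))
    state = fnv1a_64(database.encode("utf-8"), state)
    return fnv1a_64(bytes([failed, implicit_txn]), state)


def transaction_fingerprint_id(statement_ids: Sequence[int]) -> int:
    # Hash the statement fingerprint IDs in execution order.
    state = FNV_OFFSET_64
    for stmt_id in statement_ids:
        state = fnv1a_64(stmt_id.to_bytes(8, "big"), state)
    return state


stmt_id = statement_fingerprint_id(
    "SELECT * FROM users WHERE id = _", "movr", failed=False, implicit_txn=True)
txn_id = transaction_fingerprint_id([stmt_id])
```

Because the constants are redacted before hashing, `WHERE id = 1` and `WHERE id = 2` share a fingerprint ID, while the same query failing or running in a different database gets a different one.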

## Data Cleanup
Each of the tables mentioned above has its own data cleanup process.

The `crdb_internal.cluster_statement_statistics` and
`crdb_internal.cluster_transaction_statistics` tables, which represent the
in-memory caches of aggregated data, can reach up to 100k rows.
Once that limit is reached or 10 minutes have passed (the frequency of the flush job),
a flush operation runs: all the data is moved to the
`system.statement_statistics` or `system.transaction_statistics` table and
then removed from the in-memory cache. If a flush operation is not possible
at the moment, statistics for new fingerprints are rejected, but aggregation of
statistics for fingerprints already in the cache continues.
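
The behavior of a full cache can be sketched as follows. The `InMemoryStats` class and its methods are hypothetical and track only an execution count per fingerprint.

```python
class InMemoryStats:
    """Toy model of the in-memory cache with a unique-fingerprint limit."""

    def __init__(self, max_fingerprints: int) -> None:
        self.max_fingerprints = max_fingerprints
        self.counts = {}

    def record(self, fingerprint_id: int) -> bool:
        """Record one execution; return False if it had to be rejected."""
        is_new = fingerprint_id not in self.counts
        if is_new and len(self.counts) >= self.max_fingerprints:
            return False  # cache full: new fingerprints are rejected
        self.counts[fingerprint_id] = self.counts.get(fingerprint_id, 0) + 1
        return True

    def flush(self) -> dict:
        """Hand the data to the caller and empty the cache."""
        flushed, self.counts = self.counts, {}
        return flushed


stats = InMemoryStats(max_fingerprints=2)
stats.record(1)
stats.record(2)
print(stats.record(3))  # False: limit reached, new fingerprint rejected
print(stats.record(1))  # True: existing fingerprint keeps aggregating
```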

The `system.statement_statistics` and `system.transaction_statistics` tables can
reach up to 1.5 * `sql.stats.persisted_rows.max` (setting default 1M) rows.
A job called `sql-stats-compaction` runs at the frequency defined by
`sql.stats.limit_table_size_check.interval` (default 1h) and deletes the older rows
that exceed the max row count. If the max limit has been reached, new data
is discarded.

The above cleanup job can be toggled on/off with the
`sql.stats.limit_table_size.enabled` setting (default enabled). The compaction job
performs DELETE statements in iterations until the number of rows is below the maximum.
Each iteration deletes a maximum of `sql.stats.cleanup.rows_to_delete_per_txn` (default
10k) rows.
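
The iterative deletion can be modeled with a short loop. This is a simplified sketch: the `compact` function is hypothetical, a Python list stands in for the system table (oldest rows first), and the loop stops as soon as the row count no longer exceeds the maximum.

```python
def compact(rows: list, max_rows: int, rows_per_txn: int) -> int:
    """Delete the oldest rows in batches; return the number of iterations."""
    iterations = 0
    while len(rows) > max_rows:
        excess = len(rows) - max_rows
        # Each iteration stands in for one DELETE of at most rows_per_txn rows.
        batch = min(excess, rows_per_txn)
        del rows[:batch]
        iterations += 1
    return iterations


table = list(range(25))  # 25 rows, row 0 is the oldest
print(compact(table, max_rows=10, rows_per_txn=10))  # 2
print(table[0])  # 15
```

Bounding each batch keeps individual delete transactions small, at the cost of needing several iterations when the table is far over its limit.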

The `system.statement_activity` table can reach up to 200k rows. This limit comes from
500 (top limit) * 6 (number of columns) * 24 (hours) * 3 (days) = 216,000, rounded down
to a round number. Every flush operation checks whether the limit has been reached and
deletes the excess data before adding new rows. After rows are deleted from
`system.statement_activity`, the corresponding rows are also deleted from
`system.transaction_activity`, and vice versa, to keep the data
consistent between the two (since one of them could have reached the limit before the
other).
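
The arithmetic behind the row limit can be checked directly. The variable names below are illustrative; the values are the ones quoted in the paragraph above.

```python
top_limit = 500        # sql.stats.activity.top.max
ranking_columns = 6    # criteria the top activity is computed against
hours_per_day = 24     # one aggregation bucket per hour
days = 3

upper_bound = top_limit * ranking_columns * hours_per_day * days
print(upper_bound)  # 216000, rounded down to the 200k default
```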

## Data Access

UI and API clients access the data via `crdb_internal` tables to control the proper
access of the data across sources (e.g. in-memory, persisted, and top activity tables).
The `crdb_internal` tables are views on top of the system tables that enable non-admin
users to access them.

There are a few different options when accessing the statement statistics data from
`crdb_internal`:
1. The in-memory data alone can be accessed on `crdb_internal.cluster_statement_statistics`.
2. The combined data of in-memory and persisted stats can be accessed on `crdb_internal.statement_statistics`.
3. The persisted data can be accessed on `crdb_internal.statement_statistics_persisted`.
4. The top activity table can be accessed on `crdb_internal.statement_activity`.

Likewise, we have similar options for accessing transaction statistics data:
1. The in-memory data alone can be accessed on `crdb_internal.cluster_transaction_statistics`.
2. The combined data of in-memory and persisted stats can be accessed on `crdb_internal.transaction_statistics`.
3. The persisted data can be accessed on `crdb_internal.transaction_statistics_persisted`.
4. The top activity table can be accessed on `crdb_internal.transaction_activity`.

The diagram below shows how these tables are used to populate the SQL Activity page.

```mermaid
flowchart TD;
A[Compare the Timestamp on ACTIVITY Table
with the Requested Timestamp] --> B{Is the requested time
period completely \non the table?}
B -- Yes --> C[SELECT on ACTIVITY table]
C --> D{Had results?}
D -- Yes --> E[Return RESULTS]
D -- No --> F[SELECT on PERSISTED table]
B -- No ----> F
F --> G{Had results?}
G -- Yes --> E
G -- No --> H[SELECT on COMBINED table]
H --> E
```
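
The fallback order in the diagram can also be written as a short function. This is a sketch of the decision flow only: `fetch_statement_stats`, `covers`, and the three `select_*` callables are hypothetical stand-ins for queries against the activity, persisted, and combined tables.

```python
from typing import Callable, List, Tuple

Period = Tuple[int, int]            # (start, end) as epoch seconds
Query = Callable[[Period], List[dict]]


def covers(available: Period, requested: Period) -> bool:
    """True if the requested period lies completely inside the available one."""
    return available[0] <= requested[0] and requested[1] <= available[1]


def fetch_statement_stats(requested: Period, activity_period: Period,
                          select_activity: Query, select_persisted: Query,
                          select_combined: Query) -> List[dict]:
    # 1. Use the activity table only if it covers the whole requested period.
    if covers(activity_period, requested):
        rows = select_activity(requested)
        if rows:
            return rows
    # 2. Fall back to the persisted table.
    rows = select_persisted(requested)
    if rows:
        return rows
    # 3. Last resort: the combined (in-memory plus persisted) table.
    return select_combined(requested)
```

The ordering prefers the cheapest source first: the pre-computed activity table, then the persisted table, and only then the combined view.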

## Cluster Settings
Several cluster settings control different parts of this system. Here
is a list of them (not including settings from other areas such as Insights):

| Cluster Setting | Sub-System | Description | Default value |
|---|---|---|---|
| `sql.metrics.statement_details.enabled` | SQL Stats Collection | collect per-statement query statistics | true |
| `sql.metrics.transaction_details.max_statement_ids` | SQL Stats Collection | max number of statement fingerprint IDs to store for transaction statistics | 1k |
| `sql.metrics.transaction_details.enabled` | SQL Stats Collection | collect per-application transaction statistics | true |
| `sql.metrics.statement_details.threshold` | SQL Stats Collection | minimum execution time to cause statement statistics to be collected. If configured, no transaction stats are collected. | 0 |
| `sql.metrics.statement_details.plan_collection.enabled` | SQL Stats Collection | periodically save a logical plan for each fingerprint | false |
| `sql.metrics.statement_details.plan_collection.period` | SQL Stats Collection | the time until a new logical plan is collected | 5 minutes |
| `sql.metrics.max_mem_stmt_fingerprints` | SQL Stats Collection | the maximum number of statement fingerprints stored in memory | 100k |
| `sql.metrics.max_mem_txn_fingerprints` | SQL Stats Collection | the maximum number of transaction fingerprints stored in memory | 100k |
| `sql.metrics.max_mem_reported_stmt_fingerprints` | SQL Stats Collection | the maximum number of reported statement fingerprints stored in memory | 100k |
| `sql.metrics.max_mem_reported_txn_fingerprints` | SQL Stats Collection | the maximum number of reported transaction fingerprints stored in memory | 100k |
| `sql.metrics.max_stmt_fingerprints_per_explicit_txn` | SQL Stats Collection | the maximum number of statement fingerprints stored per explicit transaction | 2k |
| `sql.metrics.statement_details.index_recommendation_collection.enabled` | SQL Stats Collection | generate an index recommendation for each fingerprint ID | true |
| `sql.metrics.statement_details.max_mem_reported_idx_recommendations` | SQL Stats Collection | the maximum number of reported index recommendation info stored in memory | 5k |
| `sql.metrics.statement_details.gateway_node.enabled` | SQL Stats Collection | save the gateway node for each statement fingerprint. If false, the value will be stored as 0 | false |
| `sql.stats.flush.interval` | Persisted SQL Stats | the interval at which SQL execution statistics are flushed to disk | 10 minutes |
| `sql.stats.flush.minimum_interval` | Persisted SQL Stats | the minimum interval that SQL stats can be flushed to disk. If a flush operation starts within less than the minimum interval, the flush operation will be aborted | 0 |
| `sql.stats.flush.force_cleanup.enabled` | Persisted SQL Stats | if set, older SQL stats are discarded periodically when flushing to persisted tables is disabled | false |
| `sql.stats.flush.enabled` | Persisted SQL Stats | if set, SQL execution statistics are periodically flushed to disk | true |
| `sql.stats.flush.jitter` | Persisted SQL Stats | jitter fraction on the duration between sql stats flushes | 0.15 |
| `sql.stats.persisted_rows.max` | Persisted SQL Stats | maximum number of rows of statement and transaction statistics that will be persisted in the system tables | 1M |
| `sql.stats.cleanup.recurrence` | Persisted SQL Stats | cron-tab recurrence for SQL Stats cleanup job | @hourly |
| `sql.stats.aggregation.interval` | Persisted SQL Stats | the interval at which we aggregate SQL execution statistics upon flush, this value must be greater than or equal to sql.stats.flush.interval | 1h |
| `sql.stats.cleanup.rows_to_delete_per_txn` | Persisted SQL Stats | number of rows the compaction job deletes from system table per iteration | 10k |
| `sql.stats.limit_table_size.enabled` | Persisted SQL Stats | controls whether we allow statement and transaction statistics tables to grow past sql.stats.persisted_rows.max | true |
| `sql.stats.limit_table_size_check.interval` | Persisted SQL Stats | controls what interval the check is done if the statement and transaction statistics tables have grown past sql.stats.persisted_rows.max | 1h |
| `sql.stats.activity.flush.enabled` | Top Sql Stats Activity | enable the flush to the system statement and transaction activity tables | true |
| `sql.stats.activity.top.max` | Top Sql Stats Activity | the limit per column for the top number of statistics to be flushed to the activity tables | 500 |
| `sql.stats.activity.persisted_rows.max` | Top Sql Stats Activity | maximum number of rows of statement and transaction activity that will be persisted in the system tables | 200k |
| `sql.stats.response.max` | SQL Stats Endpoint | the maximum number of statements and transaction stats returned in a CombinedStatements request | 20k |
| `sql.stats.response.show_internal.enabled` | SQL Stats Endpoint | controls if statistics for internal executions should be returned by the CombinedStatements | false |
| `sql.stats.activity.ui.enabled` | SQL Stats Endpoint | enable the combined statistics endpoint to get data from the system activity tables | true |
