# SQL Statistics
Last Update: February 2024

Original author: maryliag

This document provides an overview of the collection of SQL Statistics, where they're
stored and how they're used on the Console UI.

Table of contents:

- [Overview](#overview)
- [Data Generation and Storage](#data-generation-and-storage)
- [Data Aggregation and Cardinality](#data-aggregation-and-cardinality)
- [Data Cleanup](#data-cleanup)
- [Data Access](#data-access)
- [Cluster Settings](#cluster-settings)

## Overview
The SQL statistics feature provides observability into statement and transaction
execution, which aims to help operators debug specific statements and transactions
that can cause degraded cluster performance. The statistics can be accessed in many
ways, the main one being the SQL Activity page on the Console UI. The following
sections detail how this data is generated, stored, cleaned up, and accessed.

## Data Generation and Storage
When a statement is executed, different levels of information are collected at
different steps. When the statement completes, the full data is populated into
a single object to be recorded. Initially this statement is recorded in-memory only
(`crdb_internal.cluster_statement_statistics`). A flush job is scheduled to run at a
frequency defined by `sql.stats.flush.interval` (default 10 minutes) and persists
the in-memory data to a system table (`system.statement_statistics`).
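
As a minimal illustration, the flush cadence can be inspected or tuned through that
cluster setting (the 5-minute value below is only an example):

```sql
-- Inspect the current flush interval (default 10 minutes).
SHOW CLUSTER SETTING sql.stats.flush.interval;

-- Example value only: flush in-memory SQL stats to disk every 5 minutes.
SET CLUSTER SETTING sql.stats.flush.interval = '5m';
```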

After the flush job completes, another job is called to populate the top activity
table (`system.statement_activity`). This job pre-computes the data against a few
criteria (% of All Runtime, Contention, Execution Count, P99 Latency, SQL CPU Time and
Statement Time) for faster access. We also aggregate when possible, for example,
aggregating all executions of the same fingerprint in the same hour that had different
gateway nodes.

The same principles apply to transaction statistics, which are also powered by the SQL
Stats subsystem. For transactions, replace `statement` with `transaction` in the table
and view names.

```mermaid
flowchart TD
  exec[Statement executed]
  memory[[crdb_internal.
  cluster_statement_statistics]]
  persisted[(system.
  statement_statistics)]
  combined[[crdb_internal.statement_statistics]]
  activity[(system.
  statement_activity)]
  persisted_view[[crdb_internal.
  statement_statistics_persisted]]
  activity_view[[crdb_internal.statement_activity]]
  console(["Console"])
  exec --> memory
  memory --> |flushed| persisted
  memory --> combined
  persisted --> combined
  persisted --> |update job|activity
  persisted --> persisted_view
  activity --> activity_view
  persisted_view --> console
  activity_view --> console
```

Node-global size limits are placed on the group of structures storing the in-memory
data, based on memory usage and unique fingerprint count. The
`sql.metrics.max_mem_stmt_fingerprints` (default 100k) and
`sql.metrics.max_mem_txn_fingerprints` (default 100k) settings determine the maximum
number of unique in-memory fingerprints allowed for statements and transactions,
respectively. Reaching this size limit can trigger more frequent flushes, but such a
flush is aborted if less than `sql.stats.flush.minimum_interval` has passed since the
previous flush.

Flushes can also be aborted if the sink system tables have reached a combined row
count greater than or equal to 1.5 * `sql.stats.persisted_rows.max` (default 1M). The
factor of 1.5 gives the tables room to continue being written to while the cleanup
jobs run (see [Data Cleanup](#data-cleanup) below).
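
As a rough sketch (full-table counts are not cheap on large clusters), the persisted
tables can be compared against that threshold:

```sql
-- Sketch: compare current persisted row counts against
-- 1.5 * sql.stats.persisted_rows.max (default 1M, so 1.5M).
SELECT
  (SELECT count(*) FROM system.statement_statistics)   AS stmt_rows,
  (SELECT count(*) FROM system.transaction_statistics) AS txn_rows;
```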

## Data Aggregation and Cardinality
When the statement is being recorded in memory, it gets aggregated with other
executions of the same fingerprint, and when the flush happens it gets aggregated with
all other executions for that fingerprint on the current aggregation timestamp, which is
defined by `sql.stats.aggregation.interval` (default 1h). This means everything executed
during hour 1:XX will be stored as hour 1:00.
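
For example, grouping the persisted table by its aggregated timestamp shows the hourly
buckets (the `aggregated_ts` column name is assumed from the current
`system.statement_statistics` schema):

```sql
-- Sketch: each aggregation key produces at most one row per hourly bucket.
SELECT aggregated_ts, count(*) AS rows_in_bucket
FROM system.statement_statistics
GROUP BY aggregated_ts
ORDER BY aggregated_ts DESC
LIMIT 5;
```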

The SQL Stats subsystem works to reduce cardinality by aggregating statistics together
at the node level, with aggregation keys for statements consisting of:
- Aggregated timestamp
- Statement fingerprint ID
- Transaction fingerprint ID
- Plan hash
- App name
- Node ID

Transaction statistics use a smaller set of components in their aggregation keys,
consisting of:
- Aggregated timestamp
- Transaction fingerprint ID
- App name
- Node ID
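
These key components map to columns of the persisted system tables. As a sketch
(column names assumed from `system.statement_statistics`), the statement aggregation
keys recorded in the last hour could be listed with:

```sql
-- Sketch: the statement aggregation key corresponds to these columns.
SELECT aggregated_ts, fingerprint_id, transaction_fingerprint_id,
       plan_hash, app_name, node_id
FROM system.statement_statistics
WHERE aggregated_ts >= now() - INTERVAL '1 hour';
```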

A statement fingerprint ID is created by hashing the statement fingerprint (the query
with the constants redacted), its database and failure status, and whether it was part
of an implicit transaction. A transaction fingerprint ID is the hashed string
constructed from the individual statement fingerprint IDs that comprise the transaction.
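
As a hedged example (the hex value below is a placeholder, not a real fingerprint ID),
the statement fingerprints belonging to one transaction fingerprint can be listed from
the combined view:

```sql
-- Sketch: find the statements that make up a given transaction fingerprint.
SELECT DISTINCT encode(fingerprint_id, 'hex') AS stmt_fingerprint_id,
       metadata ->> 'query' AS query
FROM crdb_internal.statement_statistics
WHERE encode(transaction_fingerprint_id, 'hex') = 'deadbeefdeadbeef'; -- placeholder
```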


## Data Cleanup
The tables mentioned above have their own data cleanup processes:

The `crdb_internal.cluster_statement_statistics` and
`crdb_internal.cluster_transaction_statistics` tables, which again represent the
in-memory caches of aggregated data, can each reach up to 100k rows.
Once that limit is reached or 10 minutes have passed (the frequency of the flush job),
a flush operation is called and all the data is moved to the
`system.statement_statistics` or `system.transaction_statistics` tables and
consequently removed from the in-memory cache. If a flush operation is not possible
at the moment, statistics for new fingerprints are rejected, but aggregation of
statistics for fingerprints already existing in the cache continues.
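
As a quick sketch (assuming the view exposes a `fingerprint_id` column), the number of
unique statement fingerprints currently held only in memory across the cluster can be
approximated from the in-memory view:

```sql
-- Sketch: approaching sql.metrics.max_mem_stmt_fingerprints (default 100k)
-- means an early flush may be triggered.
SELECT count(DISTINCT fingerprint_id) AS in_memory_stmt_fingerprints
FROM crdb_internal.cluster_statement_statistics;
```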

The `system.statement_statistics` and `system.transaction_statistics` tables can
reach up to 1.5 * `sql.stats.persisted_rows.max` (setting default 1M) rows.
A job called `sql-stats-compaction` runs at the frequency defined by
`sql.stats.limit_table_size_check.interval` (default 1h) and deletes older rows once
the table has exceeded the max row count. If the max limit has been reached, new data
will be discarded.

The above cleanup job can be toggled on/off with the
`sql.stats.limit_table_size.enabled` setting (default enabled). The compaction job
performs DELETE statements in iterations until the number of rows is below the maximum.
Each iteration deletes a maximum of `sql.stats.cleanup.rows_to_delete_per_txn` (default
10k) rows.
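
A minimal example of adjusting these knobs (the values below are illustrative only):

```sql
-- Inspect the current persisted row limit (default 1M).
SHOW CLUSTER SETTING sql.stats.persisted_rows.max;

-- Example value only: delete in smaller batches per compaction transaction.
SET CLUSTER SETTING sql.stats.cleanup.rows_to_delete_per_txn = 5000;
```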

The table `system.statement_activity` can reach up to 200k rows, which comes from
500 (top limit) * 6 (number of columns) * 24 (hours) * 3 (days) = 216,000, rounded down
to a round number. Every flush operation checks if the limit was reached and deletes
the excess data before adding new rows. To keep things in sync, after rows are deleted
from `system.statement_activity` we also delete the corresponding rows from
`system.transaction_activity`, and vice-versa, since one of the tables could reach the
limit before the other.


## Data Access

UI and API clients access the data via `crdb_internal` tables to control proper access
to the data across sources (e.g. in-memory, persisted, and top activity tables).
The `crdb_internal` tables are views on top of the system tables that enable non-admin
users to access them.

There are a few different options when accessing the statement statistics data from
`crdb_internal`:
1. The in-memory data alone can be accessed on `crdb_internal.cluster_statement_statistics`.
2. The combined data of in-memory and persisted stats can be accessed on `crdb_internal.statement_statistics`.
3. The persisted data can be accessed on `crdb_internal.statement_statistics_persisted`.
4. The top activity table can be accessed on `crdb_internal.statement_activity`.

Likewise, we have similar options for accessing transaction statistics data:
1. The in-memory data alone can be accessed on `crdb_internal.cluster_transaction_statistics`.
2. The combined data of in-memory and persisted stats can be accessed on `crdb_internal.transaction_statistics`.
3. The persisted data can be accessed on `crdb_internal.transaction_statistics_persisted`.
4. The top activity table can be accessed on `crdb_internal.transaction_activity`.
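
For example, a common access pattern is pulling the top statements from the persisted
view (the JSON paths below assume the current shape of the `statistics` and `metadata`
columns):

```sql
-- Sketch: top 10 statement fingerprints by execution count from the persisted view.
SELECT app_name,
       metadata ->> 'query' AS query,
       (statistics -> 'statistics' ->> 'cnt')::INT AS exec_count
FROM crdb_internal.statement_statistics_persisted
ORDER BY exec_count DESC
LIMIT 10;
```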

The diagram below shows how these tables are used to populate the SQL Activity page.

```mermaid
flowchart TD;
  A[Compare the Timestamp on ACTIVITY Table
  with the Requested Timestamp] --> B{Is the requested time
  period completely \non the table?}
  B -- Yes --> C[SELECT on ACTIVITY table]
  C --> D{Had results?}
  D -- Yes --> E[Return RESULTS]
  D -- No --> F[SELECT on PERSISTED table]
  B -- No ----> F
  F --> G{Had results?}
  G -- Yes --> E
  G -- No --> H[SELECT on COMBINED table]
  H --> E
```

## Cluster Settings
There are several cluster settings that control different parts of this system. Here
is a list of them (it does not include settings from other areas, such as Insights):

| Cluster Setting | Sub-System | Description | Default value |
|---|---|---|---|
| `sql.metrics.statement_details.enabled` | SQL Stats Collection | collect per-statement query statistics | true |
| `sql.metrics.transaction_details.max_statement_ids` | SQL Stats Collection | max number of statement fingerprint IDs to store for transaction statistics | 1k |
| `sql.metrics.transaction_details.enabled` | SQL Stats Collection | collect per-application transaction statistics | true |
| `sql.metrics.statement_details.threshold` | SQL Stats Collection | minimum execution time to cause statement statistics to be collected. If configured, no transaction stats are collected. | 0 |
| `sql.metrics.statement_details.plan_collection.enabled` | SQL Stats Collection | periodically save a logical plan for each fingerprint | false |
| `sql.metrics.statement_details.plan_collection.period` | SQL Stats Collection | the time until a new logical plan is collected | 5 minutes |
| `sql.metrics.max_mem_stmt_fingerprints` | SQL Stats Collection | the maximum number of statement fingerprints stored in memory | 100k |
| `sql.metrics.max_mem_txn_fingerprints` | SQL Stats Collection | the maximum number of transaction fingerprints stored in memory | 100k |
| `sql.metrics.max_mem_reported_stmt_fingerprints` | SQL Stats Collection | the maximum number of reported statement fingerprints stored in memory | 100k |
| `sql.metrics.max_mem_reported_txn_fingerprints` | SQL Stats Collection | the maximum number of reported transaction fingerprints stored in memory | 100k |
| `sql.metrics.max_stmt_fingerprints_per_explicit_txn` | SQL Stats Collection | the maximum number of statement fingerprints stored per explicit transaction | 2k |
| `sql.metrics.statement_details.index_recommendation_collection.enabled` | SQL Stats Collection | generate an index recommendation for each fingerprint ID | true |
| `sql.metrics.statement_details.max_mem_reported_idx_recommendations` | SQL Stats Collection | the maximum number of reported index recommendation info stored in memory | 5k |
| `sql.metrics.statement_details.gateway_node.enabled` | SQL Stats Collection | save the gateway node for each statement fingerprint. If false, the value will be stored as 0 | false |
| `sql.stats.flush.interval` | Persisted SQL Stats | the interval at which SQL execution statistics are flushed to disk | 10 minutes |
| `sql.stats.flush.minimum_interval` | Persisted SQL Stats | the minimum interval that SQL stats can be flushed to disk. If a flush operation starts within less than the minimum interval, the flush operation will be aborted | 0 |
| `sql.stats.flush.force_cleanup.enabled` | Persisted SQL Stats | if set, older SQL stats are discarded periodically when flushing to persisted tables is disabled | false |
| `sql.stats.flush.enabled` | Persisted SQL Stats | if set, SQL execution statistics are periodically flushed to disk | true |
| `sql.stats.flush.jitter` | Persisted SQL Stats | jitter fraction on the duration between sql stats flushes | 0.15 |
| `sql.stats.persisted_rows.max` | Persisted SQL Stats | maximum number of rows of statement and transaction statistics that will be persisted in the system tables | 1M |
| `sql.stats.cleanup.recurrence` | Persisted SQL Stats | cron-tab recurrence for SQL Stats cleanup job | @hourly |
| `sql.stats.aggregation.interval` | Persisted SQL Stats | the interval at which we aggregate SQL execution statistics upon flush; this value must be greater than or equal to sql.stats.flush.interval | 1h |
| `sql.stats.cleanup.rows_to_delete_per_txn` | Persisted SQL Stats | number of rows the compaction job deletes from system table per iteration | 10k |
| `sql.stats.limit_table_size.enabled` | Persisted SQL Stats | controls whether we allow statement and transaction statistics tables to grow past sql.stats.persisted_rows.max | true |
| `sql.stats.limit_table_size_check.interval` | Persisted SQL Stats | the interval at which we check whether the statement and transaction statistics tables have grown past sql.stats.persisted_rows.max | 1h |
| `sql.stats.activity.flush.enabled` | Top SQL Stats Activity | enable the flush to the system statement and transaction activity tables | true |
| `sql.stats.activity.top.max` | Top SQL Stats Activity | the limit per column for the top number of statistics to be flushed to the activity tables | 500 |
| `sql.stats.activity.persisted_rows.max` | Top SQL Stats Activity | maximum number of rows of statement and transaction activity that will be persisted in the system tables | 200k |
| `sql.stats.response.max` | SQL Stats Endpoint | the maximum number of statements and transaction stats returned in a CombinedStatements request | 20k |
| `sql.stats.response.show_internal.enabled` | SQL Stats Endpoint | controls if statistics for internal executions should be returned by the CombinedStatements endpoint | false |
| `sql.stats.activity.ui.enabled` | SQL Stats Endpoint | enable the combined statistics endpoint to get data from the system activity tables | true |
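
To see the current values of these settings on a cluster, something like the following
works (using CockroachDB's bracketed SHOW-as-source syntax):

```sql
-- Sketch: list current values of the SQL stats related cluster settings.
SELECT variable, value
FROM [SHOW ALL CLUSTER SETTINGS]
WHERE variable LIKE 'sql.stats.%' OR variable LIKE 'sql.metrics.%';
```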