Commit 94cbfe1

craig[bot] and maryliag committed
Merge #118694
118694: technotes: index recommendations r=maryliag a=maryliag Tech notes on index recommendations. Part Of CRDB-35839 Release note: None Co-authored-by: maryliag <[email protected]>
2 parents 6ba0e6c + c05fa67 commit 94cbfe1

1 file changed: 97 additions, 0 deletions

@@ -0,0 +1,97 @@
1+
# Index recommendations on SQL Statistics
2+
Last Update: February 2024
3+
4+
Original author: maryliag
5+
6+
This document provides an overview of how Index Recommendations are
7+
used for statement statistics, including how frequently they are generated and how they are
8+
stored. This document doesn't focus on the generation itself. For more information
9+
on the index recommendation generation, check its [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211112_index_recommendation.md).
10+
11+
Table of contents:
12+
13+
- [Overview](#overview)
14+
- [Cache](#cache)
15+
- [Cluster Settings](#cluster-settings)
16+
- [Console](#console)
17+
18+
## Overview
19+
When a statement is executed, we want to calculate index recommendations for it, in
20+
case there are better indexes that can improve its execution. Generating an index
21+
recommendation is a costly operation, so we don't want to generate it on each execution.
22+
At the same time, generating frequent index recommendations for the same plan would
23+
likely generate the same results (unless there is a big increase/decrease in rows affected
24+
by the query).
25+
With this in mind, we use a cache to store the latest index recommendation for the most
26+
recent statement fingerprints, avoiding a new generation on each execution.
27+
28+
The recommendation is stored in the column `index_recommendations`
29+
as a `STRING[] NOT NULL` on:
30+
- `system.statement_statistics`
31+
- `crdb_internal.statement_statistics`
32+
- `crdb_internal.statement_statistics_persisted`
33+
34+
When a statement is recorded, the system calls the function `ShouldGenerateIndexRecommendation`.
35+
This function returns true if there was no index recommendation generated for the
36+
statement fingerprint in the past hour and there were at least 5 executions
37+
(`minExecCount`) of it.
38+
We don't generate index recommendations for statement fingerprints with fewer than 5
39+
executions because we don't want to perform a heavy operation[^1] for a statement that
40+
is barely executed.
41+
42+
Index recommendations for the SQL Statistics system are only generated for DML
43+
statements that are not internal.
44+
45+
The system then calls `UpdateIndexRecommendations`. This function can make two types of updates:
46+
1. A new index recommendation was generated: it updates the value in the cache and
47+
resets the last update time and the execution count.
48+
2. No index recommendation was generated: it increments the execution count. The
49+
counter is only incremented while it is below 5, since that is the threshold we care
50+
about. Once the value has reached 5, there is no need to keep updating it.
51+
52+
If a recommendation is generated for a new fingerprint and the cache has reached its
53+
size limit, the system tries to remove every entry whose last update is older than
54+
24 hours (`timeThresholdForDeletion`). Cleanups must also be at least 5 minutes
55+
(`timeBetweenCleanups`) apart, to avoid contention on the cache when many new
56+
fingerprints are created at once. If the limit has been reached and either less than
57+
`timeBetweenCleanups` has passed since the last cleanup or no entry older than 24 hours
58+
can be deleted, the new entry is not added to the cache.
59+
60+
## Cache
61+
The Index Recommendation cache is a mutex-protected map with the following contents:
62+
- Key (`indexRecKey`): statement fingerprint, database, plan hash
63+
- Value (`indexRecInfo`): last generated timestamp, recommendations, execution count
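
The layout above could look roughly like this in Go. The type names `indexRecKey` and `indexRecInfo` come from this document; the field names, the `indexRecCache` wrapper, and the `get`/`set` helpers are assumptions made for the sketch and may differ from the real code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// indexRecKey identifies one cache entry. Field names are hypothetical.
type indexRecKey struct {
	fingerprint string
	database    string
	planHash    uint64
}

// indexRecInfo is the cached value. Field names are hypothetical.
type indexRecInfo struct {
	lastGeneratedTs time.Time
	recommendations []string
	executionCount  int
}

// indexRecCache is a hypothetical wrapper: a map guarded by a mutex.
type indexRecCache struct {
	mu      sync.RWMutex
	entries map[indexRecKey]indexRecInfo
}

func newIndexRecCache() *indexRecCache {
	return &indexRecCache{entries: make(map[indexRecKey]indexRecInfo)}
}

// get returns the entry for k, if any, under a read lock.
func (c *indexRecCache) get(k indexRecKey) (indexRecInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	info, ok := c.entries[k]
	return info, ok
}

// set stores info for k under a write lock.
func (c *indexRecCache) set(k indexRecKey, info indexRecInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[k] = info
}

func main() {
	c := newIndexRecCache()
	k := indexRecKey{fingerprint: "SELECT * FROM t WHERE a = _", database: "db", planHash: 1}
	c.set(k, indexRecInfo{lastGeneratedTs: time.Now(), executionCount: 3})

	// The same fingerprint with a different plan hash is a separate entry.
	other := k
	other.planHash = 2
	_, found := c.get(other)
	fmt.Println(found) // false
}
```

Because the plan hash is part of the key, two plans for the same fingerprint keep separate recommendations and execution counts.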
64+
65+
## Cluster Settings
66+
There are 2 cluster settings controlling this system:
67+
- `sql.metrics.statement_details.index_recommendation_collection.enabled`:
68+
enables or disables index recommendation generation for this system. This is a safety
69+
measure: if the feature causes a performance degradation, it can be turned off.
70+
Default value is `true`.
71+
- `sql.metrics.statement_details.max_mem_reported_idx_recommendations`:
72+
defines the maximum number of index recommendation entries stored in the
73+
cache described above. Default value is `5000`.
74+
75+
## Console
76+
This information can be seen in a few different places:
77+
- Statement Details page (Explain Plan tab): the table on this page has an
78+
`Insights` column. If a row shows insights for a particular plan, clicking on it
79+
displays the recommendation.
80+
- Insights (Workload): Any Insights with type `Suboptimal Plan` can be selected and
81+
the index recommendation will be displayed on its details page.
82+
- Insights (Schema): Lists the latest index recommendation per fingerprint.
83+
84+
All options above will display a button to create/alter/drop the index directly
85+
from the Console UI.
86+
87+
[^1]: The cost of generating index recommendations is highly variable. Index recommendations are generated by:
88+
1. Analyzing the query to find "hypothetical indexes" that may improve the performance of the query.
89+
2. Running the query optimizer as if the hypothetical indexes actually exist.
90+
3. Any hypothetical index in the final query plan becomes an index recommendation.
91+
92+
Step 1 is not picky. Its goal is to cover all possible indexes that might help.
93+
In general, the number of hypothetical indexes grows with the number of filtered columns
94+
in the query.
95+
Step 2 can be fast for simple queries (<1ms), but the cost of optimization grows with
96+
the number of joins, filters, and columns, and can exceed 1s for complex queries.
97+
If Step 1 adds many hypothetical indexes, Step 2 will take longer because there are more query plans to explore.
