DOC-12391 Add link to hotspots doc from other related pages #19463

Merged: 6 commits, Mar 24, 2025
4 changes: 2 additions & 2 deletions src/current/_includes/v25.1/essential-alerts.md
@@ -24,7 +24,7 @@ A node with a high CPU utilization, an *overloaded* node, has a limited ability

- A persistently high CPU utilization of all nodes in a CockroachDB cluster suggests the current compute resources may be insufficient to support the user workload's concurrency requirements. If confirmed, the number of processors (vCPUs or cores) in the CockroachDB cluster needs to be adjusted to sustain the required level of workload concurrency. For a prompt resolution, either add cluster nodes or throttle the workload concurrency, for example, by reducing the number of concurrent connections to not exceed 4 active statements per vCPU or core.
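As a rough illustration of that guideline (a sketch only; the 3-node, 8-vCPU-per-node sizing is hypothetical), you can compare the number of statements currently executing cluster-wide against the 4-per-vCPU budget:

```sql
-- Hypothetical budget: 3 nodes x 8 vCPUs x 4 active statements per vCPU = 96.
-- Count statements currently executing anywhere in the cluster; a count that
-- stays above the budget suggests adding nodes or throttling client concurrency.
SELECT count(*) AS active_statements
FROM [SHOW CLUSTER STATEMENTS];
```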

### Hot node (hot spot)
### Hot node (hotspot)

Unbalanced utilization of CockroachDB nodes in a cluster may negatively affect the cluster's performance and stability, with some nodes getting overloaded while others remain relatively underutilized.

@@ -38,7 +38,7 @@ Unbalanced utilization of CockroachDB nodes in a cluster may negatively affect t

**Action**

- Refer to [Hot spots]({% link {{ page.version.version }}/performance-recipes.md %}#hot-spots).
- Refer to [Understand hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}).

### Node memory utilization

2 changes: 1 addition & 1 deletion src/current/_includes/v25.1/essential-metrics.md
@@ -99,7 +99,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
| <a id="ranges-underreplicated"></a>ranges.underreplicated | ranges.underreplicated | Number of ranges with fewer live replicas than the replication target | This metric is an indicator of [replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues). It shows whether the cluster has data that is not conforming to resilience goals. The next step is to determine the corresponding database object, such as the table or index, of these under-replicated ranges and whether the under-replication is temporarily expected. Use the statement `SELECT table_name, index_name FROM [SHOW RANGES WITH INDEXES] WHERE range_id = {id of under-replicated range};`|
| <a id="ranges-unavailable"></a>ranges.unavailable | ranges.unavailable | Number of ranges with fewer live replicas than needed for quorum | This metric is an indicator of [replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues). It shows whether the cluster is unhealthy and can impact workload. If an entire range is unavailable, then it will be unable to process queries. |
| queue.replicate.replacedecommissioningreplica.error | {% if include.deployment == 'self-hosted' %}queue.replicate.replacedecommissioningreplica.error.count |{% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Number of failed decommissioning replica replacements processed by the replicate queue | Refer to [Decommission the node]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#decommission-the-node). |
| range.splits | {% if include.deployment == 'self-hosted' %}range.splits.total |{% elsif include.deployment == 'advanced' %}range.splits |{% endif %} Number of range splits | This metric indicates how fast a workload is scaling up. Spikes can indicate resource hot spots since the [split heuristic is based on QPS]({% link {{ page.version.version }}/load-based-splitting.md %}#control-load-based-splitting-threshold). To understand whether hot spots are an issue and with which tables and indexes they are occurring, correlate this metric with other metrics such as CPU usage, such as `sys.cpu.combined.percent-normalized`, or use the [**Hot Ranges** page]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}). |
| range.splits | {% if include.deployment == 'self-hosted' %}range.splits.total |{% elsif include.deployment == 'advanced' %}range.splits |{% endif %} Number of range splits | This metric indicates how fast a workload is scaling up. Spikes can indicate resource [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}) since the [split heuristic is based on QPS]({% link {{ page.version.version }}/load-based-splitting.md %}#control-load-based-splitting-threshold). To understand whether hotspots are an issue and with which tables and indexes they are occurring, correlate this metric with other metrics such as CPU usage, such as `sys.cpu.combined.percent-normalized`, or use the [**Hot Ranges** page]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}). |
| range.merges | {% if include.deployment == 'self-hosted' %}range.merges.count |{% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Number of range merges | This metric indicates how fast a workload is scaling down. Merges are Cockroach's [optimization for performance](architecture/distribution-layer.html#range-merges). This metric indicates that there have been deletes in the workload. |
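For the `ranges.underreplicated` row above, a minimal sketch of the suggested lookup, using a hypothetical range ID of `421`:

```sql
-- Map a hypothetical under-replicated range (ID 421) back to its table and
-- index, as described in the Usage column above.
SELECT table_name, index_name
FROM [SHOW RANGES WITH INDEXES]
WHERE range_id = 421;
```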

## SQL
@@ -5,7 +5,7 @@
- Benefits of increasing normalization:

- Can improve performance for write-heavy workloads. This is because, with increased normalization, a given business fact must be written to one place rather than to multiple places.
- Allows separate transactions to modify related underlying data without causing [contention](#transaction-contention).
- Allows separate transactions to modify related underlying data without causing [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention).
- Reduces the chance of data inconsistency, since a given business fact must be written only to one place.
- Reduces or eliminates data redundancy.
- Uses less disk space.
@@ -24,9 +24,9 @@

- If you are working with a table that **must** be indexed on sequential keys, consider using [hash-sharded indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %}). For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see the blog post [Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads](https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/). As part of this, we recommend doing thorough performance testing with and without hash-sharded indexes to see which works best for your application.

- To avoid read hot spots:
- To avoid read hotspots:

- Increase data distribution, which will allow for more ranges. The hot spot exists because the data being accessed is all co-located in one range.
- Increase data distribution, which will allow for more ranges. The hotspot exists because the data being accessed is all co-located in one range.
- Increase [load balancing]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing) across more nodes in the same range. Most transactional reads must go to the leaseholder in CockroachDB, which means that opportunities for load balancing over replicas are minimal.

However, the following features do permit load balancing over replicas:
Expand Down
@@ -1 +1 @@
We [discourage indexing on sequential keys]({% link {{ page.version.version }}/schema-design-indexes.md %}#best-practices). If a table **must** be indexed on sequential keys, use [hash-sharded indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %}). Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots) and improving write performance on sequentially-keyed indexes at a small cost to read performance.
We [discourage indexing on sequential keys]({% link {{ page.version.version }}/schema-design-indexes.md %}#best-practices). If a table **must** be indexed on sequential keys, use [hash-sharded indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %}). Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}) and improving write performance on sequentially-keyed indexes at a small cost to read performance.
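As a minimal sketch of that recommendation (the `events` table and `ts` column are hypothetical), a sequentially keyed index can be created as hash-sharded so writes spread across ranges rather than piling onto one:

```sql
-- Hypothetical table whose secondary index key (ts) always increases.
CREATE TABLE IF NOT EXISTS events (
    id      UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    ts      TIMESTAMPTZ NOT NULL DEFAULT now(),
    payload JSONB
);

-- USING HASH shards the index so sequential inserts on ts land in
-- multiple ranges instead of a single hotspot range.
CREATE INDEX events_ts_idx ON events (ts) USING HASH;
```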
4 changes: 2 additions & 2 deletions src/current/_includes/v25.1/sql/range-splits.md
@@ -2,6 +2,6 @@ CockroachDB breaks data into ranges. By default, CockroachDB attempts to keep ra

However, there are reasons why you may want to perform manual splits on the ranges that store tables or indexes:

- When a table only consists of a single range, all writes and reads to the table will be served by that range's [leaseholder]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases). If a table only holds a small amount of data but is serving a large amount of traffic, load distribution can become unbalanced and a [hot spot]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots) can occur. Splitting the table's ranges manually can allow the load on the table to be more evenly distributed across multiple nodes. For tables consisting of more than a few ranges, load will naturally be distributed across multiple nodes and this will not be a concern.
- When a table only consists of a single range, all writes and reads to the table will be served by that range's [leaseholder]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases). If a table only holds a small amount of data but is serving a large amount of traffic, load distribution can become unbalanced and a [hotspot]({% link {{ page.version.version }}/understand-hotspots.md %}) can occur. Splitting the table's ranges manually can allow the load on the table to be more evenly distributed across multiple nodes. For tables consisting of more than a few ranges, load will naturally be distributed across multiple nodes and this will not be a concern.

- When a table is created, it will only consist of a single range. If you know that a new table will immediately receive significant write traffic, you may want to preemptively split the table based on the expected distribution of writes before applying the load. This can help avoid reduced workload performance that results when automatic splits are unable to keep up with write traffic and a [hot spot]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots) occurs.
- When a table is created, it will only consist of a single range. If you know that a new table will immediately receive significant write traffic, you may want to preemptively split the table based on the expected distribution of writes before applying the load. This can help avoid reduced workload performance that results when automatic splits are unable to keep up with write traffic and a [hotspot]({% link {{ page.version.version }}/understand-hotspots.md %}) occurs.
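For the pre-split scenario in the second bullet, a minimal sketch (the `users` table and split keys are hypothetical) of splitting a new table at its expected write boundaries before load arrives:

```sql
-- Hypothetical table expected to receive heavy write traffic immediately.
CREATE TABLE IF NOT EXISTS users (
    username STRING PRIMARY KEY,
    profile  JSONB
);

-- Manually split the single initial range at the expected key boundaries so
-- the load is spread across nodes before automatic splits can catch up.
ALTER TABLE users SPLIT AT VALUES ('g'), ('n'), ('t');
```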
@@ -1,3 +1,3 @@
Cockroach Labs recommends leaving the transaction priority at the default setting in almost all cases. Changing the transaction priority to `HIGH` in particular can lead to difficult-to-debug interactions with other transactions executing on the system.

If you are setting a transaction priority to avoid [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) or [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots), or to [get better query performance]({% link {{ page.version.version }}/make-queries-fast.md %}), it is usually a sign that you need to update your [schema design]({% link {{ page.version.version }}/schema-design-database.md %}) and/or review the data access patterns of your workload.
If you are setting a transaction priority to avoid [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) or [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}), or to [get better query performance]({% link {{ page.version.version }}/make-queries-fast.md %}), it is usually a sign that you need to update your [schema design]({% link {{ page.version.version }}/schema-design-database.md %}) and/or review the data access patterns of your workload.
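For reference, a sketch of the priority syntax this guidance warns about (shown only to make the recommendation concrete; the advice above is to leave the default in place):

```sql
-- Explicitly raising the priority, which the guidance above discourages:
BEGIN TRANSACTION PRIORITY HIGH;
-- ... statements that may contend with other transactions ...
COMMIT;
```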