diff --git a/src/current/_includes/v25.1/essential-alerts.md b/src/current/_includes/v25.1/essential-alerts.md index f71d81dd2f5..dd311c760c6 100644 --- a/src/current/_includes/v25.1/essential-alerts.md +++ b/src/current/_includes/v25.1/essential-alerts.md @@ -24,7 +24,7 @@ A node with a high CPU utilization, an *overloaded* node, has a limited ability - A persistently high CPU utilization of all nodes in a CockroachDB cluster suggests the current compute resources may be insufficient to support the user workload's concurrency requirements. If confirmed, the number of processors (vCPUs or cores) in the CockroachDB cluster needs to be adjusted to sustain the required level of workload concurrency. For a prompt resolution, either add cluster nodes or throttle the workload concurrency, for example, by reducing the number of concurrent connections to not exceed 4 active statements per vCPU or core. -### Hot node (hot spot) +### Hot node (hotspot) Unbalanced utilization of CockroachDB nodes in a cluster may negatively affect the cluster's performance and stability, with some nodes getting overloaded while others remain relatively underutilized. @@ -38,7 +38,7 @@ Unbalanced utilization of CockroachDB nodes in a cluster may negatively affect t **Action** -- Refer to [Hot spots]({% link {{ page.version.version }}/performance-recipes.md %}#hot-spots). +- Refer to [Understand hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). ### Node memory utilization diff --git a/src/current/_includes/v25.1/essential-metrics.md b/src/current/_includes/v25.1/essential-metrics.md index 41039d31a75..a41c8768d3b 100644 --- a/src/current/_includes/v25.1/essential-metrics.md +++ b/src/current/_includes/v25.1/essential-metrics.md @@ -99,7 +99,7 @@ The **Usage** column explains why each metric is important to visualize in a cus | ranges.underreplicated | ranges.underreplicated | Number of ranges with fewer live replicas than the replication target | This metric is an indicator of [replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues). It shows whether the cluster has data that is not conforming to resilience goals. The next step is to determine the corresponding database object, such as the table or index, of these under-replicated ranges and whether the under-replication is temporarily expected. Use the statement `SELECT table_name, index_name FROM [SHOW RANGES WITH INDEXES] WHERE range_id = {id of under-replicated range};`| | ranges.unavailable | ranges.unavailable | Number of ranges with fewer live replicas than needed for quorum | This metric is an indicator of [replication issues]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#replication-issues). It shows whether the cluster is unhealthy and can impact workload. If an entire range is unavailable, then it will be unable to process queries. | | queue.replicate.replacedecommissioningreplica.error | {% if include.deployment == 'self-hosted' %}queue.replicate.replacedecommissioningreplica.error.count |{% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Number of failed decommissioning replica replacements processed by the replicate queue | Refer to [Decommission the node]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#decommission-the-node). 
| -| range.splits | {% if include.deployment == 'self-hosted' %}range.splits.total |{% elsif include.deployment == 'advanced' %}range.splits |{% endif %} Number of range splits | This metric indicates how fast a workload is scaling up. Spikes can indicate resource hot spots since the [split heuristic is based on QPS]({% link {{ page.version.version }}/load-based-splitting.md %}#control-load-based-splitting-threshold). To understand whether hot spots are an issue and with which tables and indexes they are occurring, correlate this metric with other metrics such as CPU usage, such as `sys.cpu.combined.percent-normalized`, or use the [**Hot Ranges** page]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}). | +| range.splits | {% if include.deployment == 'self-hosted' %}range.splits.total |{% elsif include.deployment == 'advanced' %}range.splits |{% endif %} Number of range splits | This metric indicates how fast a workload is scaling up. Spikes can indicate resource [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}) since the [split heuristic is based on QPS]({% link {{ page.version.version }}/load-based-splitting.md %}#control-load-based-splitting-threshold). To understand whether hotspots are an issue and with which tables and indexes they are occurring, correlate this metric with other metrics such as CPU usage, such as `sys.cpu.combined.percent-normalized`, or use the [**Hot Ranges** page]({% link {{ page.version.version }}/ui-hot-ranges-page.md %}). | | range.merges | {% if include.deployment == 'self-hosted' %}range.merges.count |{% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Number of range merges | This metric indicates how fast a workload is scaling down. Merges are Cockroach's [optimization for performance](architecture/distribution-layer.html#range-merges). This metric indicates that there have been deletes in the workload. | ## SQL diff --git a/src/current/_includes/v25.1/performance/reduce-hot-spots.md b/src/current/_includes/v25.1/performance/reduce-hotspots.md similarity index 92% rename from src/current/_includes/v25.1/performance/reduce-hot-spots.md rename to src/current/_includes/v25.1/performance/reduce-hotspots.md index 4d7b601e33d..799fed761b8 100644 --- a/src/current/_includes/v25.1/performance/reduce-hot-spots.md +++ b/src/current/_includes/v25.1/performance/reduce-hotspots.md @@ -5,7 +5,7 @@ - Benefits of increasing normalization: - Can improve performance for write-heavy workloads. This is because, with increased normalization, a given business fact must be written to one place rather than to multiple places. - - Allows separate transactions to modify related underlying data without causing [contention](#transaction-contention). + - Allows separate transactions to modify related underlying data without causing [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention). - Reduces the chance of data inconsistency, since a given business fact must be written only to one place. - Reduces or eliminates data redundancy. - Uses less disk space. @@ -24,9 +24,9 @@ - If you are working with a table that **must** be indexed on sequential keys, consider using [hash-sharded indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %}). 
For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see the blog post [Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads](https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlock-linear-scaling-for-sequential-workloads/). As part of this, we recommend doing thorough performance testing with and without hash-sharded indexes to see which works best for your application. -- To avoid read hot spots: +- To avoid read hotspots: - - Increase data distribution, which will allow for more ranges. The hot spot exists because the data being accessed is all co-located in one range. + - Increase data distribution, which will allow for more ranges. The hotspot exists because the data being accessed is all co-located in one range. - Increase [load balancing]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing) across more nodes in the same range. Most transactional reads must go to the leaseholder in CockroachDB, which means that opportunities for load balancing over replicas are minimal. However, the following features do permit load balancing over replicas: diff --git a/src/current/_includes/v25.1/performance/use-hash-sharded-indexes.md b/src/current/_includes/v25.1/performance/use-hash-sharded-indexes.md index ca6132d8de6..314b0c24f5f 100644 --- a/src/current/_includes/v25.1/performance/use-hash-sharded-indexes.md +++ b/src/current/_includes/v25.1/performance/use-hash-sharded-indexes.md @@ -1 +1 @@ -We [discourage indexing on sequential keys]({% link {{ page.version.version }}/schema-design-indexes.md %}#best-practices). If a table **must** be indexed on sequential keys, use [hash-sharded indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %}). Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots) and improving write performance on sequentially-keyed indexes at a small cost to read performance. \ No newline at end of file +We [discourage indexing on sequential keys]({% link {{ page.version.version }}/schema-design-indexes.md %}#best-practices). If a table **must** be indexed on sequential keys, use [hash-sharded indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %}). Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}) and improving write performance on sequentially-keyed indexes at a small cost to read performance. \ No newline at end of file diff --git a/src/current/_includes/v25.1/sql/range-splits.md b/src/current/_includes/v25.1/sql/range-splits.md index a612774afc0..be16d064f5d 100644 --- a/src/current/_includes/v25.1/sql/range-splits.md +++ b/src/current/_includes/v25.1/sql/range-splits.md @@ -2,6 +2,6 @@ CockroachDB breaks data into ranges. By default, CockroachDB attempts to keep ra However, there are reasons why you may want to perform manual splits on the ranges that store tables or indexes: -- When a table only consists of a single range, all writes and reads to the table will be served by that range's [leaseholder]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases). 
If a table only holds a small amount of data but is serving a large amount of traffic, load distribution can become unbalanced and a [hot spot]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots) can occur. Splitting the table's ranges manually can allow the load on the table to be more evenly distributed across multiple nodes. For tables consisting of more than a few ranges, load will naturally be distributed across multiple nodes and this will not be a concern. +- When a table only consists of a single range, all writes and reads to the table will be served by that range's [leaseholder]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases). If a table only holds a small amount of data but is serving a large amount of traffic, load distribution can become unbalanced and a [hotspot]({% link {{ page.version.version }}/understand-hotspots.md %}) can occur. Splitting the table's ranges manually can allow the load on the table to be more evenly distributed across multiple nodes. For tables consisting of more than a few ranges, load will naturally be distributed across multiple nodes and this will not be a concern. -- When a table is created, it will only consist of a single range. If you know that a new table will immediately receive significant write traffic, you may want to preemptively split the table based on the expected distribution of writes before applying the load. This can help avoid reduced workload performance that results when automatic splits are unable to keep up with write traffic and a [hot spot]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots) occurs. +- When a table is created, it will only consist of a single range. If you know that a new table will immediately receive significant write traffic, you may want to preemptively split the table based on the expected distribution of writes before applying the load. This can help avoid reduced workload performance that results when automatic splits are unable to keep up with write traffic and a [hotspot]({% link {{ page.version.version }}/understand-hotspots.md %}) occurs. diff --git a/src/current/_includes/v25.1/sql/use-the-default-transaction-priority.md b/src/current/_includes/v25.1/sql/use-the-default-transaction-priority.md index f98742dd7c7..0a98718ff14 100644 --- a/src/current/_includes/v25.1/sql/use-the-default-transaction-priority.md +++ b/src/current/_includes/v25.1/sql/use-the-default-transaction-priority.md @@ -1,3 +1,3 @@ Cockroach Labs recommends leaving the transaction priority at the default setting in almost all cases. Changing the transaction priority to `HIGH` in particular can lead to difficult-to-debug interactions with other transactions executing on the system. -If you are setting a transaction priority to avoid [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) or [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots), or to [get better query performance]({% link {{ page.version.version }}/make-queries-fast.md %}), it is usually a sign that you need to update your [schema design]({% link {{ page.version.version }}/schema-design-database.md %}) and/or review the data access patterns of your workload. 
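For example, pre-splitting a new table before applying write load might look like the following sketch; the `orders` table and its split values are hypothetical and should be chosen to match the expected distribution of writes:

```sql
-- Hypothetical table whose primary key is prefixed by a well-distributed column.
CREATE TABLE orders (
    region STRING NOT NULL,
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    total DECIMAL,
    PRIMARY KEY (region, id)
);

-- Pre-split the table's ranges on the expected key prefixes before load arrives.
ALTER TABLE orders SPLIT AT VALUES ('eu-west'), ('us-east'), ('us-west');
```

If the manual split points later stop matching the workload, they can be removed with `ALTER TABLE orders UNSPLIT ALL;`, after which range boundaries are again managed automatically.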
+If you are setting a transaction priority to avoid [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) or [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}), or to [get better query performance]({% link {{ page.version.version }}/make-queries-fast.md %}), it is usually a sign that you need to update your [schema design]({% link {{ page.version.version }}/schema-design-database.md %}) and/or review the data access patterns of your workload. diff --git a/src/current/_includes/v25.1/zone-configs/variables.md b/src/current/_includes/v25.1/zone-configs/variables.md index 37008425787..4fffa6198f0 100644 --- a/src/current/_includes/v25.1/zone-configs/variables.md +++ b/src/current/_includes/v25.1/zone-configs/variables.md @@ -2,7 +2,7 @@ Variable | Description ------|------------ `range_min_bytes` | The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.

**Default:** `134217728` (128 MiB) `range_max_bytes` | The maximum size, in bytes, for a [range]({{link_prefix}}architecture/glossary.html#architecture-range) of data in the zone. When a range reaches this size, CockroachDB will [split it]({{link_prefix}}architecture/distribution-layer.html#range-splits) into two ranges.

**Default:** `536870912` (512 MiB) -`gc.ttlseconds` | The number of seconds overwritten [MVCC values]({{link_prefix}}architecture/storage-layer.html#mvcc) will be retained before [garbage collection]({{link_prefix}}architecture/storage-layer.html#garbage-collection).

**Default:** `14400` (4 hours)

Smaller values can save disk space and improve performance if values are frequently overwritten or for queue-like workloads. The smallest value we regularly test is `600` (10 minutes); smaller values are unlikely to be beneficial because of the frequency with which GC runs. If you use [non-scheduled incremental backups](take-full-and-incremental-backups.html#garbage-collection-and-backups), the GC TTL must be greater than the interval between incremental backups. Otherwise, your incremental backups will fail with [the error message `protected ts verification error`](common-errors.html#protected-ts-verification-error). To avoid this problem, we recommend using [scheduled backups](create-schedule-for-backup.html) instead, which automatically [use protected timestamps](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) to ensure they succeed.

Larger values increase the interval allowed for [`AS OF SYSTEM TIME`](as-of-system-time.html) queries and allow for less frequent incremental backups. The largest value we regularly test is `90000` (25 hours). Increasing the GC TTL is not meant to be a solution for long-term retention of history; for that you should handle versioning in the [schema design at the application layer](schema-design-overview.html). Setting the GC TTL too high can cause problems if the retained versions of a single row approach the [maximum range size](#range-max-bytes). This is important because all versions of a row are stored in a single range that never [splits](architecture/distribution-layer.html#range-splits).

+`gc.ttlseconds` | The number of seconds overwritten [MVCC values]({{link_prefix}}architecture/storage-layer.html#mvcc) will be retained before [garbage collection]({{link_prefix}}architecture/storage-layer.html#garbage-collection).

**Default:** `14400` (4 hours)

Smaller values can save disk space and improve performance if values are frequently overwritten or for [queue-like workloads]({{link_prefix}}understand-hotspots.html#queueing-hotspot). The smallest value we regularly test is `600` (10 minutes); smaller values are unlikely to be beneficial because of the frequency with which GC runs. If you use [non-scheduled incremental backups](take-full-and-incremental-backups.html#garbage-collection-and-backups), the GC TTL must be greater than the interval between incremental backups. Otherwise, your incremental backups will fail with [the error message `protected ts verification error`](common-errors.html#protected-ts-verification-error). To avoid this problem, we recommend using [scheduled backups](create-schedule-for-backup.html) instead, which automatically [use protected timestamps](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) to ensure they succeed.

Larger values increase the interval allowed for [`AS OF SYSTEM TIME`](as-of-system-time.html) queries and allow for less frequent incremental backups. The largest value we regularly test is `90000` (25 hours). Increasing the GC TTL is not meant to be a solution for long-term retention of history; for that you should handle versioning in the [schema design at the application layer](schema-design-overview.html). Setting the GC TTL too high can cause problems if the retained versions of a single row approach the [maximum range size](#range-max-bytes). This is important because all versions of a row are stored in a single range that never [splits](architecture/distribution-layer.html#range-splits). `num_replicas` | The number of replicas in the zone, also called the "replication factor".

**Default:** `3`

For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.

For [multi-region databases configured to survive region failures]({% link {{ page.version.version }}/multiregion-survival-goals.md %}#survive-region-failures), the default value is `5`; this will include both [voting](#num_voters) and [non-voting replicas]({% link {{ page.version.version }}/architecture/replication-layer.md %}#non-voting-replicas). `constraints` | An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#types-of-constraints) and [Scope of Constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#scope-of-constraints) for more details.

To prevent hard-to-detect typos, constraints placed on [store attributes and node localities]({% link {{ page.version.version }}/configure-replication-zones.md %}#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled. To prevent this error, make sure at least one active node is configured to match the constraint. For example, apply `constraints = '[+region=west]'` only if you had set `--locality=region=west` for at least one node while starting the cluster.

**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities. `lease_preferences` | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders]({% link {{ page.version.version }}/architecture/glossary.md %}#architecture-leaseholder). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all.

If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.

Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."

For a usage example, see [Constrain leaseholders to specific availability zones]({% link {{ page.version.version }}/configure-replication-zones.md %}#constrain-leaseholders-to-specific-availability-zones).

**Default**: No lease location preferences are applied if this field is not specified. diff --git a/src/current/v25.1/admission-control.md b/src/current/v25.1/admission-control.md index 28610579ec4..9e138bae6c4 100644 --- a/src/current/v25.1/admission-control.md +++ b/src/current/v25.1/admission-control.md @@ -18,7 +18,7 @@ For CPU, different types of usage are queued differently based on priority to al For storage IO, the goal is to prevent the [storage layer's log-structured merge tree]({% link {{ page.version.version }}/architecture/storage-layer.md %}#log-structured-merge-trees) (LSM) from experiencing high [read amplification]({% link {{ page.version.version }}/architecture/storage-layer.md %}#read-amplification), which slows down reads, while also maintaining the ability to absorb bursts of writes. -Admission control works on a per-[node]({% link {{ page.version.version }}/architecture/overview.md %}#node) basis, since even though a large CockroachDB cluster may be well-provisioned as a whole, individual nodes are stateful and may experience performance [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots). +Admission control works on a per-[node]({% link {{ page.version.version }}/architecture/overview.md %}#node) basis, since even though a large CockroachDB cluster may be well-provisioned as a whole, individual nodes are stateful and may experience performance [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). For more details about how the admission control system works, see: @@ -27,7 +27,7 @@ For more details about how the admission control system works, see: ## Use cases for admission control -A well-provisioned CockroachDB cluster may still encounter performance bottlenecks at the node level, as stateful nodes can develop [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots) that last until the cluster rebalances itself. When hot spots occur, they should not cause failures or degraded performance for important work. +A well-provisioned CockroachDB cluster may still encounter performance bottlenecks at the node level, as stateful nodes can develop [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}) that last until the cluster rebalances itself. When hotspots occur, they should not cause failures or degraded performance for important work. This is particularly important for CockroachDB {{ site.data.products.standard }} and CockroachDB {{ site.data.products.basic }}, where one user tenant cluster experiencing high load should not degrade the performance or availability of a different, isolated tenant cluster running on the same host. 
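Putting several of the replication zone variables above together, a zone can be adjusted in a single statement. The following sketch uses the `movr` demo database with illustrative values, not recommendations:

```sql
-- Illustrative zone configuration; the database name and values are examples only.
ALTER DATABASE movr CONFIGURE ZONE USING
    range_min_bytes = 134217728,  -- 128 MiB
    range_max_bytes = 536870912,  -- 512 MiB
    gc.ttlseconds = 14400,        -- retain 4 hours of MVCC history
    num_replicas = 5,
    constraints = '[+region=us-east1]',
    lease_preferences = '[[+region=us-east1]]';

-- Verify the resulting configuration.
SHOW ZONE CONFIGURATION FROM DATABASE movr;
```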
diff --git a/src/current/v25.1/common-issues-to-monitor.md b/src/current/v25.1/common-issues-to-monitor.md index c0cb43c74e7..60595f5c993 100644 --- a/src/current/v25.1/common-issues-to-monitor.md +++ b/src/current/v25.1/common-issues-to-monitor.md @@ -27,7 +27,7 @@ Provision enough CPU to support your operational and workload concurrency requir | Category | Recommendations | |----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| CPU | | +| CPU | | ### CPU monitoring diff --git a/src/current/v25.1/hash-sharded-indexes.md b/src/current/v25.1/hash-sharded-indexes.md index b1ab5c204f9..9ee74d592a8 100644 --- a/src/current/v25.1/hash-sharded-indexes.md +++ b/src/current/v25.1/hash-sharded-indexes.md @@ -1,11 +1,11 @@ --- title: Index Sequential Keys with Hash-sharded Indexes -summary: Hash-sharded indexes can eliminate single-range hot spots and improve write performance on sequentially-keyed indexes at a small cost to read performance +summary: Hash-sharded indexes can eliminate single-range hotspots and improve write performance on sequentially-keyed indexes at a small cost to read performance toc: true docs_area: develop --- -If you are working with a table that must be indexed on sequential keys, you should use **hash-sharded indexes**. Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hot spots and improving write performance on sequentially-keyed indexes at a small cost to read performance. +If you are working with a table that must be indexed on sequential keys, you should use **hash-sharded indexes**. Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hotspots and improving write performance on sequentially-keyed indexes at a small cost to read performance. {{site.data.alerts.callout_info}} Hash-sharded indexes are an implementation of hash partitioning, not hash indexing. @@ -15,7 +15,7 @@ Hash-sharded indexes are an implementation of hash partitioning, not hash indexi ### Overview -CockroachDB automatically splits ranges of data in [the key-value store]({% link {{ page.version.version }}/architecture/storage-layer.md %}) based on [the size of the range]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-splits) and on [the load streaming to the range]({% link {{ page.version.version }}/load-based-splitting.md %}). To split a range based on load, the system looks for a point in the range that evenly divides incoming traffic. If the range is indexed on a column of data that is sequential in nature (e.g., [an ordered sequence]({% link {{ page.version.version }}/sql-faqs.md %}#what-are-the-differences-between-uuid-sequences-and-unique_rowid) or a series of increasing, non-repeating [`TIMESTAMP`s](timestamp.html)), then all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. 
As a result, the system cannot find a point in the range that evenly divides the traffic, and the range cannot benefit from load-based splitting, creating a [hot spot](performance-best-practices-overview.html#hot-spots) on the single range. +CockroachDB automatically splits ranges of data in [the key-value store]({% link {{ page.version.version }}/architecture/storage-layer.md %}) based on [the size of the range]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#range-splits) and on [the load streaming to the range]({% link {{ page.version.version }}/load-based-splitting.md %}). To split a range based on load, the system looks for a point in the range that evenly divides incoming traffic. If the range is indexed on a column of data that is sequential in nature (e.g., [an ordered sequence]({% link {{ page.version.version }}/sql-faqs.md %}#what-are-the-differences-between-uuid-sequences-and-unique_rowid) or a series of increasing, non-repeating [`TIMESTAMP`s](timestamp.html)), then all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. As a result, the system cannot find a point in the range that evenly divides the traffic, and the range cannot benefit from load-based splitting, creating a [hotspot]({% link {{ page.version.version }}/understand-hotspots.md %}) on the single range. Hash-sharded indexes solve this problem by distributing sequential data across multiple nodes within your cluster, eliminating hotspots. The trade-off to this, however, is a small performance impact on reading sequential data or ranges of data, as it's not guaranteed that sequentially close values will be on the same node. diff --git a/src/current/v25.1/load-based-splitting.md b/src/current/v25.1/load-based-splitting.md index 58e1e4903cc..cd8e999a409 100644 --- a/src/current/v25.1/load-based-splitting.md +++ b/src/current/v25.1/load-based-splitting.md @@ -87,7 +87,7 @@ You can see how often a split key cannot be found over time by looking at the fo This metric is directly related to the log message described above. -For more information about how to reduce hot spots (including hot ranges) on your cluster, see [Hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots). +For more information about how to reduce hotspots (including hot ranges) on your cluster, refer to [Understand hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). ## See also diff --git a/src/current/v25.1/make-queries-fast.md b/src/current/v25.1/make-queries-fast.md index 06bbe88dfea..183056944a1 100644 --- a/src/current/v25.1/make-queries-fast.md +++ b/src/current/v25.1/make-queries-fast.md @@ -8,7 +8,7 @@ docs_area: develop This page provides an overview for optimizing statement performance in CockroachDB. To get good performance, you need to look at how you're accessing the database through several lenses: - [SQL statement performance](#sql-statement-performance-rules): This is the most common cause of performance problems and where you should start. -- [Schema design](#schema-design): Depending on your SQL schema and the data access patterns of your workload, you may need to make changes to avoid creating [transaction contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) or [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots). 
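As a sketch of the hash-sharded approach described above, assuming a hypothetical `events` table that is written in timestamp order:

```sql
-- Hypothetical table with a sequentially increasing column.
CREATE TABLE events (
    id UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    payload JSONB
);

-- A hash-sharded index spreads writes on created_at across shards,
-- avoiding a single hot range at the tail of the index.
CREATE INDEX events_created_at_idx ON events (created_at) USING HASH;
```

Scans over contiguous `created_at` values then fan out across shards, which is the small read-performance cost the documentation mentions.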
+- [Schema design](#schema-design): Depending on your SQL schema and the data access patterns of your workload, you may need to make changes to avoid creating [transaction contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) or [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). - [Cluster topology](#cluster-topology): As a distributed system, CockroachDB requires you to trade off latency vs. resiliency. This requires choosing the right cluster topology for your needs. ## SQL statement performance rules @@ -28,7 +28,7 @@ For an example of applying the rules to a query, see [Apply SQL Statement Perfor If you are following the instructions in [the SQL performance section](#sql-statement-performance-rules) and still not getting the performance you want, you may need to look at your schema design and data access patterns to make sure that you are not: - Introducing transaction contention. For methods for diagnosing and mitigating transaction contention, see [Transaction contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention). -- Creating hot spots in your cluster. For methods for detecting and eliminating hot spots, see [hot spots]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#hot-spots). +- Creating hotspots in your cluster. For methods for detecting and eliminating hotspots, refer to [Understand hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). ## Cluster topology diff --git a/src/current/v25.1/pagination.md index 8871e27c91c..893b5c3c54e 100644 --- a/src/current/v25.1/pagination.md +++ b/src/current/v25.1/pagination.md @@ -208,7 +208,7 @@ Time: 1ms total (execution 1ms / network 0ms) As shown by the `estimated row count` row, this query scans only 25 rows, far fewer than the 200049 scanned by the `LIMIT`/`OFFSET` query. {{site.data.alerts.callout_danger}} -Using a sequential (i.e., non-[UUID]({% link {{ page.version.version }}/uuid.md %})) primary key creates hot spots in the database for write-heavy workloads, since concurrent [`INSERT`]({% link {{ page.version.version }}/insert.md %})s to the table will attempt to write to the same (or nearby) underlying [ranges]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range). This can be mitigated by designing your schema with [multi-column primary keys which include a monotonically increasing column]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#use-multi-column-primary-keys). +Using a sequential (i.e., non-[UUID]({% link {{ page.version.version }}/uuid.md %})) primary key creates hotspots in the database for write-heavy workloads, since concurrent [`INSERT`]({% link {{ page.version.version }}/insert.md %})s to the table will attempt to write to the same (or nearby) underlying [ranges]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range). This can be mitigated by designing your schema with [multi-column primary keys which include a monotonically increasing column]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#use-multi-column-primary-keys). 
{{site.data.alerts.end}} ## Differences between keyset pagination and cursors diff --git a/src/current/v25.1/performance-best-practices-overview.md b/src/current/v25.1/performance-best-practices-overview.md index cf258986a49..1b88e654f6f 100644 --- a/src/current/v25.1/performance-best-practices-overview.md +++ b/src/current/v25.1/performance-best-practices-overview.md @@ -69,7 +69,7 @@ The best practices for generating unique IDs in a distributed database like Cock 1. Using the [`SERIAL`]({% link {{ page.version.version }}/serial.md %}) pseudo-type for a column to generate random unique IDs. This can result in a performance bottleneck because IDs generated temporally near each other have similar values and are located physically near each other in a table's storage. 1. Generating monotonically increasing [`INT`]({% link {{ page.version.version }}/int.md %}) IDs by using transactions with roundtrip [`SELECT`]({% link {{ page.version.version }}/select-clause.md %})s, e.g., `INSERT INTO tbl (id, …) VALUES ((SELECT max(id)+1 FROM tbl), …)`. This has a **very high performance cost** since it makes all [`INSERT`]({% link {{ page.version.version }}/insert.md %}) transactions wait for their turn to insert the next ID. You should only do this if your application really does require strict ID ordering. In some cases, using [change data capture (CDC)]({% link {{ page.version.version }}/change-data-capture-overview.md %}) can help avoid the requirement for strict ID ordering. If you can avoid the requirement for strict ID ordering, you can use one of the higher-performance ID strategies outlined in the following sections. -The preceding approaches are likely to create [hot spots](#hot-spots) for both reads and writes in CockroachDB. {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %} +The preceding approaches are likely to create [hotspots](#hotspots) for both reads and writes in CockroachDB. {% include {{page.version.version}}/performance/use-hash-sharded-indexes.md %} To create unique and non-sequential IDs, we recommend the following approaches (listed in order from best to worst performance): @@ -83,7 +83,7 @@ To create unique and non-sequential IDs, we recommend the following approaches ( A well-designed multi-column primary key can yield even better performance than a [UUID primary key](#use-functions-to-generate-unique-ids), but it requires more up-front schema design work. To get the best performance, ensure that any monotonically increasing field is located **after** the first column of the primary key. When done right, such a composite primary key should result in: -- Enough randomness in your primary key to spread the table data / query load relatively evenly across the cluster, which will avoid hot spots. By "enough randomness" we mean that the prefix of the primary key should be relatively uniformly distributed over its domain. Its domain should have at least as many elements as you have nodes. +- Enough randomness in your primary key to spread the table data / query load relatively evenly across the cluster, which will avoid hotspots. By "enough randomness" we mean that the prefix of the primary key should be relatively uniformly distributed over its domain. Its domain should have at least as many elements as you have nodes. - A monotonically increasing column that is part of the primary key (and thus indexed) which is also useful in your queries. For example, consider a social media website. 
Social media posts are written by users, and on login the user's last 10 posts are displayed. A good choice for a primary key might be `(username, post_timestamp)`. For example: @@ -343,9 +343,9 @@ By default under [`SERIALIZABLE`]({% link {{ page.version.version }}/demo-serial - [Delays in query completion]({% link {{ page.version.version }}/query-behavior-troubleshooting.md %}#hanging-or-stuck-queries). This occurs when multiple transactions are trying to write to the same "locked" data at the same time, making a transaction unable to complete. This is also known as *lock contention*. - [Transaction retries]({% link {{ page.version.version }}/transactions.md %}#automatic-retries) performed automatically by CockroachDB. This occurs if a transaction cannot be placed into a serializable ordering among all of the currently-executing transactions. This is also called a *serialization conflict*. - [Transaction retry errors]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}), which are emitted to your client when an automatic retry is not possible or fails. Under `SERIALIZABLE` isolation, your application must address transaction retry errors with [client-side retry handling]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}#client-side-retry-handling). -- [Cluster hot spots](#hot-spots). +- [Cluster hotspots](#hotspots). -To mitigate these effects, [reduce the causes of transaction contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#reduce-transaction-contention) and [reduce hot spots](#reduce-hot-spots). For further background on transaction contention, see [What is Database Contention, and Why Should You Care?](https://www.cockroachlabs.com/blog/what-is-database-contention/). +To mitigate these effects, [reduce the causes of transaction contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#reduce-transaction-contention) and [reduce hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}#reduce-hotspots). For further background on transaction contention, see [What is Database Contention, and Why Should You Care?](https://www.cockroachlabs.com/blog/what-is-database-contention/). ### Reduce transaction contention @@ -361,24 +361,11 @@ To maximize transaction performance, you'll need to maximize the performance of - Use the fastest [storage devices]({% link {{ page.version.version }}/recommended-production-settings.md %}#storage) available. - If the contending transactions operate on different keys within the same range, add [more CPU power (more cores) per node]({% link {{ page.version.version }}/recommended-production-settings.md %}#sizing). However, if the transactions all operate on the same key, this may not provide an improvement. -## Hot spots +## Hotspots -A *hot spot* is any location on the cluster receiving significantly more requests than another. Hot spots are a symptom of *resource contention* and can create problems as requests increase, including excessive [transaction contention](#transaction-contention). +A *hotspot* is any location on the cluster receiving significantly more requests than another. Hotspots are a symptom of *resource contention* and can create problems as requests increase, including excessive [transaction contention](#transaction-contention). 
-[Hot spots occur]({% link {{ page.version.version }}/performance-recipes.md %}#indicators-that-your-cluster-has-hot-spots) when an imbalanced workload access pattern causes significantly more reads and writes on a subset of data. For example: - -- Transactions operate on the **same range but different index keys**. These operations are limited by the overall hardware capacity of [the range leaseholder]({% link {{ page.version.version }}/architecture/overview.md %}#cockroachdb-architecture-terms) node. -- A range is indexed on a column of data that is sequential in nature (e.g., [an ordered sequence]({% link {{ page.version.version }}/sql-faqs.md %}#what-are-the-differences-between-uuid-sequences-and-unique_rowid), or a series of increasing, non-repeating [`TIMESTAMP`s]({% link {{ page.version.version }}/timestamp.md %})), such that all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. Because the system is unable to find a split point in the range that evenly divides the traffic, the range cannot benefit from [load-based splitting]({% link {{ page.version.version }}/load-based-splitting.md %}). This creates a hot spot at the single range. - -Read hot spots can occur if you perform lots of scans of a portion of a table index or a single key. - -### Reduce hot spots - -{% include {{ page.version.version }}/performance/reduce-hot-spots.md %} - -For a demo on hot spot reduction, watch the following video: - -{% include_cached youtube.html video_id="j15k01NeNNA" %} +For a detailed explanation of hotspot causes and mitigation strategies, refer to [Understand Hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}). ## See also diff --git a/src/current/v25.1/performance-recipes.md b/src/current/v25.1/performance-recipes.md index bd6b083671c..8666e2b6d53 100644 --- a/src/current/v25.1/performance-recipes.md +++ b/src/current/v25.1/performance-recipes.md @@ -42,8 +42,8 @@ This section describes how to use CockroachDB commands and dashboards to identif
  • The Hot Ranges page (DB Console) displays a higher-than-expected QPS for a range.
  • The Key Visualizer (DB Console) shows ranges with much higher-than-average write rates for the cluster.
  • - - + +