Add storage.wal.fsync.latency to metrics

rmloveland · rmloveland · commit fb84133c1c5a · 2025-03-10T10:48:38.000-04:00
Fixes DOC-11996
diff --git a/src/current/_includes/v25.1/essential-metrics.md b/src/current/_includes/v25.1/essential-metrics.md
@@ -35,6 +35,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
 | <a id="capacity"></a>capacity                                            | {% if include.deployment == 'self-hosted' %}capacity.total  |{% elsif include.deployment == 'advanced' %}capacity |{% endif %} Total storage capacity                                       | This metric gives total storage capacity. Measurements should comply with the following rule: CockroachDB storage volumes should not be utilized more than 60% (40% free space). |
 | <a id="capacity-available"></a>capacity.available                                  | capacity.available                                           | Available storage capacity                                   | This metric gives available storage capacity. Measurements should comply with the following rule: CockroachDB storage volumes should not be utilized more than 60% (40% free space). |
 | capacity.used                                       | capacity.used                                                | Used storage capacity                                        | This metric gives used storage capacity. Measurements should comply with the following rule: CockroachDB storage volumes should not be utilized more than 60% (40% free space). |
+| <a id="storage-wal-fsync-latency"></a>storage.wal.fsync.latency | {% if include.deployment == 'self-hosted' %}storage.wal.fsync.latency |{% elsif include.deployment == 'advanced' %}storage.wal.fsync.latency |{% endif %} This metric reports the latency of writes to the [WAL]({% link {{ page.version.version }}/architecture/storage-layer.md %}#memtable-and-write-ahead-log). | If this value is greater than `100ms`, it is an indication of a [disk stall]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#disk-stalls). To mitigate the effects of disk stalls, consider deploying your cluster with [WAL failover]({% link {{ page.version.version }}/wal-failover.md %}) configured. |
 | <a id="storage-write-stalls"></a>storage.write-stalls                                | {% if include.deployment == 'self-hosted' %}storage.write.stalls |{% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Number of instances of intentional write stalls to backpressure incoming writes | This metric reports actual disk stall events. Ideally, investigate all reports of disk stalls. As a practical guideline, one stall per minute is not likely to have a material impact on workload beyond an occasional increase in response time. However, one stall per second should be viewed as problematic and investigated actively. It is particularly problematic if the rate persists over an extended period of time, and worse, if it is increasing. |
 | rocksdb.compactions                                 | rocksdb.compactions.total                                    | Number of SST compactions                                    | This metric reports the number of a node's [LSM compactions]({% link {{ page.version.version }}/common-issues-to-monitor.md %}#lsm-health). If the number of compactions remains elevated while the LSM health does not improve, compactions are not keeping up with the workload. If the condition persists for an extended period, the cluster will initially exhibit performance issues that will eventually escalate into stability issues. |
 | rocksdb.block.cache.hits                            | rocksdb.block.cache.hits                                     | Count of block cache hits                                    | This metric gives hits to block cache which is reserved memory. It is allocated upon the start of a node process by the [`--cache` flag]({% link {{ page.version.version }}/cockroach-start.md %}#general) and never shrinks. By observing block cache hits and misses, you can fine-tune memory allocations in the node process for the demands of the workload. |
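
The thresholds stated in the Usage column above (WAL fsync latency above `100ms`, storage utilization above 60%) can be expressed as simple checks. The following is a minimal sketch, not part of this commit: the function names are hypothetical, and it assumes the caller has already obtained the metric values from a monitoring system and converted the latency to milliseconds.

```python
# Minimal sketch of the threshold rules described in the Usage column.
# All names below are hypothetical; fetching the metric values is out of scope.

# storage.wal.fsync.latency: values above this may indicate a disk stall.
FSYNC_LATENCY_THRESHOLD_MS = 100.0

# capacity.*: storage volumes should not be more than 60% utilized.
MAX_STORAGE_UTILIZATION = 0.60


def fsync_latency_suggests_disk_stall(latency_ms: float) -> bool:
    """Return True if a WAL fsync latency sample exceeds the 100ms guideline.

    Assumes latency_ms is already expressed in milliseconds.
    """
    return latency_ms > FSYNC_LATENCY_THRESHOLD_MS


def storage_over_utilized(capacity_used: float, capacity_total: float) -> bool:
    """Return True if used/total storage exceeds the 60% guideline."""
    if capacity_total <= 0:
        raise ValueError("capacity_total must be positive")
    return capacity_used / capacity_total > MAX_STORAGE_UTILIZATION
```

A check like this would typically live in an alerting rule rather than application code. When the latency check fires repeatedly, the row added in this commit suggests considering WAL failover.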