---
applies_to:
  stack:
  deployment:
    eck:
    ess:
    ece:
    self:
navigation_title: Watermark errors
mapped_pages:
---

Watermark errors [fix-watermark-errors]

When a data node is critically low on disk space and has reached the flood-stage disk usage watermark, the following error is logged: `Error: disk usage exceeded flood-stage watermark, index has read-only-allow-delete block`.

To prevent a full disk, when a node reaches this watermark, {{es}} blocks writes to any index with a shard on the node. If the block affects related system indices, {{kib}} and other {{stack}} features may become unavailable. For example, this could trigger {{kib}}'s `Kibana Server is not Ready yet` error message.

{{es}} will automatically remove the write block when the affected node’s disk usage falls below the high disk watermark. To achieve this, {{es}} attempts to rebalance some of the affected node’s shards to other nodes in the same data tier.

::::{tip}
If you're using {{ech}}, you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. For more information, refer to the AutoOps documentation.
::::
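
To see how close each node is to a watermark, you can compare per-node disk usage with the thresholds currently in effect. The following is a minimal sketch using the cat allocation API and the cluster get settings API. The column selection, sort order, and `filter_path` value are illustrative choices, not requirements.

```console
# Per-node disk usage, fullest nodes first
GET _cat/allocation?v=true&h=node,shards,disk.percent,disk.used,disk.avail,disk.total&s=disk.percent:desc

# Watermark settings in effect, including defaults
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk.watermark
```

Nodes at the top of the first response are the closest to the watermarks. The second response lists the watermark values under `persistent`, `transient`, or `defaults`, depending on where they are set.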

Monitor rebalancing [fix-watermark-errors-rebalance]

To verify that shards are moving off the affected node until it falls below the high watermark, use the cat shards API and cat recovery API:

GET _cat/shards?v=true

GET _cat/recovery?v=true&active_only=true
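
On clusters with many shards, the full output can be hard to scan. As a sketch, the same APIs accept the `h` and `s` parameters to select and sort columns. The columns below are only one possible selection.

```console
# Largest shards first, with the node each one is on
GET _cat/shards?v=true&h=index,shard,prirep,state,store,node&s=store:desc

# Active recoveries only, showing where each shard is moving from and to
GET _cat/recovery?v=true&active_only=true&h=index,shard,stage,source_node,target_node,bytes_percent
```

Shards relocating away from the affected node should appear in the recovery output with that node as `source_node`.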

If shards remain on the node and keep it above the high watermark, use the cluster allocation explanation API to get an explanation for their allocation status.

GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
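
To get the explanation for a shard copy on the affected node specifically, the request body also accepts a `current_node` field. In this sketch, `my-node` is a placeholder for the name or ID of the node that is over the watermark, and `my-index` and shard `0` are placeholders as in the previous request.

```console
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false,
  "current_node": "my-node"
}
```

In the response, fields such as `can_remain_on_current_node` and `node_allocation_decisions` indicate why the shard has not moved to another node.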

Temporary relief [fix-watermark-errors-temporary]

To immediately restore write operations, you can temporarily increase disk watermarks and remove the write block.

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
  }
}

PUT */_settings?expand_wildcards=all
{
  "index.blocks.read_only_allow_delete": null
}
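
To check whether the block is gone, you can request only that setting from every index. This is a sketch that assumes the same wildcard target as the request above.

```console
GET */_settings/index.blocks.read_only_allow_delete?expand_wildcards=all&flat_settings=true
```

Indices that no longer have the block should not appear in the response. If an index is still listed with the value `true`, the node holding one of its shards is probably still at or above the flood-stage watermark.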

When a long-term solution is in place, reset or reconfigure the disk watermarks:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": null
  }
}

Resolve [fix-watermark-errors-resolve]

To resolve watermark errors permanently, perform one of the following actions:

  • Horizontally scale nodes of the affected data tiers.
  • Vertically scale existing nodes to increase disk space.
  • Delete indices using the delete index API, either permanently if the index isn’t needed, or temporarily if you plan to restore it later from a snapshot.
  • Update the related ILM policy to push indices through to later data tiers.
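
As a sketch of the last two options, you could first list the largest indices to find candidates for deletion, and then shorten the time indices spend on the affected tier. The policy name `my-policy` and all ages and sizes below are hypothetical. This request replaces the whole policy, so start from your existing policy definition rather than from this example.

```console
# Largest indices first
GET _cat/indices?v=true&h=index,pri.store.size,store.size,docs.count&s=store.size:desc

# Hypothetical policy: move to the warm phase 3 days after rollover, delete after 30 days
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "set_priority": {
            "priority": 50
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

Lowering `min_age` on a later phase moves indices off the earlier tier sooner, because ILM adds a `migrate` action to the warm and cold phases by default. This only helps if the cluster has nodes in those later tiers with free disk space.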

::::{tip}
On {{ech}} and {{ece}}, you might need to temporarily delete indices using the Elasticsearch API Console and restore them later from a snapshot in order to resolve a cluster health `status:red`, which blocks attempted changes. If you experience issues with this resolution flow on {{ech}}, contact Elastic Support for assistance.
::::
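
The delete-and-restore flow described in this tip can be sketched as follows. The repository `my_repository`, snapshot `my_snapshot`, and index `my-index` are placeholder names. Confirm that the snapshot completed successfully and contains the index before you delete anything.

```console
# 1. Check that the snapshot exists and lists the index
GET _snapshot/my_repository/my_snapshot

# 2. Delete the index to free disk space
DELETE my-index

# 3. Once the cluster has capacity again, restore the index from the snapshot
POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-index"
}
```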