`alloy-metrics` on `tamarin / staging` is failing to send metrics to mimir #3856

QuantumEnigmaa · 2025-01-30T10:26:43Z

The pod fails to send samples with the following error message:

server returned HTTP status 400 Bad Request: send data to ingesters: failed pushing to ingester mimir-ingester-2: user=anonymous: the sample has been rejected because another sample with a more recent timestamp has already been ingested and this sample is beyond the out-of-order time window of 5m (err-mimir-sample-timestamp-too-old).

One easy workaround would be to increase the `out_of_order_time_window` field in tamarin's mimir config (currently set to 5m), but the root cause might be that alloy needs more shards.

We had multiple such alerts last week: tamarin / production / alloy-metrics - MetricForwardingErrors, on all tamarin clusters (testing, staging, production).

We don't know why these happen; we should investigate and fix the cause.


QuentinBisson · 2025-01-30T12:59:07Z

I see the same issue on testing as well, but I would love to know why samples are 20 minutes behind:

ts=2025-01-30T12:46:44.244494161Z caller=grpc_logging.go:76 level=warn method=/cortex.Ingester/Push duration=21.90549ms msg=gRPC err="user=anonymous: the sample has been rejected because another sample with a more recent timestamp has already been ingested and this sample is beyond the out-of-order time window of 5m (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2025-01-30T10:24:46.333Z and is from series statistics_request_count{appid="kube", cluster_id="staging", cluster_type="workload_cluster", container="mtail", customer="panamax", endpoint="mtail", installation="tamarin", instance="10.244.15.145:3903", job="totp-staging/mtail", namespace="totp-staging", organization="panamax", pid="24135", pipeline="stable", pod="tdsrest-6dbd9f77b-44btg", provider="cloud-director", region="onprem", request="TOTPProbe", service_priority="highest"} (sampled 1/10)"
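To quantify how far behind a rejected sample is, the ingester's log time can be compared with the sample timestamp quoted in the same line. The following is a minimal sketch, not part of any Mimir or Alloy tooling: the helper names `parse_rfc3339_utc` and `sample_lag` are hypothetical, and it assumes the log line has the same shape as the one above (a leading `ts=` field and the phrase "The affected sample has timestamp"). For the line quoted above it gives a lag of about 2 h 22 min.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches UTC timestamps such as 2025-01-30T12:46:44.244494161Z
# (the fractional part is optional and may have up to nanosecond precision).
_TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_rfc3339_utc(text: str) -> datetime:
    """Parse a 'Z'-suffixed timestamp, truncating to microseconds."""
    match = _TS.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    base = base.replace(tzinfo=timezone.utc)
    # Keep at most 6 fractional digits, since datetime has microsecond resolution.
    fraction = (match.group(2) or "")[:6].ljust(6, "0")
    return base + timedelta(microseconds=int(fraction))


def sample_lag(log_line: str) -> timedelta:
    """Return (time the ingester logged the rejection) - (sample timestamp)."""
    logged = re.search(r"ts=(\S+)", log_line)
    sample = re.search(r"The affected sample has timestamp (\S+)", log_line)
    if logged is None or sample is None:
        raise ValueError("log line does not have the expected shape")
    return parse_rfc3339_utc(logged.group(1)) - parse_rfc3339_utc(sample.group(1))


if __name__ == "__main__":
    # Abbreviated copy of the log line above; the series labels are omitted.
    line = (
        "ts=2025-01-30T12:46:44.244494161Z caller=grpc_logging.go:76 level=warn "
        'msg=gRPC err="... The affected sample has timestamp '
        '2025-01-30T10:24:46.333Z and is from series ..."'
    )
    print(sample_lag(line))  # prints 2:21:57.911494
```

Running this over all rejected samples in the ingester logs would show whether the lag is constant or growing, which may help tell a stuck remote-write queue apart from a shard that is merely too slow.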

QuentinBisson · 2025-01-30T13:17:29Z

Pod restart fixes it for a while but ...

QuantumEnigmaa added the team/atlas Team Atlas label Jan 30, 2025

github-project-automation bot added this to Roadmap Jan 30, 2025

github-project-automation bot moved this to Inbox 📥 in Roadmap Jan 30, 2025

Rotfuks added the postmortem label Feb 4, 2025

Rotfuks moved this from Inbox 📥 to Up Next ➡️ in Roadmap Feb 6, 2025
