You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
notifier: fix increment of metric prometheus_notifications_errors_total
Previously, prometheus_notifications_errors_total was incremented by
one whenever a batch of alerts was affected by an error during sending
to a specific alertmanager. However, the corresponding metric
prometheus_notifications_sent_total, counting all alerts that were
sent (including those where the sent ended in error), is incremented
by the batch size, i.e. the number of alerts.
Therefore, the ratio used in the mixin for the
PrometheusErrorSendingAlertsToSomeAlertmanagers alert is inconsistent.
This commit changes the increment of
prometheus_notifications_errors_total to the number of alerts that
were sent in the attempt that ended in an error. It also adjusts the
metrics help string accordingly and makes the wording in the alert in
the mixin more precise.
Signed-off-by: beorn7 <[email protected]>
Copy file name to clipboardExpand all lines: CHANGELOG.md
+1Lines changed: 1 addition & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -2,6 +2,7 @@
2
2
3
3
## unreleased
4
4
5
+
*[CHANGE] Notifier: Increment the prometheus_notifications_errors_total metric by the number of affected alerts rather than by one per batch of affected alerts. #15428
5
6
*[ENHANCEMENT] OTLP receiver: Convert also metric metadata. #15416
Copy file name to clipboardExpand all lines: documentation/prometheus-mixin/alerts.libsonnet
+2-2Lines changed: 2 additions & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -84,8 +84,8 @@
84
84
severity:'warning',
85
85
},
86
86
annotations: {
87
-
summary:'Prometheus has encountered more than 1% errors sending alerts to a specific Alertmanager.',
88
-
description:'{{ printf "%%.1f" $value }}%% errors while sending alerts from Prometheus %(prometheusName)s to Alertmanager {{$labels.alertmanager}}.' % $._config,
87
+
summary:'More than 1% of alerts sent by Prometheus to a specific Alertmanager were affected by errors.',
88
+
description:'{{ printf "%%.1f" $value }}%% of alerts sent by Prometheus %(prometheusName)s to Alertmanager {{$labels.alertmanager}} were affected by errors.' % $._config,
0 commit comments