Commit a2ffcdd

Merge branch 'master' into resource-based-throttling
Signed-off-by: Justin Jung <[email protected]>
2 parents fa56e65 + 393a672 commit a2ffcdd

32 files changed: +262 -287 lines changed


Diff for: CHANGELOG.md

+3 -1
@@ -6,7 +6,9 @@
 * [FEATURE] Querier/Ruler: Add `query_partial_data` and `rules_partial_data` limits to allow queries/rules to be evaluated with data from a single zone, if other zones are not available. #6526
 * [FEATURE] Update prometheus alertmanager version to v0.28.0 and add new integration msteamsv2, jira, and rocketchat. #6590
 * [FEATURE] Ingester: Add a `-ingester.enable-ooo-native-histograms` flag to enable out-of-order native histogram ingestion per tenant. It only takes effect when `-blocks-storage.tsdb.enable-native-histograms=true` and `-ingester.out-of-order-time-window` > 0. It is applied after the restart if it is changed at runtime through the runtime config. #6626
-* [FEATURE] Ingester/StoreGateway: Add `resource-thresholds` in ingesters and store gateways to throttle query requests when the pods are under resource pressure. #6674
+* [FEATURE] Ingester/StoreGateway: Add `resource-thresholds` in ingesters and store gateways to throttle query requests when the pods are under resource pressure. #6674
+* [FEATURE] Ingester: Support out-of-order native histogram ingestion. It automatically enabled when `-ingester.out-of-order-time-window > 0` and `-blocks-storage.tsdb.enable-native-histograms=true`. #6626 #6663
+* [ENHANCEMENT] Alertmanager: Add nflog and silences maintenance metrics. #6659
 * [ENHANCEMENT] Querier: limit label APIs to query only ingesters if `start` param is not been specified. #6618
 * [ENHANCEMENT] Alertmanager: Add new limits `-alertmanager.max-silences-count` and `-alertmanager.max-silences-size-bytes` for limiting silences per tenant. #6605
 * [ENHANCEMENT] Update prometheus version to v3.1.0. #6583

Diff for: docs/configuration/config-file-reference.md

-7
@@ -3535,13 +3535,6 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
 # CLI flag: -ingester.max-exemplars
 [max_exemplars: <int> | default = 0]
 
-# [Experimental] Enable out-of-order native histogram ingestion, it only takes
-# effect when -blocks-storage.tsdb.enable-native-histograms=true and
-# -ingester.out-of-order-time-window > 0. It is applied after the restart if it
-# is changed at runtime through the runtime config.
-# CLI flag: -ingester.enable-ooo-native-histograms
-[enable_ooo_native_histograms: <boolean> | default = false]
-
 # Maximum number of chunks that can be fetched in a single query from ingesters
 # and long-term storage. This limit is enforced in the querier, ruler and
 # store-gateway. 0 to disable.

Diff for: docs/configuration/v1-guarantees.md

-1
@@ -104,7 +104,6 @@ Currently experimental features are:
   - `-blocks-storage.tsdb.out-of-order-cap-max` (int) CLI flag
   - `-ingester.out-of-order-time-window` (duration) CLI flag
   - `out_of_order_time_window` (duration) field in runtime config file
-  - `enable_ooo_native_histograms` (bool) field in runtime config file
 - Store Gateway Zone Stable Shuffle Sharding
   - `-store-gateway.sharding-ring.zone-stable-shuffle-sharding` CLI flag
   - `zone_stable_shuffle_sharding` (boolean) field in config file

Diff for: docs/guides/native-histograms.md

+6 -27
@@ -55,35 +55,14 @@ overrides:
 
 ## How to enable out-of-order native histograms ingestion
 Like samples out-of-order ingestion, the Cortex allows out-of-order ingestion for the native histogram.
-To enable it, set the flag `-ingester.enable-ooo-native-histograms`.
+It is automatically enabled when `-blocks-storage.tsdb.enable-native-histograms=true` and `-ingester.out-of-order-time-window > 0`.
 
 And via yaml:
+
 ```yaml
+blocks_storage:
+  tsdb:
+    enable_native_histograms: true
 limits:
-  enable_ooo_native_histograms: <bool>
-```
-
-Is it only works if when `-blocks-storage.tsdb.enable-native-histograms=true` and `-ingester.out-of-order-time-window > 0`.
-
-To enable it per tenant, you can utilize a [runtime config](../configuration/arguments.md#runtime-configuration-file).
-
-For example, the following yaml file specifies enabling out-of-order native histogram ingestion for `user-1`, but not for `user-2`.
-
-```
-overrides:
-  user-1:
-    enable_ooo_native_histograms: true
-  user-2:
-    enable_ooo_native_histograms: false
-```
-
-**Caution**: It is applied after the Ingester restart if it is changed at runtime through the runtime config.
-For example, if you have changed the `enable_ooo_native_histograms` value to `false` of the `user-1` via the below yaml file, then the Ingester stops the out-of-order ingestion not until the Ingester restarts.
-
-```
-overrides:
-  user-1:
-    enable_ooo_native_histograms: false
-  user-2:
-    enable_ooo_native_histograms: false
+  out_of_order_time_window: 5m
 ```

Diff for: go.mod

+3 -3
@@ -51,7 +51,7 @@ require (
 	github.com/spf13/afero v1.11.0
 	github.com/stretchr/testify v1.10.0
 	github.com/thanos-io/objstore v0.0.0-20241111205755-d1dd89d41f97
-	github.com/thanos-io/promql-engine v0.0.0-20250302135832-accbf0891a16
+	github.com/thanos-io/promql-engine v0.0.0-20250324222505-d17b9bdecd14
 	github.com/thanos-io/thanos v0.37.3-0.20250212101700-346d18bb0f80
 	github.com/uber/jaeger-client-go v2.30.0+incompatible
 	github.com/weaveworks/common v0.0.0-20230728070032-dd9e68f319d5
@@ -78,7 +78,7 @@ require (
 	github.com/VictoriaMetrics/fastcache v1.12.2
 	github.com/bboreham/go-loser v0.0.0-20230920113527-fcc2c21820a3
 	github.com/cespare/xxhash/v2 v2.3.0
-	github.com/google/go-cmp v0.6.0
+	github.com/google/go-cmp v0.7.0
 	github.com/hashicorp/golang-lru/v2 v2.0.7
 	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822
 	github.com/prometheus/procfs v0.15.1
@@ -248,7 +248,7 @@ require (
 	golang.org/x/sys v0.30.0 // indirect
 	golang.org/x/text v0.22.0 // indirect
 	golang.org/x/tools v0.29.0 // indirect
-	gonum.org/v1/gonum v0.15.0 // indirect
+	gonum.org/v1/gonum v0.15.1 // indirect
 	google.golang.org/api v0.218.0 // indirect
 	google.golang.org/genproto v0.0.0-20240823204242-4ba0660f739c // indirect
 	google.golang.org/genproto/googleapis/api v0.0.0-20250115164207-1a7da9e5054f // indirect

Diff for: go.sum

+6 -5
@@ -1176,8 +1176,9 @@ github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/
 github.com/google/go-cmp v0.5.7/go.mod h1:n+brtR0CgQNWTVd5ZUFpTBC8YFBDLK/h/bpaJ8/DtOE=
 github.com/google/go-cmp v0.5.8/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
 github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
-github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
 github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
+github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
+github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
 github.com/google/go-querystring v1.1.0 h1:AnCroh3fv4ZBgVIf1Iwtovgjaw/GiKJo8M8yD/fhyJ8=
 github.com/google/go-querystring v1.1.0/go.mod h1:Kcdr2DB4koayq7X8pmAG4sNG59So17icRSOU623lUBU=
 github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
@@ -1684,8 +1685,8 @@ github.com/thanos-community/galaxycache v0.0.0-20211122094458-3a32041a1f1e h1:f1
 github.com/thanos-community/galaxycache v0.0.0-20211122094458-3a32041a1f1e/go.mod h1:jXcofnrSln/cLI6/dhlBxPQZEEQHVPCcFaH75M+nSzM=
 github.com/thanos-io/objstore v0.0.0-20241111205755-d1dd89d41f97 h1:VjG0mwhN1DkncwDHFvrpd12/2TLfgYNRmEQA48ikp+0=
 github.com/thanos-io/objstore v0.0.0-20241111205755-d1dd89d41f97/go.mod h1:vyzFrBXgP+fGNG2FopEGWOO/zrIuoy7zt3LpLeezRsw=
-github.com/thanos-io/promql-engine v0.0.0-20250302135832-accbf0891a16 h1:ezd8hNCWiGQr4kdfCHFa0VCSi+LAO/28Mna264nDs2c=
-github.com/thanos-io/promql-engine v0.0.0-20250302135832-accbf0891a16/go.mod h1:aHSV5hL94fNb7PklN9L0V10j+/RGIlzqbw7OLdNgZFs=
+github.com/thanos-io/promql-engine v0.0.0-20250324222505-d17b9bdecd14 h1:PJaKaeC6t2LvJxjiQwptj11bB6+ah9wRfZG75Ib9lo8=
+github.com/thanos-io/promql-engine v0.0.0-20250324222505-d17b9bdecd14/go.mod h1:lZZTb17gUxAWtFAeZzLn2p2hoEPvXGSj0rc3XqsLsig=
 github.com/thanos-io/thanos v0.37.3-0.20250212101700-346d18bb0f80 h1:mOCRYn9SLBWJCXAdP+qDfgZDc0eqDxDc2HZGKTZ5vzk=
 github.com/thanos-io/thanos v0.37.3-0.20250212101700-346d18bb0f80/go.mod h1:Y7D8la8B5rpzRVKq2HCR4hbYZ4LGroSPqIJjtizgQg8=
 github.com/tjhop/slog-gokit v0.1.2 h1:pmQI4SvU9h4gA0vIQsdhJQSqQg4mOmsPykG2/PM3j1I=
@@ -2281,8 +2282,8 @@ gonum.org/v1/gonum v0.0.0-20180816165407-929014505bf4/go.mod h1:Y+Yx5eoAFn32cQvJ
 gonum.org/v1/gonum v0.8.2/go.mod h1:oe/vMfY3deqTw+1EZJhuvEW2iwGF1bW9wwu7XCu0+v0=
 gonum.org/v1/gonum v0.9.3/go.mod h1:TZumC3NeyVQskjXqmyWt4S3bINhy7B4eYwW69EbyX+0=
 gonum.org/v1/gonum v0.11.0/go.mod h1:fSG4YDCxxUZQJ7rKsQrj0gMOg00Il0Z96/qMA4bVQhA=
-gonum.org/v1/gonum v0.15.0 h1:2lYxjRbTYyxkJxlhC+LvJIx3SsANPdRybu1tGj9/OrQ=
-gonum.org/v1/gonum v0.15.0/go.mod h1:xzZVBJBtS+Mz4q0Yl2LJTk+OxOg4jiXZ7qBoM0uISGo=
+gonum.org/v1/gonum v0.15.1 h1:FNy7N6OUZVUaWG9pTiD+jlhdQ3lMP+/LcTpJ6+a8sQ0=
+gonum.org/v1/gonum v0.15.1/go.mod h1:eZTZuRFrzu5pcyjN5wJhcIhnUdNijYxX1T2IcrOGY0o=
 gonum.org/v1/netlib v0.0.0-20190313105609-8cb42192e0e0/go.mod h1:wa6Ws7BG/ESfp6dHfk7C6KdzKA7wR7u/rKwOGE66zvw=
 gonum.org/v1/plot v0.0.0-20190515093506-e2840ee46a6b/go.mod h1:Wt8AAjI+ypCyYX3nZBvf6cAIx93T+c/OS2HFAYskSZc=
 gonum.org/v1/plot v0.9.0/go.mod h1:3Pcqqmp6RHvJI72kgb8fThyUnav364FOsdDo2aGW5lY=

Diff for: integration/native_histogram_test.go

-1
@@ -33,7 +33,6 @@ func TestOOONativeHistogramIngestion(t *testing.T) {
 
 	flags := mergeFlags(baseFlags, map[string]string{
 		// ooo setting
-		"-ingester.enable-ooo-native-histograms": "true",
 		"-blocks-storage.tsdb.enable-native-histograms": "true",
 		"-ingester.out-of-order-time-window": "5m",
 		// alert manager

Diff for: pkg/alertmanager/alertmanager_metrics.go

+28
@@ -30,6 +30,8 @@ type alertmanagerMetrics struct {
 	nflogQueryErrorsTotal *prometheus.Desc
 	nflogQueryDuration *prometheus.Desc
 	nflogPropagatedMessagesTotal *prometheus.Desc
+	nflogMaintenanceTotal *prometheus.Desc
+	nflogMaintenanceErrorsTotal *prometheus.Desc
 
 	// exported metrics, gathered from Alertmanager Marker
 	markerAlerts *prometheus.Desc
@@ -43,6 +45,8 @@ type alertmanagerMetrics struct {
 	silencesQueryDuration *prometheus.Desc
 	silences *prometheus.Desc
 	silencesPropagatedMessagesTotal *prometheus.Desc
+	silencesMaintenanceTotal *prometheus.Desc
+	silencesMaintenanceErrorsTotal *prometheus.Desc
 
 	// The alertmanager config hash.
 	configHashValue *prometheus.Desc
@@ -127,6 +131,14 @@ func newAlertmanagerMetrics() *alertmanagerMetrics {
 			"cortex_alertmanager_nflog_gossip_messages_propagated_total",
 			"Number of received gossip messages that have been further gossiped.",
 			nil, nil),
+		nflogMaintenanceTotal: prometheus.NewDesc(
+			"cortex_alertmanager_nflog_maintenance_total",
+			"How many maintenances were executed for the notification log.",
+			nil, nil),
+		nflogMaintenanceErrorsTotal: prometheus.NewDesc(
+			"cortex_alertmanager_nflog_maintenance_errors_total",
+			"How many maintenances were executed for the notification log that failed.",
+			nil, nil),
 		markerAlerts: prometheus.NewDesc(
 			"cortex_alertmanager_alerts",
 			"How many alerts by state.",
@@ -163,6 +175,14 @@ func newAlertmanagerMetrics() *alertmanagerMetrics {
 			"cortex_alertmanager_silences",
 			"How many silences by state.",
 			[]string{"user", "state"}, nil),
+		silencesMaintenanceTotal: prometheus.NewDesc(
+			"cortex_alertmanager_silences_maintenance_total",
+			"How many maintenances were executed for silences.",
+			nil, nil),
+		silencesMaintenanceErrorsTotal: prometheus.NewDesc(
+			"cortex_alertmanager_silences_maintenance_errors_total",
+			"How many maintenances were executed for silences that failed.",
+			nil, nil),
 		configHashValue: prometheus.NewDesc(
 			"cortex_alertmanager_config_hash",
 			"Hash of the currently loaded alertmanager configuration.",
@@ -268,6 +288,8 @@ func (m *alertmanagerMetrics) Describe(out chan<- *prometheus.Desc) {
 	out <- m.nflogQueryErrorsTotal
 	out <- m.nflogQueryDuration
 	out <- m.nflogPropagatedMessagesTotal
+	out <- m.nflogMaintenanceTotal
+	out <- m.nflogMaintenanceErrorsTotal
 	out <- m.silencesGCDuration
 	out <- m.silencesSnapshotDuration
 	out <- m.silencesSnapshotSize
@@ -276,6 +298,8 @@ func (m *alertmanagerMetrics) Describe(out chan<- *prometheus.Desc) {
 	out <- m.silencesQueryDuration
 	out <- m.silencesPropagatedMessagesTotal
 	out <- m.silences
+	out <- m.silencesMaintenanceTotal
+	out <- m.silencesMaintenanceErrorsTotal
 	out <- m.configHashValue
 	out <- m.partialMerges
 	out <- m.partialMergesFailed
@@ -317,6 +341,8 @@ func (m *alertmanagerMetrics) Collect(out chan<- prometheus.Metric) {
 	data.SendSumOfCounters(out, m.nflogQueryErrorsTotal, "alertmanager_nflog_query_errors_total")
 	data.SendSumOfHistograms(out, m.nflogQueryDuration, "alertmanager_nflog_query_duration_seconds")
 	data.SendSumOfCounters(out, m.nflogPropagatedMessagesTotal, "alertmanager_nflog_gossip_messages_propagated_total")
+	data.SendSumOfCounters(out, m.nflogMaintenanceTotal, "alertmanager_nflog_maintenance_total")
+	data.SendSumOfCounters(out, m.nflogMaintenanceErrorsTotal, "alertmanager_nflog_maintenance_errors_total")
 
 	data.SendSumOfSummaries(out, m.silencesGCDuration, "alertmanager_silences_gc_duration_seconds")
 	data.SendSumOfSummaries(out, m.silencesSnapshotDuration, "alertmanager_silences_snapshot_duration_seconds")
@@ -326,6 +352,8 @@ func (m *alertmanagerMetrics) Collect(out chan<- prometheus.Metric) {
 	data.SendSumOfHistograms(out, m.silencesQueryDuration, "alertmanager_silences_query_duration_seconds")
 	data.SendSumOfCounters(out, m.silencesPropagatedMessagesTotal, "alertmanager_silences_gossip_messages_propagated_total")
 	data.SendSumOfGaugesPerUserWithLabels(out, m.silences, "alertmanager_silences", "state")
+	data.SendSumOfCounters(out, m.silencesMaintenanceTotal, "alertmanager_silences_maintenance_total")
+	data.SendSumOfCounters(out, m.silencesMaintenanceErrorsTotal, "alertmanager_silences_maintenance_errors_total")
 
 	data.SendMaxOfGaugesPerUser(out, m.configHashValue, "alertmanager_config_hash")
 