Add resource-thresholds in ingesters and store gateways to throttle query requests when the pods are under resource pressure. #6674

Open · wants to merge 10 commits into master from resource-based-throttling

Conversation

@justinjung04 (Contributor) commented Mar 26, 2025

What this PR does:

This PR introduces the ability to throttle incoming query requests in ingesters and store gateways when their CPU or heap is under pressure.

Data stores (ingesters and store gateways) currently don't have a good way to limit and control resource allocation per query request. Resource consumption varies hugely from one query request to the next, so it's hard to define static limits that protect ingesters or store gateways from using more than 100% CPU or being OOM-killed.

I'm introducing a new component called the resource monitor, which ingesters and store gateways can reference to block incoming query requests when utilization rises above a defined threshold.

Here is a test where a high TPS of queries that was exhausting ingester CPU got throttled by the new feature, stabilizing ingester CPU at around the configured threshold of 40%.

[Screenshot: ingester CPU utilization stabilizing around the configured 40% threshold]

I'm applying this to ingesters and store gateways for now just to keep the PR size small, but it can easily be applied to the query frontend and queriers later as well.
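
As a rough sketch of the idea (type and field names below are illustrative, not this PR's actual API):

```go
package resourcemonitor

import "sync"

// ResourceMonitor periodically samples CPU and heap utilization and
// exposes a cheap check that request paths can call before doing work.
type ResourceMonitor struct {
	mtx           sync.RWMutex
	cpuUtil       float64 // 0.0-1.0, averaged over recent samples
	heapUtil      float64 // 0.0-1.0, heap in use relative to a limit
	cpuThreshold  float64
	heapThreshold float64
}

// Overloaded reports whether either resource is above its threshold.
// Ingesters and store gateways call this at the start of a query and
// reject the request while it returns true.
func (m *ResourceMonitor) Overloaded() bool {
	m.mtx.RLock()
	defer m.mtx.RUnlock()
	return m.cpuUtil > m.cpuThreshold || m.heapUtil > m.heapThreshold
}
```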

Which issue(s) this PR fixes:
n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@justinjung04 (Contributor, Author) commented:

When choosing how to retrieve correct CPU and heap data, I basically tested different metrics from https://pkg.go.dev/runtime/metrics and https://github.com/prometheus/procfs and compared them with Kubernetes metrics to find the closest match. I didn't think it was necessary to document all the different metrics I tried, but let me know if you think I should mention it somewhere.
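
For reference, this is roughly the kind of sampling that was compared (a self-contained sketch; the exact metric names the PR settled on may differ):

```go
package main

import (
	"fmt"
	"runtime/metrics"

	"github.com/prometheus/procfs"
)

func main() {
	// CPU: cumulative user+system seconds for this process, from /proc.
	// Sampling this twice and dividing the delta by wall time gives a
	// utilization rate. Only works on Linux.
	proc, err := procfs.Self()
	if err != nil {
		panic(err)
	}
	stat, err := proc.Stat()
	if err != nil {
		panic(err)
	}
	fmt.Printf("cpu seconds: %f\n", stat.CPUTime())

	// Heap: live heap bytes from the Go runtime's metrics interface.
	// Note this is the Go heap, not the pod's RSS.
	sample := []metrics.Sample{{Name: "/memory/classes/heap/objects:bytes"}}
	metrics.Read(sample)
	fmt.Printf("heap bytes: %d\n", sample[0].Value.Uint64())
}
```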

@@ -5,6 +5,8 @@
* [FEATURE] Query Frontend: Add dynamic interval size for query splitting. This is enabled by configuring experimental flags `querier.max-shards-per-query` and/or `querier.max-fetched-data-duration-per-query`. The split interval size is dynamically increased to maintain a number of shards and total duration fetched below the configured values. #6458
* [FEATURE] Querier/Ruler: Add `query_partial_data` and `rules_partial_data` limits to allow queries/rules to be evaluated with data from a single zone, if other zones are not available. #6526
* [FEATURE] Update prometheus alertmanager version to v0.28.0 and add new integration msteamsv2, jira, and rocketchat. #6590
* [FEATURE] Ingester: Add a `-ingester.enable-ooo-native-histograms` flag to enable out-of-order native histogram ingestion per tenant. It only takes effect when `-blocks-storage.tsdb.enable-native-histograms=true` and `-ingester.out-of-order-time-window` > 0. It is applied after the restart if it is changed at runtime through the runtime config. #6626
Contributor:

Seems like an unrelated change?

}

func NewScanner() (*Scanner, error) {
proc, err := procfs.Self()
Contributor:

I don't know if the Cortex docs say somewhere that anything other than Linux is supported, but maybe it makes sense to error out if the OS is not Linux? Or return a scanner that does nothing.
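
Something along these lines would do the fallback (a sketch; it assumes Scanner is, or becomes, an interface, and noopScanner is a hypothetical implementation whose methods report zero utilization):

```go
// NewScanner returns a procfs-backed scanner on Linux, and a scanner that
// never reports resource pressure on other platforms, instead of failing
// at runtime when /proc is missing.
func NewScanner() (Scanner, error) {
	if runtime.GOOS != "linux" {
		return &noopScanner{}, nil
	}
	proc, err := procfs.Self()
	if err != nil {
		return nil, err
	}
	return &procfsScanner{proc: proc}, nil
}
```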

@@ -271,6 +271,15 @@ query_scheduler:
# CLI flag: -query-scheduler.grpc-client-config.connect-timeout
[connect_timeout: <duration> | default = 5s]

resource_thresholds:
Contributor:

IMHO, it's worth mentioning what metrics are being used to estimate resource usage, and it's probably worth mentioning that they might not exactly translate to memory usage, for example.
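
For example, the config reference could spell that out along these lines (the field and flag names below are illustrative, not necessarily what this PR defines):

```yaml
resource_thresholds:
  # Reject incoming query requests when average process CPU utilization,
  # sampled from procfs (which may differ from container-level metrics),
  # exceeds this fraction. 0 disables the check.
  # CLI flag: -resource-thresholds.cpu
  [cpu: <float> | default = 0]

  # Reject incoming query requests when Go heap usage, read from
  # runtime/metrics (which is not the same as the pod's RSS),
  # exceeds this fraction. 0 disables the check.
  # CLI flag: -resource-thresholds.heap
  [heap: <float> | default = 0]
```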

@@ -5,6 +5,8 @@
* [FEATURE] Query Frontend: Add dynamic interval size for query splitting. This is enabled by configuring experimental flags `querier.max-shards-per-query` and/or `querier.max-fetched-data-duration-per-query`. The split interval size is dynamically increased to maintain a number of shards and total duration fetched below the configured values. #6458
* [FEATURE] Querier/Ruler: Add `query_partial_data` and `rules_partial_data` limits to allow queries/rules to be evaluated with data from a single zone, if other zones are not available. #6526
* [FEATURE] Update prometheus alertmanager version to v0.28.0 and add new integration msteamsv2, jira, and rocketchat. #6590
* [FEATURE] Ingester: Add a `-ingester.enable-ooo-native-histograms` flag to enable out-of-order native histogram ingestion per tenant. It only takes effect when `-blocks-storage.tsdb.enable-native-histograms=true` and `-ingester.out-of-order-time-window` > 0. It is applied after the restart if it is changed at runtime through the runtime config. #6626
* [FEATURE] Ingester/StoreGateway: Add `resource-thresholds` in ingesters and store gateways to throttle query requests when the pods are under resource pressure. #6674
Contributor:

Maybe it would be interesting to know/document the exact use case? Is it so that the user gets a nice error message instead of waiting for the timeout to hit when there's resource pressure?
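
If that's the intent, the check in the query path would presumably fail fast with a retriable error, along these lines (a sketch; checkResourcePressure and the monitor API are illustrative, and status/codes are from google.golang.org/grpc):

```go
// Hypothetical check at the start of each query RPC in the ingester.
func (i *Ingester) checkResourcePressure() error {
	if i.resourceMonitor != nil && i.resourceMonitor.Overloaded() {
		// Return ResourceExhausted immediately rather than letting the
		// query run until the client times out under pressure.
		return status.Error(codes.ResourceExhausted,
			"ingester is under resource pressure, please retry later")
	}
	return nil
}
```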


// Variables to calculate average CPU utilization
index int
cpuRates []float64
Contributor:

Maybe it makes sense to use an array instead of a slice here? dataPointsToAvg is a constant either way.
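
That is, something like this, where the fixed size is baked into the type (dataPointsToAvg as in the PR, the rest illustrative):

```go
const dataPointsToAvg = 60 // the PR's actual constant may differ

type cpuAverager struct {
	index    int
	filled   int
	cpuRates [dataPointsToAvg]float64 // array, not slice: size is a compile-time constant
}

func (a *cpuAverager) add(rate float64) {
	a.cpuRates[a.index] = rate
	a.index = (a.index + 1) % dataPointsToAvg
	if a.filled < dataPointsToAvg {
		a.filled++
	}
}

func (a *cpuAverager) avg() float64 {
	if a.filled == 0 {
		return 0
	}
	var sum float64
	for _, r := range a.cpuRates[:a.filled] {
		sum += r
	}
	return sum / float64(a.filled)
}
```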
