
Add querier.ingester-query-max-attempts to retry on partial data. #6714


Open
justinjung04 wants to merge 8 commits into master

Conversation

justinjung04
Contributor

@justinjung04 justinjung04 commented Apr 22, 2025

What this PR does:
#6526 added two new configs, query_partial_data and rules_partial_data, which allow tenants to receive a 2xx response with a warning message when data accuracy is relatively high in a zone-aware setup.

This PR adds retry logic to the querier when it fetches data from ingesters: if the response contains partial data, the request is retried. The new configuration, querier.ingester-query-max-attempts, sets the maximum number of attempts for an ingester query. The default is 1, meaning a single attempt with no retry.

Which issue(s) this PR fixes:
n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Justin Jung <[email protected]>
@justinjung04 justinjung04 changed the title from "Make partial data responses to be retryable" to "Add querier.ingester-query-max-attempts to retry on partial data." Apr 22, 2025
@justinjung04 justinjung04 marked this pull request as ready for review April 22, 2025 06:03
Contributor

@danielblando danielblando left a comment


LGTM

@SungJin1212
Member

LGTM!

Contributor

@yeya24 yeya24 left a comment


Maybe we should document this clearly and provide some suggestions on the recommended setup for this retry value.

To me this flag overlaps somewhat with partial data, and I am not sure how much the retry helps for most use cases.
We know it is unlikely to miss series, so we return partial data with 2XX. The retry may succeed, but given that we only wait a short period of time and ingester queries usually finish within milliseconds, I am unsure whether it is worth retrying more, especially if ingesters are under high load or in the middle of a deployment.

@@ -192,6 +202,33 @@ func (q *distributorQuerier) streamingSelect(ctx context.Context, sortSeries, pa
	return seriesSet
}

func (q *distributorQuerier) queryWithRetry(ctx context.Context, queryFunc func() (*client.QueryStreamResponse, error)) (*client.QueryStreamResponse, error) {
	if q.ingesterQueryMaxAttempts == 1 {
Contributor


We need to handle the case where ingesterQueryMaxAttempts is set to 0, as it would retry forever, IIUC.

Contributor Author


I added the validation in config.Validate:

	if cfg.IngesterQueryMaxAttempts < 1 {
		return errInvalidIngesterQueryMaxAttempts
	}

But would you still want the condition changed to q.ingesterQueryMaxAttempts <= 1?

"github.com/cortexproject/cortex/pkg/util/chunkcompat"
"github.com/cortexproject/cortex/pkg/util/spanlogger"
)

const retryMinBackoff = time.Second
const retryMaxBackoff = 5 * time.Second
Contributor


Ingester queries usually finish in milliseconds. I am not sure it is worth waiting for a 1s to 5s backoff between retries, as it may cause other issues such as more inflight queries on ingesters.


4 participants