Skip to content

fix: add exponential retry logic to stackit embedder #75

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: deps-main
Choose a base branch
from

Conversation

a-klos
Copy link
Member

@a-klos a-klos commented Aug 5, 2025

This pull request introduces robust retry logic for embedding requests in the StackitEmbedder and refactors configuration management for external services in the RAG infrastructure. The main changes are the addition of exponential backoff for embedding failures, new retry-related settings, and a more organized approach to service configuration in values.yaml.

Embedding robustness and configuration enhancements:

Retry logic improvements:

  • Added exponential backoff retry logic to the embed_documents method in StackitEmbedder, allowing up to a configurable number of retries with delays on transient failures. This improves reliability when the embedding API is temporarily unavailable.
  • Introduced logging for failed embedding attempts and retries in stackit_embedder.py for better observability.

Embedder settings:

  • Added new configuration fields to StackitEmbedderSettings for max_retries, retry_base_delay, and retry_max_delay, allowing fine-grained control of retry behavior. [1] [2]

Infrastructure/service configuration refactor:

  • Refactored infrastructure/rag/values.yaml to move service deployment and connection details (PostgreSQL, Redis, ClickHouse, S3/MinIO) under dedicated configuration sections, clarifying which services are external and improving maintainability. [1] [2]
  • Enhanced the langfuse configuration block with explicit image, web, worker, authentication, and environment variable settings for more granular control over deployment.

Embedder deployment configuration:

  • Added the new retry parameters (STACKIT_EMBEDDER_MAX_RETRIES, STACKIT_EMBEDDER_RETRY_BASE_DELAY, STACKIT_EMBEDDER_RETRY_MAX_DELAY) to the stackitEmbedder section in values.yaml to enable configuration via environment variables.

@huhn511 huhn511 requested a review from Copilot August 13, 2025 07:13
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request enhances the StackitEmbedder with robust retry logic using exponential backoff to handle transient embedding API failures. The changes improve reliability and observability while making retry behavior configurable.

Key changes:

  • Added exponential backoff retry logic to the embed_documents method with configurable parameters
  • Introduced new settings for retry behavior: max retries, base delay, and maximum delay
  • Enhanced error handling and logging for better observability of embedding failures

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
stackit_embedder_settings.py Added three new configuration fields for retry behavior with defaults
stackit_embedder.py Implemented exponential backoff retry logic with comprehensive error handling and logging
values.yaml Added environment variable configuration for the new retry parameters

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

)
return [data.embedding for data in responses.data]

except Exception as e:
Copy link
Preview

Copilot AI Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching all exceptions with a bare Exception is too broad. Consider catching specific exceptions like openai.APIError, openai.RateLimitError, or requests.RequestException to avoid retrying on non-recoverable errors like authentication failures.

Suggested change
except Exception as e:
except (APIError, RateLimitError, Timeout, APIConnectionError) as e:

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

@manu-hoffmann manu-hoffmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with one suggestion on handling rate limited requests.


# Calculate exponential backoff delay
# Calculate exponential backoff delay (cap exponent to prevent overflow)
delay = min(self._settings.retry_base_delay * (2**min(attempt, 10)), self._settings.retry_max_delay)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(rate-limits): you could investigate the response code and use the x-ratelimit-reset-requests and x-ratelimit-reset-tokens header on 429 responses to set the delay. This would reduce client errors and handle rate limits as expected.

To get the headers, you could do something like:

raw_response = openai_client.embeddings.with_raw_response.create(..)
response_headers = raw_response.headers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants