Problem Statement
Retry behavior for model calls is currently inconsistent across OpenViking.
- Some VLM paths go through `VLMConfig` and inherit `vlm.max_retries`, while others call backend instances directly and bypass that default.
- Embedding providers do not share a single retry contract today: some have custom retry logic, some have fixed SDK-level retries, and some have no config-driven retry behavior.
- This makes rate limiting and other transient failures unevenly handled across semantic processing, parsing, and embedding flows.
Proposed Solution
Unify retry at the model-call layer for both VLM and embedding.
Proposed shape:
- Keep the config surface minimal.
- Use `vlm.max_retries` and add `embedding.max_retries`.
- Default both to `3`.
- Treat `max_retries = 0` as retry disabled.
- Remove function-level `max_retries` parameters from VLM interfaces and make retry fully config-driven.
- Apply a shared transient retry policy at the backend/provider layer for:
  - VLM text + vision, sync + async
  - embedding `embed()` + `embed_batch()` across providers
- Retry only known transient errors such as `429`, `5xx`, `TooManyRequests`, `RateLimit`, `RequestBurstTooFast`, timeout, and connection-reset/refused scenarios.
- Do not retry permanent errors such as `400`, `401`, `403`, or account/billing failures.
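A minimal sketch of what the shared policy could look like. All names here (`ModelCallError`, `call_with_retry`, the status sets) are illustrative assumptions for discussion, not existing OpenViking APIs:

```python
# Hypothetical sketch of a shared transient-retry policy at the backend layer.
# ModelCallError and call_with_retry are illustrative names, not real APIs.
import time

TRANSIENT_STATUS = {429, 500, 502, 503, 504}  # retryable
PERMANENT_STATUS = {400, 401, 403}            # never retried

class ModelCallError(Exception):
    def __init__(self, status=None, transient=False):
        super().__init__(f"status={status}")
        self.status = status
        self.transient = transient or (status in TRANSIENT_STATUS)

def call_with_retry(fn, max_retries=3, base_delay=0.0):
    """Run fn(); retry only transient failures. max_retries=0 disables retry."""
    attempt = 0
    while True:
        try:
            return fn()
        except ModelCallError as e:
            if not e.transient or attempt >= max_retries:
                raise  # permanent error, or retries exhausted
            attempt += 1
            time.sleep(base_delay * (2 ** (attempt - 1)))  # exponential backoff
```

The key property is that the caller never passes retry counts per call: `max_retries` comes from config, and the same wrapper serves VLM and embedding backends alike.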
Alternatives Considered
- Add business-layer wrappers such as `_llm_with_retry()` in each call site. This is useful as a local patch, but it does not scale and will keep producing inconsistent behavior across modules.
- Keep retry configurable per-call via function parameters. We decided against this because these are internal call paths and a config-driven bottom-layer policy is simpler and more consistent.
- Add a larger retry policy abstraction with many config knobs. Rejected for now in favor of a minimal `max_retries`-only design.
Feature Area
Model Integration
Use Case
OpenViking should handle transient model failures consistently regardless of whether a request originates from semantic indexing, structured VLM parsing, or embedding generation. Users should be able to control retry behavior from config without having to rely on module-specific wrappers or implementation details.
Example API (Optional)
```yaml
# config-driven only
vlm:
  max_retries: 3
embedding:
  max_retries: 3
```
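For discussion, the config surface above might map to something like the following. The class names are assumptions mirroring the YAML keys, not existing OpenViking types:

```python
# Illustrative sketch of the minimal config surface; class names are assumed.
from dataclasses import dataclass

@dataclass
class VLMConfig:
    max_retries: int = 3  # 0 disables retry entirely

@dataclass
class EmbeddingConfig:
    max_retries: int = 3  # same contract as vlm.max_retries
```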
Additional Context
This came up while reviewing PR #889. The local fix in that PR is useful, but we want to address the root cause by unifying retry semantics at the model backend layer instead of adding more module-specific wrappers.