### What is the problem your feature solves, or the need it fulfills?
Pingora retries upstream calls only on connect-time errors (`fail_to_connect` returning an error with `e.set_retry(true)`). There is no way to drive the same retry loop from a response-side signal, so status-code-driven retries — the common 502/503/504 case during rolling restarts — cannot be expressed cleanly.

This blocks API-gateway and reverse-proxy use cases where transient 5xx responses during deployments should land on a different replica rather than being surfaced to the client. nginx (`proxy_next_upstream`), Envoy (`retry_policy.retry_on`), and HAProxy (`retry-on`) all support this; Pingora is the outlier.
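For comparison, this is what the status-driven retry looks like in nginx (an illustrative config fragment, not part of this proposal; `backend` is a placeholder upstream name):

```nginx
# Re-send the request to another upstream server when the first one
# fails to connect or answers 502/503/504, capped at 3 tries total.
location / {
    proxy_pass http://backend;
    proxy_next_upstream error http_502 http_503 http_504;
    proxy_next_upstream_tries 3;
}
```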
### Describe the solution you'd like
Add one minimal `ProxyHttp` trait method that fires after `upstream_response_filter` and lets user code abort the response with a retryable error. The existing retry path then handles re-running `upstream_peer()`, with the request-body retry-buffer and `error_while_proxy` machinery deciding whether the request can be replayed.
```rust
async fn upstream_response_decision(
    &self,
    _session: &mut Session,
    _upstream_response: &ResponseHeader,
    _ctx: &mut Self::CTX,
) -> Option<Box<Error>> {
    None
}
```
- The default returns `None` — no behaviour change for existing callers.
- Returning `Some(err)` aborts the response before any bytes flow downstream.
- If `err.set_retry(true)` was set, Pingora re-runs `upstream_peer()` exactly like a connect-time retry.
The hook fires at header-arrival time so aborting is safe; once response bytes are flowing to the client, no proxy can retry safely (same restriction as nginx's `proxy_next_upstream`).
User code example:
```rust
async fn upstream_response_decision(
    &self,
    _session: &mut Session,
    upstream_response: &ResponseHeader,
    ctx: &mut Self::CTX,
) -> Option<Box<Error>> {
    let status = upstream_response.status.as_u16();
    if matches!(status, 502 | 503 | 504) && ctx.attempts < 3 {
        ctx.attempts += 1;
        let mut err = Error::new(ErrorType::HTTPStatus(status));
        err.set_retry(true);
        return Some(err);
    }
    None
}
```
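The retry policy in the example above can be pulled out as a pure function, which makes it easy to unit-test away from the proxy. This is an illustrative sketch: `should_retry`, `MAX_ATTEMPTS`, and the per-request attempt counter are names assumed here, not part of the proposed API (the example stores the counter in the user-defined `CTX` type).

```rust
/// Cap on upstream attempts per request (illustrative; in the example
/// above this budget lives in the user's CTX as `ctx.attempts`).
const MAX_ATTEMPTS: u32 = 3;

/// Retry only on transient gateway errors, and only while the
/// attempt budget for this request has not been exhausted.
fn should_retry(status: u16, attempts: u32) -> bool {
    matches!(status, 502 | 503 | 504) && attempts < MAX_ATTEMPTS
}

fn main() {
    assert!(should_retry(503, 0)); // transient 5xx, first attempt: retry
    assert!(!should_retry(503, 3)); // budget exhausted: surface the error
    assert!(!should_retry(500, 0)); // 500 is not in the retryable set
}
```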
I have a draft PR ready: #872. Happy to revise based on design feedback.
### Describe alternatives you've considered
- Extending `error_while_proxy` to also fire on response-header arrival. One hook reused for two phases instead of two hooks. Smaller diff but less ergonomic — the same method would need both response-header and error inputs. The dedicated hook is clearer at the API surface.
- User-side workaround buffering responses outside Pingora. Doable in user code: buffer small upstream responses, inspect status, and re-issue by recursion. Costs ~1 day per consumer, only works for responses under the buffer cap, and is obsolete once Pingora ships a first-class hook. We've prototyped this and the workaround code is the kind of thing we'd rather not ship.
- Status-aware `fail_to_connect`. Doesn't fit — `fail_to_connect` is conceptually about connection failures, and broadening its semantics would muddy the trait contract.
### Additional context
Prior art: nginx `proxy_next_upstream`, Envoy `retry_policy`, HAProxy `retry-on`.
The implementation is a 33-line trait method addition + 15 lines wiring it into `upstream_filter`. No new state, no API changes elsewhere. Default behaviour is identical to today. Workspace builds and `pingora-proxy` lib tests pass unchanged. Diff is in the linked PR.