Allow status-code-driven upstream retries via a ProxyHttp hook

## What is the problem your feature solves, or the need it fulfills?

Pingora retries upstream calls only on connect-time errors (`fail_to_connect` returning an error with `e.set_retry(true)`). There is no way to drive the same retry loop from a response-side signal, so status-code-driven retries — the common 502/503/504 case during rolling restarts — cannot be expressed cleanly.

This blocks API-gateway and reverse-proxy use cases where transient 5xx responses during deployments should land on a different replica rather than being surfaced to the client. nginx (`proxy_next_upstream`), Envoy (`retry_policy.retry_on`), and HAProxy (`retry-on`) all support this; Pingora is the outlier.

## Describe the solution you'd like

Add one minimal `ProxyHttp` trait method that fires after `upstream_response_filter` and lets user code abort the response with a retryable error. The existing retry path then handles re-running `upstream_peer()`, with the request-body retry-buffer and `error_while_proxy` machinery deciding whether the request can be replayed.

```rust
async fn upstream_response_decision(
    &self,
    _session: &mut Session,
    _upstream_response: &ResponseHeader,
    _ctx: &mut Self::CTX,
) -> Option<Box<Error>> { None }
```

- Default returns `None` — no behaviour change for existing callers.
- Returning `Some(err)` aborts the response before any bytes flow downstream.
- If the returned error is marked retryable (`err.set_retry(true)`), Pingora re-runs `upstream_peer()` exactly like a connect-time retry.

The hook fires at header-arrival time so aborting is safe; once response bytes are flowing to the client, no proxy can retry safely (same restriction as nginx's `proxy_next_upstream`).
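To make the intended control flow concrete, here is a rough, self-contained model of it. This is only a sketch under stated assumptions: `ProxyError`, `Ctx`, and `proxy_once` are hypothetical stand-ins invented for illustration, not Pingora types, and the real retry loop has more moving parts (body replay, peer selection, retry limits).

```rust
// Simplified model of the proposed flow. None of these types are Pingora's;
// they only mirror the shape of the idea.

/// Stand-in for an error that carries a retry flag.
#[derive(Debug)]
struct ProxyError {
    status: u16,
    retry: bool,
}

/// Stand-in for the per-request user context (`Self::CTX`).
#[derive(Default)]
struct Ctx {
    attempts: u32,
}

/// Stand-in for the proposed hook: inspect the status, optionally abort.
fn upstream_response_decision(status: u16, ctx: &mut Ctx) -> Option<ProxyError> {
    if matches!(status, 502 | 503 | 504) && ctx.attempts < 3 {
        ctx.attempts += 1;
        return Some(ProxyError { status, retry: true });
    }
    None
}

/// Hypothetical proxy loop. Each element of `replicas` is the status that
/// replica would answer with. Returns the status forwarded downstream.
fn proxy_once(replicas: &[u16], ctx: &mut Ctx) -> Result<u16, ProxyError> {
    let mut last_err = None;
    for &status in replicas {
        // Response headers have arrived; nothing has been sent downstream yet.
        match upstream_response_decision(status, ctx) {
            None => return Ok(status),                      // forward as-is
            Some(err) if err.retry => last_err = Some(err), // try the next peer
            Some(err) => return Err(err),                   // abort, no retry
        }
    }
    // Assumption for this model only: with no peer left, surface the last
    // retryable error, or a non-retryable 502 if no peer was tried at all.
    Err(last_err.unwrap_or(ProxyError { status: 502, retry: false }))
}

fn main() {
    let mut ctx = Ctx::default();
    match proxy_once(&[503, 502, 200], &mut ctx) {
        Ok(status) => println!("forwarded {} after {} retries", status, ctx.attempts),
        Err(err) => println!("gave up on status {} (retry={})", err.status, err.retry),
    }
}
```

In this model a 4xx or 2xx is forwarded untouched, and once the retry budget is spent the next 5xx is forwarded rather than retried again.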

User code example:

```rust
async fn upstream_response_decision(
    &self,
    _session: &mut Session,
    upstream_response: &ResponseHeader,
    ctx: &mut Self::CTX,
) -> Option<Box<Error>> {
    let status = upstream_response.status.as_u16();
    if matches!(status, 502 | 503 | 504) && ctx.attempts < 3 {
        ctx.attempts += 1;
        let mut err = Error::new(ErrorType::HTTPStatus(status));
        err.set_retry(true);
        return Some(err);
    }
    None
}
```
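The example above hardcodes the status list and the attempt cap, and assumes a `CTX` type with an `attempts` field. One possible way to factor that out so the policy can be unit-tested without a proxy is sketched below; `RetryPolicy` and `RetryCtx` are hypothetical names for illustration and are not part of Pingora or of the draft PR.

```rust
/// Hypothetical policy type; not part of Pingora.
struct RetryPolicy {
    /// Upstream status codes that count as transient.
    statuses: Vec<u16>,
    /// Maximum number of status-driven retries per request.
    max_attempts: u32,
}

impl RetryPolicy {
    /// True when `status` is listed as transient and the budget is not spent.
    fn should_retry(&self, status: u16, attempts: u32) -> bool {
        attempts < self.max_attempts && self.statuses.contains(&status)
    }
}

/// Hypothetical per-request context; `attempts` matches the field the hook reads.
#[derive(Default)]
struct RetryCtx {
    attempts: u32,
}

fn main() {
    let policy = RetryPolicy {
        statuses: vec![502, 503, 504],
        max_attempts: 3,
    };
    let mut ctx = RetryCtx::default();
    // Simulate four consecutive 503 responses for a single request.
    for _ in 0..4 {
        if policy.should_retry(503, ctx.attempts) {
            ctx.attempts += 1;
        }
    }
    println!("retries used: {}", ctx.attempts);
}
```

With something like this, the hook body reduces to a call to `should_retry` followed by building the retryable error, and the policy itself can live in the service struct.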

I have a draft PR ready: #872. Happy to revise based on design feedback.

## Describe alternatives you've considered

1. **Extending `error_while_proxy` to also fire on response-header arrival.** One hook reused for two phases instead of two hooks. Smaller diff but less ergonomic — the same method would need both response-header and error inputs. The dedicated hook is clearer at the API surface.

2. **User-side workaround buffering responses outside Pingora.** Doable in user code: buffer small upstream responses, inspect status, and re-issue by recursion. Costs ~1 day per consumer, only works for responses under the buffer cap, and is obsolete once Pingora ships a first-class hook. We've prototyped this and the workaround code is the kind of thing we'd rather not ship.

3. **Status-aware `fail_to_connect`.** Doesn't fit — `fail_to_connect` is conceptually about connection failures, and broadening its semantics would muddy the trait contract.

## Additional context

Prior art:
- nginx [`proxy_next_upstream`](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream)
- Envoy [`retry_policy`](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#config-route-v3-retrypolicy)
- HAProxy [`retry-on`](https://docs.haproxy.org/2.8/configuration.html#4-retry-on)

The implementation is a 33-line trait method addition + 15 lines wiring it into `upstream_filter`. No new state, no API changes elsewhere. Default behaviour is identical to today. Workspace builds and `pingora-proxy` lib tests pass unchanged. Diff is in the linked PR.
