Skip to content

[iris/finelog] Federated global-finelog log relay + authenticated ingress#6871

Merged
rjpower merged 6 commits into
mainfrom
weaver/iris-federation-pr4b-finelog-rel
Jul 2, 2026
Merged

[iris/finelog] Federated global-finelog log relay + authenticated ingress#6871
rjpower merged 6 commits into
mainfrom
weaver/iris-federation-pr4b-finelog-rel

Conversation

@rjpower

@rjpower rjpower commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Adds the write side of the federation global-log plane (design §6): each cluster's controller durably forwards its local finelog to one shared global finelog, and finelog gains a mandatory authenticated ingress front.

Log forwarding

A new finelog.forwarder.LogForwarder (in finelog, generic) tails a source finelog and re-appends each new batch to a target under the same keys. Durability leans on the source finelog already being durable (parquet + retention): the forwarder persists only a forward watermark in a small JSON state file — not a second copy of the data — and advances it only after an ack'd PushLogs (which returns once the global store has durably persisted the batch). A crash or transient egress failure re-forwards the in-flight batch (at-least-once; duplicate lines are bounded to the failure boundary — federation degrades observability, never job correctness). First run seeds the watermark at the source's current cursor, so enabling it ships new logs without backfilling the whole retention window; a busy cluster's backlog drains within a tick rather than one batch per poll interval.

The iris controller wires this up thinly (controller/finelog_relay.py): it supplies its local client, a per-cluster delegation credential, and a state path on its persistent dir; the forwarding/watermark logic lives entirely in finelog. A cluster's finelog is one flat config block — finelog.config names the local store; finelog.relay_address (optional) points at the shared global one, with delegation_key/static_token for the credential — and setting relay_address is what turns the forwarder on. Inert until then: no forwarder thread, no egress, byte-identical single-cluster behavior.

Cross-cluster log namespacing is deferred. Bare per-cluster keys (/user/<job>/...) collide when many clusters land in one store; rather than have the relay rewrite keys mid-stream, namespacing will be done on the write side (tasks/controller write the namespaced key directly) or via a finelog cluster column, in a follow-up. This PR ships the durable relay + auth mechanism; a single cluster relaying to a global store works today.

Authenticated ingress (new security surface)

finelog's server (lib/finelog/rust/src/server/auth.rs) now gates every RPC with an ordered stack of typed auth layerscidr (admit by transport-peer network) and jwt (HS256 bearer verified against per-cluster delegation keys) — that is default-deny and always installed. An unconfigured server falls back to allow-localhost (loopback only), never open. HS256 verification uses vetted RustCrypto primitives (constant-time HMAC, no ring), rejects alg:none/algorithm-confusion, and tolerates a small exp clock-skew leeway. The /debug/* admin routes are gated by the same policy. It is configured by a single FINELOG_AUTH_POLICY JSON layer list, which the finelog deploy config, GCP bootstrap script, and k8s manifest all plumb (k8s refuses to inline jwt secrets — those must come through a Secret).

Each relaying controller authenticates its pushes with a short-lived HS256 delegation JWT, minted via the same JwtTokenManager the control plane uses but signed with a dedicated per-cluster key the global store verifies — so the shared store can verify a cluster's tokens without gaining the power to mint control-plane tokens.

Because auth is always installed, the three committed local finelog configs (marin, marin-dev, cw-us-east-02a) gain an explicit cidr layer (RFC1918 + loopback) so intra-cluster/pod ingest keeps working — reachable from the private network, never the public internet.

Cost

Relaying every cluster's logs to one region is deliberate cross-region egress — the trade for a uniform, always-available, cross-cluster-queryable log surface. finelog already batches/compresses and tiers cold segments to object storage, which bounds steady-state cost.

The read side (surfacing a federated job's logs through the parent: job_id→peer-namespace remap + read-through auth) is deferred to a follow-up (#6862).

Part of #6718 — PR4b of the federation rollout (global finelog + relay + authed ingress).

…ress

Adds the write side of the federation global-log plane (design §6): each cluster's controller durably forwards its local finelog to one shared **global finelog** under cluster-namespaced keys, and finelog gains a **mandatory authenticated ingress** front.

## Log forwarding
A new `finelog.forwarder.LogForwarder` (in finelog, generic) tails a source finelog and re-appends each batch to a target under a caller-supplied key prefix (`/c/<cluster_id>`). Durability leans on the source finelog already being durable (parquet + retention): the forwarder persists only a forward **watermark** in a small JSON state file — not a second copy of the data — and advances it only after an ack'd `PushLogs` (which returns once the global store has durably persisted the batch). A crash or transient egress failure re-forwards the in-flight batch (at-least-once; duplicate lines are bounded to the failure boundary — federation degrades observability, never job correctness). First run seeds the watermark at the source's current cursor, so enabling it ships new logs without backfilling the whole retention window.

The iris controller wires this up thinly (`controller/global_finelog.py`): it supplies its local client, a per-cluster delegation credential, and a state path on its persistent dir; the forwarding/namespacing/watermark logic lives entirely in finelog. Inert until `global_finelog` is configured — no forwarder thread, no egress, byte-identical single-cluster behavior.

## Authenticated ingress (new security surface)
finelog's server (`lib/finelog/rust/src/server/auth.rs`) now gates every RPC with an ordered stack of typed auth **layers** — `cidr` (admit by transport-peer network) and `jwt` (HS256 bearer verified against per-cluster delegation keys) — that is **default-deny** and **always installed**. An unconfigured server falls back to allow-localhost (loopback only), never open. HS256 verification uses vetted RustCrypto primitives (constant-time HMAC, no `ring`), rejects `alg:none`/algorithm-confusion, and tolerates a small `exp` clock-skew leeway. The `/debug/*` admin routes are gated by the same policy. It is configured by a single `FINELOG_AUTH_POLICY` JSON layer list, which the finelog deploy config, GCP bootstrap script, and k8s manifest all plumb (k8s refuses to inline jwt secrets — those must come through a Secret).

Each relaying controller authenticates its pushes with a short-lived HS256 delegation JWT, minted via the same `JwtTokenManager` the control plane uses but signed with a **dedicated** per-cluster key the global store verifies — so the shared store can verify a cluster's tokens without gaining the power to mint control-plane tokens.

Because auth is always installed, the three committed local finelog configs (`marin`, `marin-dev`, `cw-us-east-02a`) gain an explicit `cidr` layer (RFC1918 + loopback) so intra-cluster/pod ingest keeps working — reachable from the private network, never the public internet.

## Cost
Relaying every cluster's logs to one region is deliberate cross-region **egress** — the trade for a uniform, always-available, cross-cluster-queryable log surface. finelog already batches/compresses and tiers cold segments to object storage, which bounds steady-state cost.

The read side (surfacing a federated job's logs through the parent: `job_id`→peer-namespace remap + read-through auth) is deferred to a follow-up (#6862).

Part of #6718 — PR4b of the federation rollout (global finelog + relay + authed ingress).
@rjpower rjpower added the agent-generated Created by automation/agent label Jul 2, 2026
@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @rjpower's task in 4m 44s —— View job


Code review

No issues found. Checked for bugs and CLAUDE.md/AGENTS.md compliance.

I ran a multi-agent correctness pass over the diff (2 bug/security agents, 1 CLAUDE.md-compliance agent, 1 test-quality agent) and validated each candidate finding against the real source. The security-critical surfaces hold up:

  • JWT ingress (lib/finelog/rust/src/server/auth.rs) — is_hs256() gates on alg == HS256 before any verification, so alg:none/algorithm-confusion is rejected; exp is enforced with bounded leeway; signature check is constant-time (Mac::verify_slice). The Rust HMAC-over-string-bytes keying matches PyJWT's HS256, and the pinned cross-language interop token proves the contract.
  • Default-denyAuthPolicy::admits terminates in false; empty policy lists are rejected at parse; the allow-localhost fallback is installed only when no policy is configured, so it's unreachable once a policy exists. CIDR matches the transport peer only (ConnectInfo<SocketAddr>), never a forwarded header, and prefix_matches handles non-byte-aligned masks and v4-mapped-v6.
  • Forwarder watermark (lib/finelog/src/finelog/forwarder.py) — cursor is exclusive (seq > cursor), advanced to response.cursor only after an ack'd PushLogs that has durably persisted, so a failure re-forwards (duplicates, never drops) — matching the stated at-least-once design. Corrupt state raises rather than silently reseeding past unforwarded logs.
  • k8s secret refusalassert_inlineable_auth rejects any JwtAuthLayer before rendering the plaintext manifest, so only cidr-only policies are inlined.
  • CompilationSocketAddr imported at both bind sites; ServerConfig correctly drops Copy now that it holds Arc<AuthPolicy> and is taken by &; the debug router is gated by the same policy via from_fn_with_state.

Tests are behavior-focused (real embedded servers, observable outcomes, optional-dep skip guards) with no slop/tautology/call-count assertions. The PR description is compliant — it leads with what changed, and its section headings are informative for a ~1.9k-line PR rather than template scaffold.

One minor, non-blocking note (not flagged as a violation): lib/iris/src/iris/cluster/controller/global_finelog.py uses raw time.time() while lib/iris/AGENTS.md nudges toward rigging.timing. It mirrors the sibling controller/auth.py JwtTokenManager, which uses the same raw-time pattern for token expiry, so it follows local precedent.
branch

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc2cf8d85e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/iris/src/iris/cluster/controller/controller.py
Comment on lines +130 to +131
source=_ALL_KEYS_PREFIX,
match_scope=logging_pb2.MATCH_SCOPE_PREFIX,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid re-forwarding already namespaced relay logs

If the global finelog is also this cluster's configured source log server, or if the source ever contains /c/<cluster> relay keys, tailing every key under / makes the forwarder read its own forwarded rows and push them again as /c/<cluster>/c/<cluster>/... on the next tick. That can create unbounded duplicate/amplified log traffic for the home/global-store cluster; exclude the relay namespace from the source scan or reject source==target wiring.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Fixed in 766c86e: the forwarder now skips source entries already under its key prefix, so a source==target (or otherwise shared) store can't loop /c/x/... → /c/x/c/x/.... The watermark still advances past the skipped rows, and the tick's forwarded count now reports pushed (not fetched) entries. Added a regression test (test_forwarder_skips_already_namespaced_keys_no_amplification) that drives the forwarder against a single store as both source and target and asserts no doubly-namespaced rows appear.

rjpower added 4 commits July 2, 2026 19:42
Replace the two top-level fields (log_server_config for the local finelog +
global_finelog for the relay target) with a single finelog: block —
finelog.config names the local store, finelog.relay (optional) points at the
shared global finelog and turns on the forwarder. Drop the redundant
global_finelog.cluster: the delegation JWT's identity is the cluster's own
name (the finelog verifier attributes identity from its configured key label,
not the token subject), so it was doubly restated.
Guard against re-forwarding rows this forwarder produced: skip source entries
already under the key prefix, so a source==target (or otherwise shared) store
can't loop /c/x/user -> /c/x/c/x/user unboundedly. The watermark still advances
past skipped rows, and the tick's forwarded count now reports pushed entries,
not fetched.

Document that the watermark is intentionally local best-effort state (fsync'd
for same-host restart, not part of any external checkpoint): a host replacement
reseeds at source-max, accepting a bounded gap of un-forwarded logs rather than
coupling the relay to the controller's checkpoint. Observability, not correctness.
Remove the finelog.relay pydantic round-trip and the default-is-None assert
(they test serialization, not behavior), and trim the auth-layer test to its one
useful assertion: the exact FINELOG_AUTH_POLICY JSON the Rust server parses.
Address review findings on the relay path:
- Drain the source within a tick instead of shipping one batch per poll
  interval, so a cluster producing faster than batch_lines/interval can't
  outrun the forwarder until source retention drops un-forwarded rows.
- The self-amplification guard matches the prefix with a trailing "/", so a
  sibling namespace (/c/x-dev) is no longer false-skipped by /c/x.
- Validate that a cluster enabling finelog.relay sets a slash-free top-level
  name (the /c/<name> log namespace) at config load, not as a controller
  crash-loop on the remote host.
Collapse the local finelog config and the relay target into one flat finelog:
block — config / relay_address / delegation_key / static_token. Setting
relay_address turns on the forwarder; empty leaves the log plane single-cluster.

Drop the /c/<cluster> key-rewriting from the relay: the forwarder is now a plain
durable replicator (same keys, source -> target), and cross-cluster namespacing
is deferred — it will be handled on the write side (tasks/controller write the
namespaced key directly) or via a finelog cluster column, rather than rewritten
mid-relay. Removes cluster_namespace_prefix and the forwarder's key_prefix +
self-amplification guard; keeps the backlog drain.
@rjpower rjpower merged commit 05affa4 into main Jul 2, 2026
36 checks passed
@rjpower rjpower deleted the weaver/iris-federation-pr4b-finelog-rel branch July 2, 2026 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-generated Created by automation/agent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant