[iris] Move per-backend dead-worker pruning into the backend store #6795
Merged
Claude finished @rjpower's task in 2m 14s.
Code review
No issues found. Checked for bugs and CLAUDE.md/AGENTS.md compliance. What I verified:
The PR description is also well-formed: it leads with what the change does, the design note is informative rather than boilerplate, and there is no "Testing" section or template scaffold.
The controller's background prune loop reached into every backend's health tracker and worker-attributes projection to delete stale DEAD workers. Make each backend garbage-collect its own dead workers instead.
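To make the ownership change concrete, here is a minimal sketch of a store that garbage-collects its own dead workers. It assumes a SQLite `workers` table with hypothetical `state` and `last_seen` columns, and models the health tracker, the worker-attributes projection, and the audit log as plain dicts and a list. The class name mirrors `DbBackendWorkerStore` from this PR, but the schema, the `prune_dead_workers(cutoff, pause_s)` signature, and the audit representation are illustrative assumptions, not the Iris implementation.

```python
import sqlite3
import time
from dataclasses import dataclass, field


@dataclass
class DbBackendWorkerStore:
    """Hypothetical stand-in: the store owns worker rows, liveness, and attributes."""

    db: sqlite3.Connection
    health: dict[str, float] = field(default_factory=dict)  # worker_id -> last heartbeat
    worker_attrs: dict[str, dict[str, str]] = field(default_factory=dict)
    audit_log: list[tuple[str, str]] = field(default_factory=list)  # assumed audit sink

    def prune_dead_workers(self, cutoff: float, pause_s: float = 0.0) -> int:
        """Delete DEAD workers last seen before `cutoff`; return how many were removed."""
        # Materialize the candidates first so deletes don't disturb the open cursor.
        stale = [
            row[0]
            for row in self.db.execute(
                "SELECT worker_id FROM workers WHERE state = 'DEAD' AND last_seen < ?",
                (cutoff,),
            )
        ]
        deleted = 0
        for worker_id in stale:
            # One delete per transaction: the connection context manager commits here.
            with self.db:
                self.db.execute("DELETE FROM workers WHERE worker_id = ?", (worker_id,))
            # Drop the in-memory projections that this store also owns.
            self.health.pop(worker_id, None)
            self.worker_attrs.pop(worker_id, None)
            self.audit_log.append(("worker_pruned", worker_id))
            deleted += 1
            time.sleep(pause_s)  # pause between deletes to yield to other writers
        return deleted


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE workers (worker_id TEXT PRIMARY KEY, state TEXT, last_seen REAL)")
db.executemany(
    "INSERT INTO workers VALUES (?, ?, ?)",
    [("w1", "DEAD", 10.0), ("w2", "DEAD", 50.0), ("w3", "RUNNING", 5.0)],
)
db.commit()
store = DbBackendWorkerStore(
    db, health={"w1": 10.0, "w2": 50.0, "w3": 5.0}, worker_attrs={"w1": {"zone": "a"}}
)
removed = store.prune_dead_workers(cutoff=20.0)
print(removed)  # 1: only w1 is both DEAD and older than the cutoff
```

Because the store already holds the database handle and both projections, nothing outside it needs to know how a worker's state is laid out.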
`prune_dead_workers` is added to the `BackendWorkerStore` protocol (implemented on `DbBackendWorkerStore`, which already holds the `db`, `health` tracker, and `worker_attrs` it needs) and to the `TaskBackend` protocol, where it is delegated by `RpcTaskBackend` and is a no-op on the Kubernetes backend (it tracks no Iris workers). `prune_old_data` now takes the backends collection and sums each backend's own GC, so the controller keeps only the cross-backend prune concerns: terminal jobs, orphan slices, and expired endpoints.
This continues the `BackendWorkerStore` ownership transfer (P3): the controller moves toward a thin router while each backend owns its workers, attributes, and liveness. The worker prune still runs on the controller's background prune thread (it touches only worker rows, attributes, and tracker entries, never the autoscaler) and preserves the cutoff semantics, the one-delete-per-transaction-plus-pause cadence, the `PruneResult.workers_deleted` count, and the `worker_pruned` audit event. The `prune_old_data` replay golden is unchanged.
Design note: `prune_old_data` takes the backends collection (`self._backends.values()`) rather than a list of stores, since the controller holds backends and each backend already encapsulates its store.
Part of #6718.
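The delegation chain and the controller-side summing can be sketched in a few lines of Python. This assumes a `prune_dead_workers(cutoff) -> int` signature and uses hypothetical `InMemoryWorkerStore` and `KubernetesTaskBackend` classes as stand-ins; only the protocol names, `RpcTaskBackend`, `prune_old_data`, and `PruneResult.workers_deleted` come from the PR, and their real signatures may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class BackendWorkerStore(Protocol):
    def prune_dead_workers(self, cutoff: float) -> int: ...


class TaskBackend(Protocol):
    def prune_dead_workers(self, cutoff: float) -> int: ...


class InMemoryWorkerStore:
    """Hypothetical store: worker_id -> (state, last_seen)."""

    def __init__(self, workers: dict[str, tuple[str, float]]) -> None:
        self.workers = workers

    def prune_dead_workers(self, cutoff: float) -> int:
        stale = [w for w, (state, seen) in self.workers.items() if state == "DEAD" and seen < cutoff]
        for w in stale:
            del self.workers[w]
        return len(stale)


class RpcTaskBackend:
    def __init__(self, store: BackendWorkerStore) -> None:
        self._store = store

    def prune_dead_workers(self, cutoff: float) -> int:
        # Delegate: only the backend's own store knows its workers.
        return self._store.prune_dead_workers(cutoff)


class KubernetesTaskBackend:  # hypothetical name for "the Kubernetes backend"
    def prune_dead_workers(self, cutoff: float) -> int:
        # Tracks no Iris workers, so there is nothing to collect.
        return 0


@dataclass
class PruneResult:
    workers_deleted: int = 0


def prune_old_data(backends: Iterable[TaskBackend], cutoff: float) -> PruneResult:
    result = PruneResult()
    # The controller only sums what each backend's own GC reports.
    result.workers_deleted = sum(b.prune_dead_workers(cutoff) for b in backends)
    # Cross-backend concerns (terminal jobs, orphan slices, expired endpoints)
    # would stay here in the controller; they are omitted from this sketch.
    return result


backends = {
    "rpc": RpcTaskBackend(InMemoryWorkerStore({"w1": ("DEAD", 10.0), "w2": ("RUNNING", 1.0)})),
    "k8s": KubernetesTaskBackend(),
}
result = prune_old_data(backends.values(), cutoff=20.0)
print(result.workers_deleted)  # 1
```

Passing `backends.values()` mirrors the design note: the controller iterates the backends it already holds and never reaches into a store directly.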