[iris] Preserve adopted task host-port reservations across worker restart #6723

Merged: rjpower merged 1 commit into main from agent/20260626-fix-6721 on Jun 26, 2026

@claude (bot, contributor) commented on Jun 26, 2026

On worker restart, an adopted container kept running but its host-port reservations were lost: TaskAttempt.adopt() rebuilt the attempt with ports={} and never re-marked the ports in the PortAllocator, which started empty. The worker could then hand those in-use ports (e.g. 30000/30001) out again to the next scheduled task, causing a bind clash.
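
To make the "bind clash" concrete, here is a minimal sketch of what the OS does when a second socket asks for a port that is already held. It is independent of iris and uses only Python's standard `socket` module. The function name is hypothetical, and the two sockets are stand-ins for the adopted container and the newly scheduled task.

```python
import errno
import socket
from typing import Optional


def second_bind_errno() -> Optional[int]:
    """Bind a TCP port, try to bind it again, and return the errno of the clash."""
    # Stand-in for the adopted container, which still holds its host port.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    holder.listen()
    port = holder.getsockname()[1]

    # Stand-in for the new task that was handed the same port again.
    newcomer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        newcomer.bind(("127.0.0.1", port))
    except OSError as exc:
        return exc.errno  # the second bind is refused
    finally:
        newcomer.close()
        holder.close()
    return None  # reached only if the second bind unexpectedly succeeded


clash = second_bind_errno()
print(clash == errno.EADDRINUSE)
```

The second bind fails with "address already in use", which is the failure a new task would hit if the allocator gave it a port that an adopted container is still using.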

The port set had no substrate footprint (nothing on the container itself recorded it), so it could not be recovered. This change gives it one and restores it on adopt:

  • Stamp the allocated host ports as an iris.ports Docker label at container create (ContainerConfig.ports).
  • Recover the ports into DiscoveredContainer.ports during discover_containers().
  • Add PortAllocator.reserve() to re-mark a known port set as taken.
  • Have adopt() restore attempt.ports and reserve() them before any new work is scheduled.
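
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: only the `iris.ports` label key, `PortAllocator`, `reserve()` and `adopt()` are names from the description. The JSON label encoding, the name-to-port dict shape, the `allocate()` method, the port range, the helper functions and the port names `http`/`grpc` are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

PORTS_LABEL = "iris.ports"  # label key from the PR; the JSON value format is an assumption


def ports_to_labels(ports: dict[str, int]) -> dict[str, str]:
    """Step 1: stamp the allocated host ports onto the container as a label."""
    return {PORTS_LABEL: json.dumps(ports, sort_keys=True)}


def ports_from_labels(labels: dict[str, str]) -> dict[str, int]:
    """Step 2: recover the ports during discovery; a missing label means no ports."""
    raw = labels.get(PORTS_LABEL)
    if not raw:
        return {}
    return {name: int(port) for name, port in json.loads(raw).items()}


@dataclass
class PortAllocator:
    """Toy allocator over a fixed range (the range itself is an assumption)."""

    start: int = 30000
    end: int = 30010
    _taken: set[int] = field(default_factory=set)

    def allocate(self) -> int:
        # Hand out the lowest port that is not already marked as taken.
        for port in range(self.start, self.end):
            if port not in self._taken:
                self._taken.add(port)
                return port
        raise RuntimeError("no free ports in range")

    def reserve(self, ports) -> None:
        """Step 3: re-mark a known port set as taken."""
        self._taken.update(ports)


def adopt(labels: dict[str, str], allocator: PortAllocator) -> dict[str, int]:
    """Step 4: restore the attempt's ports and reserve them before new work runs."""
    ports = ports_from_labels(labels)
    allocator.reserve(ports.values())
    return ports


# Before restart: a task receives two ports, which are stamped on its container.
before = PortAllocator()
labels = ports_to_labels({"http": before.allocate(), "grpc": before.allocate()})

# After restart: a fresh, empty allocator adopts the still-running container.
after = PortAllocator()
restored = adopt(labels, after)
next_port = after.allocate()
print(restored, next_port)
```

In this toy model the restarted allocator skips the two adopted ports and hands out the next free one. Without the `reserve()` call in `adopt()`, it would hand out the first adopted port again.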

This change is independent of multi-backend and is a prerequisite for the per-cluster agent recoverable-cache port recovery in #6718.

Fixes #6721


On worker restart, an adopted container's host ports were dropped: adopt()
rebuilt ports={} and never re-marked the PortAllocator, so the worker could
re-hand those ports to a new task and cause a bind clash.

Stamp the allocated ports as an iris.ports Docker label, recover them into
DiscoveredContainer during discovery, and have adopt() restore attempt.ports
and reserve() them on the allocator before scheduling new work.

Fixes #6721

@claude (bot) added the agent-generated label on Jun 26, 2026
@rjpower merged commit dae0176 into main on Jun 26, 2026
34 checks passed
@rjpower deleted the agent/20260626-fix-6721 branch on June 26, 2026 23:45
Linked issue: [iris] Worker restart double-allocates ports: adopt() drops port reservations
