Commit 6ce0540
fix(broker-lifecycle): reject stale sessions via PID-alive + age checks

isBrokerEndpointReady() only does a 150ms socket ping. If the broker's
underlying codex app-server subprocess is in a bad state — observed
after multi-day uptime — the socket still accepts connections, so the
existing session is trusted and reused, but every task disconnects
mid-turn because the transport subsystem behind the socket is broken.

Two complementary guards added in isSessionStale():

1. PID-alive probe: process.kill(session.pid, 0) detects a crashed
   broker whose socket file may linger. Covers the acute crash case.
2. Age-based rotation: compare Date.now() against session.startedAt
   (new field, captured when the broker spawns). Default threshold 6h,
   overridable via CODEX_COMPANION_BROKER_MAX_AGE_HOURS env var.
   Covers the slow-degradation case where neither the PID probe nor
   the socket ping surfaces the problem.
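
The two guards can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: session.pid, session.startedAt, the 6h default, and the env var name come from the message above, while resolveMaxAgeMs, isPidAlive, the fallback for invalid env values, the EPERM handling, and the treatment of a missing startedAt are hypothetical choices.

```javascript
// Sketch only: session.pid and session.startedAt come from the commit message;
// helper names, fallbacks, and the startedAt encoding are assumptions.
const DEFAULT_MAX_AGE_HOURS = 6;
const MS_PER_HOUR = 60 * 60 * 1000;

// Threshold in ms. Unset, non-numeric, or non-positive values fall back to
// the default (this fallback policy is an assumption).
function resolveMaxAgeMs(env = process.env) {
  const hours = Number(env.CODEX_COMPANION_BROKER_MAX_AGE_HOURS);
  const effective =
    Number.isFinite(hours) && hours > 0 ? hours : DEFAULT_MAX_AGE_HOURS;
  return effective * MS_PER_HOUR;
}

// Signal 0 delivers nothing; it only checks that the PID can be signalled.
function isPidAlive(pid) {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user.
    return err.code === "EPERM";
  }
}

function isSessionStale(session, now = Date.now(), env = process.env) {
  // Guard 1: the broker process is gone (acute crash case).
  if (!isPidAlive(session.pid)) return true;
  // Accepts epoch ms or an ISO string; a missing or unparsable value is
  // treated as stale (assumption).
  const startedAtMs = new Date(session.startedAt).getTime();
  if (!Number.isFinite(startedAtMs)) return true;
  // Guard 2: the broker has been up longer than the threshold.
  return now - startedAtMs > resolveMaxAgeMs(env);
}
```

Passing now and env as parameters keeps the age check easy to exercise without waiting six hours or mutating process.env.
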

Gate added to ensureBrokerSession() before trusting the socket ping:
existing session must (a) exist, (b) not be stale, (c) pass the socket
ping. If any check fails, the existing session is torn down and a
fresh broker is spawned — the same path already used for missing or
unreachable brokers.
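
A minimal sketch of that gate, assuming the staleness check, socket ping, teardown, and spawn helpers are injected through a hypothetical deps object (the real function's signature and helper names are not shown in this message):

```javascript
// Sketch only: `deps` and its member names are hypothetical stand-ins for the
// real staleness check, socket ping, teardown, and spawn helpers.
async function ensureBrokerSessionSketch(existing, deps) {
  const reusable =
    existing != null &&                   // (a) a session exists
    !deps.isStale(existing) &&            // (b) it is not stale
    (await deps.pingEndpoint(existing));  // (c) the socket answers
  if (reusable) return existing;

  // Any failed check falls through to the respawn path.
  if (existing != null) await deps.teardown(existing);
  return deps.spawn();
}
```

Because the conditions short-circuit in order, a stale session is rejected without ever sending the socket ping, so a degraded broker that still accepts connections cannot pass the gate.
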

Observed in the wild: broker process with 3d+11h uptime caused 100%
task disconnect rate for every /codex:rescue invocation. Restarting
the broker (deleting broker.json and letting the next task spawn
fresh) fully restored task reliability until the next degradation.

1 parent 807e03a
1 file changed
Lines changed: 56 additions & 3 deletions
Diff body not preserved; the surviving line numbers show these changes:

| Original file lines | New file lines | Change |
|---|---|---|
| (inserted after 112) | 113-161 | 49 lines added |
| 115 | 164 | 1 line replaced by 1 |
| 126 | 175-178 | 1 line replaced by 4 |
| 167 | 219-220 | 1 line replaced by 2 |