Fix blue-green migration getting stuck due to an existing pending reconnection (#406)
Fixes #405
### Motivation
After a blue-green migration is triggered, the socket is disconnected and
a reconnection to the blue cluster is scheduled. However, the blue
cluster might never respond to the Producer or Subscribe command. Taking
the producer as an example, this means `connectionOpened` never
completes and `reconnectionPending_` never becomes false.
Then, after a `CommandProducerClose` command is received from the blue
cluster, a new reconnection to the green cluster is scheduled, but it is
skipped because `reconnectionPending_` is still true. As a result, the
producer stays stuck until the previous `connectionOpened` future fails
when the 30s timeout is reached:
```
2024-02-26 06:09:30.251 INFO [139737465607744] HandlerBase:101 | [persistent://public/unload-test/topic-1708927732, sub, 0] Ignoring reconnection attempt since there's already a pending reconnection
2024-02-26 06:10:00.035 WARN [139737859880512] ProducerImpl:291 | [persistent://public/unload-test/topic-1708927732, cluster-a-0-0] Failed to reconnect producer: TimeOut
```
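To make the failure mode concrete, here is a minimal C++ sketch of the reconnection guard described above. `SimplifiedHandler`, its methods, and the `pulsar://blue:6650` / `pulsar://green:6650` URLs are hypothetical; this is a synchronous model that only mirrors the `reconnectionPending_` check, not the real `HandlerBase` code, and it does no networking.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a producer/consumer handler. It only
// mirrors the reconnection guard; it is not the real HandlerBase code.
class SimplifiedHandler {
   public:
    // Returns true if a reconnection was started, false if it was skipped.
    bool scheduleReconnection(const std::string& serviceUrl) {
        if (reconnectionPending_) {
            // Analogous to "Ignoring reconnection attempt since there's
            // already a pending reconnection" in the log above.
            ignored_.push_back(serviceUrl);
            return false;
        }
        reconnectionPending_ = true;
        target_ = serviceUrl;
        return true;
    }

    // Stands in for the completion of connectionOpened (success or failure);
    // only then is the guard released.
    void onConnectionOpenedDone() { reconnectionPending_ = false; }

    bool reconnectionPending() const { return reconnectionPending_; }
    const std::string& target() const { return target_; }
    const std::vector<std::string>& ignored() const { return ignored_; }

   private:
    bool reconnectionPending_ = false;
    std::string target_;
    std::vector<std::string> ignored_;
};

// Replays the scenario: the blue cluster never answers, so the reconnection
// to the green cluster is dropped while the guard is still held.
SimplifiedHandler replayStuckMigration() {
    SimplifiedHandler handler;
    handler.scheduleReconnection("pulsar://blue:6650");   // no response ever arrives
    handler.scheduleReconnection("pulsar://green:6650");  // skipped by the guard
    return handler;
}
```

In this model nothing calls `onConnectionOpenedDone()` for the blue attempt, so the handler keeps pointing at the blue cluster; in the real client, only the timeout eventually releases the guard.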
### Modifications
When the `TOPIC_MIGRATED` command is received, cancel the pending
`Producer` and `Subscribe` commands so that `connectionOpened` fails
with a retryable error. On the next reconnection attempt, the client
connects to the green cluster.
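The fix can be pictured with another minimal C++ sketch, assuming a connection that tracks in-flight `Producer`/`Subscribe` requests by request id. `SketchConnection`, `SketchHandler`, `SketchResult`, and the URLs are hypothetical names for illustration; the real client's connection and `Result` types differ. The point it shows is that failing the pending requests on `TOPIC_MIGRATED` completes `connectionOpened` early and releases the reconnection guard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical result codes; the real client has its own Result enum.
enum class SketchResult { Ok, RetryableError };

using ResponseCallback = std::function<void(SketchResult)>;

// Hypothetical connection that tracks in-flight Producer/Subscribe requests.
class SketchConnection {
   public:
    void sendRequest(std::uint64_t requestId, ResponseCallback callback) {
        pendingRequests_.emplace(requestId, std::move(callback));
    }

    // On TOPIC_MIGRATED, fail every pending request with a retryable error
    // instead of waiting for a response the old cluster may never send.
    void handleTopicMigrated() {
        // Swap the map out first so callbacks can safely issue new requests.
        std::map<std::uint64_t, ResponseCallback> pending;
        pending.swap(pendingRequests_);
        for (auto& entry : pending) {
            entry.second(SketchResult::RetryableError);
        }
    }

    std::size_t pendingCount() const { return pendingRequests_.size(); }

   private:
    std::map<std::uint64_t, ResponseCallback> pendingRequests_;
};

// Hypothetical handler with the same reconnection guard as before.
class SketchHandler {
   public:
    bool scheduleReconnection(SketchConnection& conn, const std::string& url) {
        if (reconnectionPending_) {
            return false;  // an earlier reconnection has not finished yet
        }
        reconnectionPending_ = true;
        target_ = url;
        conn.sendRequest(nextRequestId_++,
                         [this](SketchResult result) { onConnectionOpened(result); });
        return true;
    }

    bool reconnectionPending() const { return reconnectionPending_; }
    const std::string& target() const { return target_; }
    SketchResult lastResult() const { return lastResult_; }

   private:
    // Stands in for the completion of connectionOpened: any result, including
    // a retryable failure, releases the guard.
    void onConnectionOpened(SketchResult result) {
        reconnectionPending_ = false;
        lastResult_ = result;
    }

    bool reconnectionPending_ = false;
    std::uint64_t nextRequestId_ = 1;
    std::string target_;
    SketchResult lastResult_ = SketchResult::Ok;
};
```

In this model, calling `handleTopicMigrated()` on the blue connection unblocks the handler, after which a `scheduleReconnection` to the green connection succeeds instead of being ignored.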
Fix `ExtensibleLoadManagerTest` with a stricter timeout check. With
this change, the test passes in about 3 seconds locally; before, even
when it passed in CI, it took about 70 seconds.
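A stricter timeout check in a test can be sketched as a deadline-based wait that fails well before the 30s timeout from the motivation would be reached. `waitUntil` and its 10 ms polling interval are hypothetical and not taken from `ExtensibleLoadManagerTest`; the sketch only shows the general shape of such a check.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical test helper: polls `condition` until it returns true or
// `timeout` elapses. Returns true only if the condition was met before
// the deadline; a condition that becomes true afterwards counts as a failure.
bool waitUntil(const std::function<bool()>& condition, std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (condition()) {
            return true;
        }
        std::this_thread::sleep_for(interval);
    }
    return false;
}
```

For example, a test could assert that `waitUntil([&] { return producerReconnected; }, std::chrono::seconds(5))` returns true, where `producerReconnected` is a hypothetical flag and the 5-second bound is only illustrative. Keeping the bound well below 30 seconds means a regression to the stuck-reconnection behavior fails the test instead of merely slowing it down.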
In addition, fix a possible crash on macOS when closing the client; see
the related comment in #405.