Skip to content

Fix multi-topics-consumer new topic listeners stuck in paused state #481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 22, 2025

Conversation

oversearch
Copy link
Contributor

@oversearch oversearch commented Apr 10, 2025

This was introduced in v3.7.0 via #447, which contained a change that attempted to fix an issue where a pre-mature ack of a message before a multi-topic subscriber was ready could have caused a crash. To fix the original bug, all multi-topic subscriptions are started with their message listener paused. They later get un-paused once all topics are successfully subscribed and connected. However, on a regex subscription when new topics are discovered, they also start in a paused state, and there's no mechanism to resume them. Hence, they get stuck, and no messages for new topics will be processed.

This change adds a call to resume any new listeners after new topics are added. The fix has been deployed to production in my company's infrastructure and has been confirmed working for a few weeks.

Verifying this change

I could not get the tests to pass with a bare checkout of master no matter what I did. The first few tests were extremely flaky, so I commented them out. The OAuth tests require sudo and wanted to modify my root certificates, which I'm not about to allow, so I commented that out. And the broker tests always segfault in the BasicEndToEndTest.testUnAckedMessageTrackerEnabledIndividualAck test. I gave up trying to mess with the tests - I've confirmed the fix works in my environment and our tests pass, so I was satisfied. Just trying to upstream this bugfix...

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    Simple regression bug fix

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

This was introduced in v3.7.0 via apache#477. which contained a change that attempted to fix an issue where a pre-mature ack of a message before a multi-topic subscriber was ready could have caused a crash.  To fix the original bug, all multi-topic subscriptions are started with their message listener paused. They later get un-paused once all topics are successfully subscribed and connected.  However, on a regex subscription when new topics are discovered, they also start in a paused state, and there's no mechanism to resume them.  Hence, they get stuck, and no messages for new topics will be processed.

This change adds a call to resume any new listeners after new topics are added.  The fix has been deployed to production in my company's infrastructure and has been confirmed working for a few weeks.
@BewareMyPower BewareMyPower added this to the 3.8.0 milestone Apr 21, 2025
@BewareMyPower BewareMyPower added the bug Something isn't working label Apr 21, 2025
@BewareMyPower
Copy link
Contributor

Sorry, did you mention a wrong PR? #477 is drafted and not included in 3.7.0

@BewareMyPower
Copy link
Contributor

I see it should be a regression caused by #447.

@BewareMyPower BewareMyPower merged commit 0a9b7d9 into apache:main Apr 22, 2025
12 of 13 checks passed
@oversearch
Copy link
Contributor Author

I see it should be a regression caused by #447.

Ah yes, my apologies, I typo'd that PR number by mistake. Thank you for merging this fix!

BewareMyPower pushed a commit that referenced this pull request Apr 28, 2025
…481)

This was introduced in v3.7.0 via #447, which contained a change that attempted to fix an issue where a pre-mature ack of a message before a multi-topic subscriber was ready could have caused a crash.  To fix the original bug, all multi-topic subscriptions are started with their message listener paused. They later get un-paused once all topics are successfully subscribed and connected.  However, on a regex subscription when new topics are discovered, they also start in a paused state, and there's no mechanism to resume them.  Hence, they get stuck, and no messages for new topics will be processed.

This change adds a call to resume any new listeners after new topics are added.

(cherry picked from commit 0a9b7d9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants