Conversation

@i3Craig i3Craig commented Dec 26, 2025

Summary

There is an issue in the MQTT consumer where the client reconnects after a network blip but may then fail to receive messages from the topic. Using tcpdump, one can see the messages arriving over the network, but the MQTT consumer does not receive or process them. Restarting telegraf is the only way to recover.
This issue can also occur if the server hosting the topic is rebooted.

It appears that the underlying MQTT library does not cope well when external code manually calls disconnect and reconnect. Comments in the library suggest enabling auto-reconnect instead, so that the library handles reconnection itself. To accommodate this change, a new handler function was needed for when the MQTT library reconnects. This function resubscribes to the topics of interest, as the subscriptions are lost during a reconnection.
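
The resubscribe-on-connect idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the `subscriber` interface, the `consumer` type, and `recordingClient` are hypothetical stand-ins rather than the real MQTT library types, and the handler is assumed to be registered as the library's on-connect callback with auto-reconnect enabled.

```go
package main

import "fmt"

// subscriber is a hypothetical stand-in for the small part of an MQTT
// client that this sketch needs; it is not the real library interface.
type subscriber interface {
	Subscribe(topic string, qos byte) error
}

// consumer holds the topics that must be restored after every (re)connect.
type consumer struct {
	topics []string
	qos    byte
}

// onConnect is meant to be registered as the on-connect callback. It is
// assumed to run after the first connect and after each automatic
// reconnect, so the subscriptions are re-established every time.
func (c *consumer) onConnect(client subscriber) error {
	for _, t := range c.topics {
		if err := client.Subscribe(t, c.qos); err != nil {
			return fmt.Errorf("resubscribing to %q: %w", t, err)
		}
	}
	return nil
}

// recordingClient records Subscribe calls so the behavior can be observed.
type recordingClient struct {
	subscribed []string
}

func (r *recordingClient) Subscribe(topic string, _ byte) error {
	r.subscribed = append(r.subscribed, topic)
	return nil
}

func main() {
	c := &consumer{topics: []string{"sensors/#", "status/+"}, qos: 1}
	client := &recordingClient{}

	// Simulate the initial connect followed by one automatic reconnect.
	for i := 0; i < 2; i++ {
		if err := c.onConnect(client); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("subscribe calls:", len(client.subscribed))
}
```

Keeping the topic list on the consumer, rather than subscribing once at startup, is what lets the same handler serve both the first connect and later reconnects.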

Testing

To replicate and test this change, I set up a system whose network connection toggled between connected and disconnected every minute (one minute of connection, one minute of no connection, and so on indefinitely).

  • The current release of telegraf (version 1.37.0) stopped processing messages from the topic between 3 and 10 minutes into the test.
  • The updated code (in this pull request) kept running for over 6 hours (testing was stopped at the end of the day to shut down the PC).

Checklist

Related issues

resolves #16293, #16035

Misc

This is my first pull request for this repo. Let me know if anything looks off or does not meet the project's standards and I can get it corrected.

…sult in no messages flowing through the mqtt consumer despite a connection to the topic.
@telegraf-tiger telegraf-tiger bot added the "fix" (pr to fix corresponding bug) and "plugin/input" (1. Request for new input plugins 2. Issues/PRs that are related to input plugins) labels Dec 26, 2025

i3Craig commented Dec 26, 2025

It looks like this unit test is failing. I believe this is because the "fakeClient" (the mock MQTT client) does not trigger the onConnect handler as the real MQTT client would, so the subscribe method never gets called. Should I update this test to use a more accurate client?

image
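
A mock that behaves more like the real client could fire the registered handler from its Connect method. The following is a rough sketch under that assumption. The `fakeClient` shown here is a simplified, hypothetical version and does not reproduce the fields or method signatures of the mock in the telegraf test suite.

```go
package main

import "fmt"

// fakeClient is a simplified, hypothetical mock. It only models the parts
// needed to show the handler being triggered on connect.
type fakeClient struct {
	onConnect      func(c *fakeClient)
	connectCalls   int
	subscribeCalls int
}

// Connect mimics a successful connection and then invokes the on-connect
// handler, which the original mock reportedly did not do.
func (f *fakeClient) Connect() error {
	f.connectCalls++
	if f.onConnect != nil {
		f.onConnect(f)
	}
	return nil
}

// Subscribe only counts calls; the topic is ignored in this sketch.
func (f *fakeClient) Subscribe(topic string) error {
	f.subscribeCalls++
	return nil
}

func main() {
	f := &fakeClient{}
	f.onConnect = func(c *fakeClient) {
		_ = c.Subscribe("telegraf/test")
	}

	if err := f.Connect(); err != nil {
		fmt.Println("connect failed:", err)
		return
	}
	fmt.Println("connect calls:", f.connectCalls)
	fmt.Println("subscribe calls:", f.subscribeCalls)
}
```

The handler is called synchronously here to keep a test deterministic; a real client may invoke it on a separate goroutine, in which case the test would need to wait for the subscription before asserting.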

i3Craig commented Dec 26, 2025

This might require some rework, as it appears that we don't actually want to subscribe to topics if the session is persistent. The persistent-session information is stored in the token, which is not part of the client object. Thus, we would need to store either the token itself or the SessionPersistent flag in the plugin instance so that we can reference it in the onConnected callback.
image
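
One way the second option (storing the flag on the instance) might look is sketched below. The field and method names are hypothetical and the types are stand-ins, not the plugin's actual structure. The sketch assumes the flag is copied from the connect token once the connection completes.

```go
package main

import "fmt"

// consumer is a hypothetical stand-in for the plugin instance.
type consumer struct {
	persistentSession bool     // assumed to come from the user's configuration
	sessionPresent    bool     // assumed to be copied from the connect token
	topics            []string // topics of interest
}

// recordConnect stores what the connect token reported so that the
// on-connect callback can consult it later.
func (c *consumer) recordConnect(sessionPresent bool) {
	c.sessionPresent = sessionPresent
}

// topicsToSubscribe returns the topics the callback should subscribe to.
// With a persistent session that the broker still holds, nothing needs to
// be resubscribed.
func (c *consumer) topicsToSubscribe() []string {
	if c.persistentSession && c.sessionPresent {
		return nil
	}
	return c.topics
}

func main() {
	c := &consumer{persistentSession: true, topics: []string{"sensors/#"}}

	c.recordConnect(true)
	fmt.Println("session present, topics to subscribe:", len(c.topicsToSubscribe()))

	c.recordConnect(false)
	fmt.Println("session absent, topics to subscribe:", len(c.topicsToSubscribe()))
}
```

Whether a flag stored at the initial connect stays accurate across automatic reconnects is an open question in this sketch and would need checking against the library's behavior.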

@i3Craig i3Craig closed this Dec 27, 2025

Development

Successfully merging this pull request may close these issues.

MQTT consumer plugin disconnects frequently and can not reconnect successfully