Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReconnectingClient prints "Lost connection" for old nodes when another one is added - while requests go through just fine #119

Open
alexeyOnGitHub opened this issue Feb 12, 2019 · 4 comments

Comments

@alexeyOnGitHub
Copy link
Contributor

I assume this is just a logging issue rather than a bug affecting client behavior.
I see this regularly -

say, I boot up 2 Memcached nodes: A and B. they are used by several app nodes (3 in my latest fault+perf test).
the total traffic sent to Memcached from these app nodes is 3000 RPS.
then I boot up one more Memcached node (C) and bring it into Load Balancer.
we use 3 seconds Dns refresh period in Memcached client setting, so the client recognizes that there are changes in the Memcached nodes list pretty quickly.

I see "Successfully connected" and "client connected: BinaryMemcacheClient(com.spotify.folsom.ketama.SrvKetamaClient" in logs, so that is all good.

but -
even though Memcached client keeps working fine (I can see requests in memcached tracking dashboards, no timeouts in the app, all good), I see these worrying messages in logs about 60 seconds after the new node is picked by the client. these messages refer to original Memcached nodes A and B, but not C.

INFO [2019-02-12 14:33:10,641] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to A:11211
INFO [2019-02-12 14:33:10,641] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to A:11211
INFO [2019-02-12 14:33:10,643] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to B:11211
INFO [2019-02-12 14:33:10,643] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to B:11211

we use 2 connections per host, this is why you see two "lost connection" lines per node.

in other words, ReconnectingClient prints "Lost connection" for old nodes after its nodes list was refreshed and another node was picked up.

I am guessing there is no problem with the current client connections and those messages are just some delayed cleanup messages from old connections. if that is the case, it would be nice to change the logging to be more explicit. "lost connection" looks scary

@spkrka
Copy link
Member

spkrka commented Feb 13, 2019

Thanks for the report! It sounds it's possible that the old clients are still trying to reconnect. I'll investigate it further.

@spkrka
Copy link
Member

spkrka commented Feb 13, 2019

Actually... looks like this is intentional. When reconnecting, there's a 60 second shutdown delay by default: https://github.com/spotify/folsom/blob/master/folsom/src/main/java/com/spotify/folsom/ketama/SrvKetamaClient.java#L206

You can reconfigure the shutdown delay if you want.

@alexeyOnGitHub
Copy link
Contributor Author

right, my point is specifically about the log message.
I was thinking maybe the log message could be changed to make it explicit that the connection was not suddenly lost, but it was due to the planned shutdown. e.g. check if (shutdown) ... and then print different messages depending on whether it is a sudden connection loss vs. planned shutdown.

the message is somewhat confusing and scary otherwise.

@spkrka
Copy link
Member

spkrka commented Feb 13, 2019

I'll look into how much work it would be to introduce pending shutdowns and silence the logs for those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants