Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if leader was found in between network requests #302

Closed
wants to merge 1 commit into from

Conversation

masnax
Copy link
Contributor

@masnax masnax commented Jul 5, 2024

We send a request to multiple peers concurrently to ask if they are the leader. They themselves may be the leader, or they may report who they think is the leader. We then try to reach out to whoever they think is the leader.

It is possible that we are are already talking to the leader concurrently with asking a different node who the leader is, so in such cases we don't need to send further network requests and can return early by checking if we have already found the leader in another goroutine.

We send a request to multiple peers concurrently to ask if they are the
leader. They themselves may be the leader, or they may report who they
think is the leader. We then try to reach out to whoever they think is
the leader.

It is possible that we are are already talking to the leader
concurrently with asking a different node who the leader is, so in such
cases we don't need to send further network requests and can return
early by checking if we have already found the leader in another
goroutine.

Signed-off-by: Max Asnaashari <[email protected]>
@cole-miller
Copy link
Contributor

cole-miller commented Jul 5, 2024

We already have some logic along these lines---the investigating goroutines share a parent Context:

ctx, cancel := context.WithCancel(ctx)

Which is passed into connectAttemptOne:

protocol, leader, err := c.connectAttemptOne(ctx, server.Address, log)

And this context is cancelled as soon as we get a connection to the leader:

protocol, ok := <-protocolChan
if !ok {
return nil, ErrNoAvailableLeader
}
cancel()

So this change might be redundant, if I'm not missing something.

@masnax
Copy link
Contributor Author

masnax commented Jul 5, 2024

So this change might be redundant, if I'm not missing something.

Ah, looks like this one is a bug on our end then!

I was doing 2 things incorrectly here:

  • I was counting the calls to the DialFunc instead of the calls that make it through to the other side. With the context, the cancellation won't manifest until much later.
  • microcluster is not correctly passing the context through as it uses tls.DialWithDialer which internally uses context.Background(). Hence why I saw an improvement there, but now checking with dqlite-demo I don't see the same improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants