Add health check to route queries to healthy cluster and router jmx counter updates #24449
base: master
be503f7 to e4de4ba
Will fix the CLA and squash commits upon review.
e4de4ba to 0d7cafe
@auden Woolfson - do we plan on including the counter JMX metrics as part of this PR? Is that intentional? If so, the PR title and description need to be updated. If not, the metrics need a separate PR.
The overall approach looks good. Please squash and split into logically cohesive commits.
public void startConfigReloadTask()
{
    File routerConfigFile = new File(routerConfig.getConfigFile());
    //ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
Remove this commented-out line.
{
    this.routerConfig = config;
    this.scheduledExecutorService = scheduledExecutorService;
Why not just instantiate the scheduledExecutorService here? Why inject it?
I'm not sure. @saravanan19, is there a reason for this?
}
lastConfigUpdate.set(newConfigUpdateTime);
}
}, 0L, (long) 5, TimeUnit.SECONDS);
(long) 5 -> 5L
}

@PostConstruct
public void startConfigReloadTask()
Consider using a FileWatcher instead.
1/ Updates have lower latency
2/ It saves having an extra background thread
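One way to sketch this suggestion is with the JDK's `java.nio.file.WatchService`. The class name `ConfigFileWatcher` and the `onChange` callback are hypothetical and not part of the PR. This version still parks one daemon thread on `take()`, so what it removes is the periodic polling rather than the thread itself; treat it as an illustration under those assumptions.

```java
import java.io.IOException;
import java.nio.file.ClosedWatchServiceException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.function.Consumer;

// Hypothetical helper (not from the PR): runs a callback when one file is created or modified.
final class ConfigFileWatcher
        implements AutoCloseable
{
    private final WatchService watchService;

    ConfigFileWatcher(Path configFile, Consumer<Path> onChange)
            throws IOException
    {
        Path absolute = configFile.toAbsolutePath();
        Path fileName = absolute.getFileName();
        // WatchService watches directories, so register the parent and filter by file name.
        this.watchService = FileSystems.getDefault().newWatchService();
        absolute.getParent().register(
                watchService,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY);

        Thread thread = new Thread(() -> run(fileName, absolute, onChange), "config-file-watcher");
        thread.setDaemon(true);
        thread.start();
    }

    private void run(Path fileName, Path absolute, Consumer<Path> onChange)
    {
        try {
            while (true) {
                WatchKey key = watchService.take(); // blocks until events are available
                for (WatchEvent<?> event : key.pollEvents()) {
                    // For create/modify events the context is the changed file's relative name.
                    if (fileName.equals(event.context())) {
                        onChange.accept(absolute);
                    }
                }
                if (!key.reset()) {
                    return; // the watched directory is no longer accessible
                }
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        catch (ClosedWatchServiceException e) {
            // close() was called; exit the loop quietly
        }
    }

    @Override
    public void close()
            throws IOException
    {
        watchService.close();
    }
}
```

A single save can produce more than one modify event, so a caller would probably want the callback to be idempotent, for example by comparing the file's last-modified time before reloading.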

public RemoteState(HttpClient httpClient, URI remoteUri)
private Boolean isHealthy = false;
private long lastHealthyResponseTime;
Instead of long, prefer using Instant to track an accurate timestamp since the last response.
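A minimal sketch of what tracking the timestamp as an `Instant` could look like. The class `HealthTimestamp`, its method names, and the `maxAge` threshold are all hypothetical; the PR's `RemoteState` may be structured differently. The current time is passed in explicitly so the logic can be tested without sleeping.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (not the PR's RemoteState): stores the last healthy response as an Instant.
final class HealthTimestamp
{
    // null until the first healthy response is recorded
    private final AtomicReference<Instant> lastHealthyResponse = new AtomicReference<>();

    void recordHealthyResponse(Instant at)
    {
        lastHealthyResponse.set(at);
    }

    // Time elapsed since the last healthy response, or empty if none was ever seen.
    Optional<Duration> age(Instant now)
    {
        Instant last = lastHealthyResponse.get();
        return last == null ? Optional.empty() : Optional.of(Duration.between(last, now));
    }

    // Healthy only if a response was recorded and it is no older than maxAge.
    boolean isHealthy(Instant now, Duration maxAge)
    {
        return age(now).map(elapsed -> elapsed.compareTo(maxAge) <= 0).orElse(false);
    }
}
```

Compared with a raw `long`, the unit is carried by the type, and "never responded" is distinguishable from "responded at epoch 0".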
}

scheduler.setCandidates(healthyClusterURIs);
if (schedulerType == WEIGHTED_RANDOM_CHOICE || schedulerType == WEIGHTED_ROUND_ROBIN) {
This looks like a change unrelated to the health check. Can you make a new PR for this change instead?
@jp-sivaprasad, can you please add this to #24580?
binder.bind(ClusterManager.class).in(Scopes.SINGLETON);
binder.bind(RemoteInfoFactory.class).in(Scopes.SINGLETON);

bindHttpClient(binder, QUERY_TRACKER, ForQueryInfoTracker.class, IDLE_TIMEOUT_SECOND, REQUEST_TIMEOUT_SECOND);
bindHttpClient(binder, QUERY_TRACKER, ForClusterInfoTracker.class, IDLE_TIMEOUT_SECOND, REQUEST_TIMEOUT_SECOND);

//Determine the NodeVersion
NodeVersion nodeVersion = new NodeVersion(serverConfig.getPrestoVersion());
binder.bind(NodeVersion.class).toInstance(nodeVersion);
Is this used?
@@ -17,6 +17,11 @@
</properties>

<dependencies>
<dependency>
<groupId>com.facebook.presto</groupId>
Is only TestingPrestoServer used from presto-main? Or are there other types? Can this be a test-only dependency?
We may be able to make this a test-only dependency for this PR, but other parts of the router that we are forward fitting will rely on this dependency in the main module. It might be beneficial to just leave this as is.
Not a fan of pulling a huge dependency like presto-main into presto-router. What other PRs are bringing this in? Let's see if we can avoid this by building good mocks for the Presto server. I think we may get by with just a few API endpoints mocked.
I believe the authentication piece uses this. Others might as well. We can switch this for this PR, but I can't guarantee that we will be able to leave it like this. We can cross that bridge when we get to it.
Actually, on second thought, we might still need this. We are binding multiple classes (ServerConfig, WebUiResource, PluginManagerConfig) from presto-main in the RouterModule.
@Test(enabled = false)
public void testHealthChecks()
{
    prestoServers.get(0).stopResponding();
Can you add a test where a server becomes unresponsive (i.e., unhealthy), is removed from the rotation, and then becomes responsive again and is added back to the rotation?
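The shape of the requested scenario can be sketched against a small in-memory stand-in. `RotationSketch` and its probe predicate are hypothetical and exist only to illustrate the three phases (healthy, removed, restored); a real test would drive the PR's router and test servers instead.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the router's health checking, used only to illustrate the scenario.
final class RotationSketch
{
    private final List<URI> configured;
    private final Predicate<URI> probe;

    RotationSketch(List<URI> configured, Predicate<URI> probe)
    {
        this.configured = new ArrayList<>(configured);
        this.probe = probe;
    }

    // Returns the clusters that currently pass the health probe, in configured order.
    List<URI> healthyClusters()
    {
        List<URI> healthy = new ArrayList<>();
        for (URI uri : configured) {
            if (probe.test(uri)) {
                healthy.add(uri);
            }
        }
        return healthy;
    }
}
```

A test would then assert on `healthyClusters()` three times: before the server stops responding, after it stops (it should be absent), and after it resumes (it should be present again).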
}

@PostConstruct
public void startConfigReloadTask()
Can you add a test for this scenario: the config file gets updated, old servers are removed, new ones are added, and existing ones stay as-is?
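The three outcomes in this scenario amount to a set difference between the old and new server lists. The helper below is a hypothetical sketch of that computation (`ClusterDiff` is not a class in the PR), which a reload test could use to state its expectations.

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not from the PR): classifies servers across a config reload.
final class ClusterDiff
{
    final Set<URI> added;    // in the updated config only
    final Set<URI> removed;  // in the current config only
    final Set<URI> retained; // in both, expected to stay as-is

    private ClusterDiff(Set<URI> added, Set<URI> removed, Set<URI> retained)
    {
        this.added = Collections.unmodifiableSet(added);
        this.removed = Collections.unmodifiableSet(removed);
        this.retained = Collections.unmodifiableSet(retained);
    }

    static ClusterDiff between(Set<URI> current, Set<URI> updated)
    {
        Set<URI> added = new LinkedHashSet<>(updated);
        added.removeAll(current);

        Set<URI> removed = new LinkedHashSet<>(current);
        removed.removeAll(updated);

        Set<URI> retained = new LinkedHashSet<>(current);
        retained.retainAll(updated);

        return new ClusterDiff(added, removed, retained);
    }
}
```

For example, reloading from servers {a, b} to {b, c} should yield `added` = {c}, `removed` = {a}, and `retained` = {b}.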
…/RemoteState.java Co-authored-by: Anant Aneja <[email protected]>
Description
Add coordinator health checks to the Presto router to ensure queries are sent to active/healthy clusters. Part of the Presto router forward fit. Also includes code implemented to patch CVEs.