Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce RemoteIndexClient #2548

Conversation

owenhalpert
Copy link
Contributor

@owenhalpert owenhalpert commented Feb 20, 2025

Description

First PR for #2518

This PR introduces the initial implementation of the Remote Index client, which is awaiting implementation on the following:

Related Issues

#2391 Meta Issue
#2393 HLD by @jed326
#2518 My LLD

Check List

  • New functionality includes testing.
    - [ ] New functionality has been documented.
    - [ ] API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
    - [ ] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@owenhalpert owenhalpert force-pushed the owen-remote-vector-dev branch from 57290c1 to 23faf72 Compare February 20, 2025 21:14
Comment on lines 46 to 47
public static final long BASE_DELAY_MS = 100;
public static final long INITIAL_DELAY_MS = 10;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are currently set low for testing, though we haven't thought on strong defaults for these anyway.

@owenhalpert owenhalpert force-pushed the owen-remote-vector-dev branch from 23faf72 to 9bb2d92 Compare February 20, 2025 22:54
long pollInterval = ((TimeValue) (KNNSettings.state().getSettingValue(KNNSettings.KNN_REMOTE_BUILD_SERVICE_POLL_INTERVAL)))
.getMillis();

Thread.sleep(INITIAL_DELAY_MS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look into this some more, but I'm not sure this is the best implementation. Off the top of my head I am thinking of some sort of job scheduler semantic with a latch and we can do latch.await() here or something like that. Also it seems a little dangerous to do this in a Singleton class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't INITIAL_DELAY_MS just be pollInterval? or maybe some function of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the two as related. I was thinking of the INITIAL_DELAY_MS as a buffer to prevent the client from sending any status requests when we know for sure that the build is still in progress. For example finding a lower bound t on the GPU build or calculating this based on workload size, then waiting 0.5t so to not send any unnecessary status checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking on it some more I think Thread.sleep is fine, but let's make sure we have tests for it across concurrent jobs.

For the other point, the polling interval itself should also be a function of t else it's also going to send unnecessary status checks.

* @param jobId to check
* @return HttpExecuteResponse for the status request
*/
public String getBuildStatus(String jobId) throws IOException, URISyntaxException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of these methods can be private or protected

@owenhalpert owenhalpert force-pushed the owen-remote-vector-dev branch from 9bb2d92 to 907c6be Compare February 21, 2025 16:56
Copy link
Contributor

@jed326 jed326 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @owenhalpert!

At a high level I think we should separate out the client changes into the following individual PRs:

  1. Client skeleton + build request, including polling logic + tests, failure handling
  2. Polling + failure handling
  3. Metrics

I think there is a lot more testing that needs to be written for all of these and I think it will be much easier to focus on each component individually.

long pollInterval = ((TimeValue) (KNNSettings.state().getSettingValue(KNNSettings.KNN_REMOTE_BUILD_SERVICE_POLL_INTERVAL)))
.getMillis();

Thread.sleep(INITIAL_DELAY_MS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking on it some more I think Thread.sleep is fine, but let's make sure we have tests for it across concurrent jobs.

For the other point, the polling interval itself should also be a function of t else it's also going to send unnecessary status checks.

Add RemoteIndexClient initial implementation, its accompanying dependencies, and Build Request, Retry Strategy, and test files

Signed-off-by: owenhalpert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants