Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EncryptionRequestHandler supports encryption requests distribution. #115

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

bruno-roustant
Copy link
Contributor

The client can call EncryptionRequestHandler and provide the additional parameter "distrib" = "true". In this case, this handler will distribute the encryption request to all the leader replicas of all the shards of the collection, ensuring they all encrypt their index shard. The response "encryptionState" will be "complete" only when all the shards return "complete". So the caller may repeatedly send the same encryption request with "distrib" = "true" to know when the whole collection encryption is complete. In addition, the caller can set the "timeAllowed" parameter to define a timeout for the request distribution, in milliseconds. This timeout is for the distribution itself, not the encryption process.

EncryptionDirectoryFactory.getFactory(req.getCore());

if (req.getParams().getBool(DISTRIB, false)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if we could default "distrib" based on whether the request was addressed to the collection or not. Maybe that metadata is there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this metadata. The collection name is present in the CloudSolrClient.request call, but not passed to the inner LBSolrClient.request call.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SolrQueryRequest.getHttpSolrCall().getCoreOrColName() -- but never mind my thought on this. Since SearchHandler/CloudReplicaSource doesn't toggle its scope based on looking at this, we shouldn't either. Consistency.

}
try (SolrClientHolder solrClient = getHttpSolrClient(req)) {
ModifiableSolrParams params = createDistributedRequestParams(req, rsp, keyId);
for (Slice slice : docCollection.getActiveSlices()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm; sends in series vs in parallel. We could chat about that tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I modified to use the ExecutorService to send the request in parallel.

}
if (log.isInfoEnabled()) {
long timeMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTimeNs);
log.info("Responding encryption distributed state={} success={} for keyId={} collection={} timeMs={}",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI collection is redundant with MDC.
And you might want to put state and/or success into rsp.addToLog where it may be seen in structured logs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How MDC knows the collection if it is not passed to the inner LBSolrClient.request by the CloudSolrClient?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solr sets MDC at key spots; do a find-usages of org.apache.solr.logging.MDCLoggingContext. SolrCore.open() is a subtle one. CloudSolrClient/LBSolrClient isn't pertinent at all.

private NamedList<Object> sendEncryptionRequest(Replica replica, ModifiableSolrParams params, Http2SolrClient httpSolrClient)
throws SolrServerException, IOException {
GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.POST, getPath(), params);
request.setBasePath(replica.getCoreUrl());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setBasePath is now gone in Solr main. Instead, you should use httpSolrClient.requestWithBaseUrl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not available in Solr 9.6, the Solr version used here. I propose to let this code for now, and replace it when I upgrade to 9.8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, Solr 9.8.0 was released during this long lasting PR, so I updated.

Copy link

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll find use of the Solr TimeOut class to be helpful here.

});
return encryptionStatus;
}

private EncryptionStatus encryptDistrib(SolrParams params) throws SolrServerException, IOException {
params = SolrParams.wrapDefaults(params, new ModifiableSolrParams().set(DISTRIB, "true"));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :-)
BTW set's 2nd arg value needn't be a string; it's overloaded to take the primitive boolean.

String collectionName = null;
State collectionState = null;
long timeAllowedMs = req.getParams().getLong(TIME_ALLOWED, 0);
long maxTimeNs = timeAllowedMs <= 0 ? Long.MAX_VALUE : startTimeNs + timeAllowedMs;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are adding milliseconds and nanoseconds. Maybe convert timeAllowsMs on fetch so that you don't have any millisecond vars

try {
state = response.get();
} catch (ExecutionException e) {
collectionState = State.ERROR;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should log.error(e)

} catch (InterruptedException e) {
collectionState = State.INTERRUPTED;
}
success = collectionState == null || collectionState.isSuccess();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is null a success?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null only happens if there is no distribution because there are no slices. I don't know if it can happen actually, but the request to encrypt all shards is complete.

Comment on lines 432 to 434
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(PARAM_KEY_ID, keyId == null ? NO_KEY_ID : keyId);
return params;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this can be exactly one line since set returns itself

return params;
}

boolean isTimeout(long maxTimeNs) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method named confused me a little. If you name it "hasTimedOut", it'd be a bit clearer. Also "maxTimeNs" reads looks like it's probably a duration, but it's an instant in time. "timeoutAtNs" would be clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following your advice, I replaced the code by using TimeOut.
Doing so I discovered a long overflow bug in TimeOut if we try to create a TimeOut with interval = Long.MAX_VALUE, or any large interval which overflows when added to TimeSource.getTimeNs(). I'll create an issue.

}

protected String getPath() {
return "/admin/encrypt";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of hard-coding this, should probably use the path of the current request (as we are calling ourselves), which is request.getPath()

Comment on lines 424 to 428
CloudSolrClient cloudSolrClient = coreContainer.getSolrClientCache().getCloudSolrClient(coreContainer.getZkController().getZkClient().getZkServerAddress());
if (cloudSolrClient instanceof CloudHttp2SolrClient) {
return new SolrClientHolder(((CloudHttp2SolrClient) cloudSolrClient).getHttpClient(), false);
}
return new SolrClientHolder(new Http2SolrClient.Builder().build(), true);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed, nor is SolrClientHolder. Just use an existing client.
You can use coreContainer.getUpdateShardHandler().getUpdateOnlyHttpClient().
It's mildly debatable if it's the perfect client since an encryption request is sorta kinda an "update" but it's really not a big deal.
You shouldn't create new clients (which was your fallback here) as it's tricky to do it correctly inside Solr as there are some instrumentation & authentication matters to attend to that UpdateShardHandler works out for you.

@bruno-roustant
Copy link
Contributor Author

Thank you for your thorough review David. I have updated to integrate your remarks.

protected String getPath() {
return "/admin/encrypt";
GenericSolrRequest distributedRequest = new GenericSolrRequest(SolrRequest.METHOD.POST, requestPath, params);
return httpSolrClient.requestWithBaseUrl(replica.getCoreUrl(), replica.getCollection(), distributedRequest);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use getCoreUrl, it's complete (solr (aka "base url") + core). Passing collection here would be a problem; concatenating that after the core url which doesn't make sense. A test had better fail here.

CommitUpdateCommand commitCmd = new CommitUpdateCommand(req, true);
// Trigger EncryptionMergePolicy.findForcedMerges() to re-encrypt
// each segment which is not encrypted with the latest active key id.
commitCmd.maxOptimizeSegments = Integer.MAX_VALUE;
req.getCore().getUpdateHandler().commit(commitCmd);
log.info("Successfully triggered index encryption with commit in {}", elapsedTime(startTimeNs));
log.info("Successfully triggered index encryption with commit in {} ms", elapsedTimeMs(startTimeNs));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks confusing / error-prone. Seeing elapsedTimeMs take a parameter with "Ns" suffix smells of a bug. I wonder if we should embrace Instant where we can and thus avoid time units in many places?

Comment on lines +365 to +366
// Use the update-only http client, considering encryption is an update as we indeed create new Lucene segments.
Http2SolrClient solrClient = req.getCoreContainer().getUpdateShardHandler().getUpdateOnlyHttpClient();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you could move this to inside sendEncryptionRequestWithRetry, thus one fewer param as well. It doesn't matter but sendEncryptionRequestWithRetry has lots of params so maybe this move would be nicer. Up to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EncryptionRequestHandler support for collection with multiple shards
2 participants