-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EncryptionRequestHandler supports encryption requests distribution. #115
base: main
Are you sure you want to change the base?
Conversation
EncryptionDirectoryFactory.getFactory(req.getCore()); | ||
|
||
if (req.getParams().getBool(DISTRIB, false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we could default "distrib" based on whether the request was addressed to the collection or not. Maybe that metadata is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this metadata. The collection name is present in the CloudSolrClient.request call, but not passed to the inner LBSolrClient.request call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SolrQueryRequest.getHttpSolrCall().getCoreOrColName()
-- but never mind my thought on this. Since SearchHandler/CloudReplicaSource doesn't toggle its scope based on looking at this, we shouldn't either. Consistency.
} | ||
try (SolrClientHolder solrClient = getHttpSolrClient(req)) { | ||
ModifiableSolrParams params = createDistributedRequestParams(req, rsp, keyId); | ||
for (Slice slice : docCollection.getActiveSlices()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm; sends in series vs in parallel. We could chat about that tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I modified to use the ExecutorService to send the request in parallel.
} | ||
if (log.isInfoEnabled()) { | ||
long timeMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTimeNs); | ||
log.info("Responding encryption distributed state={} success={} for keyId={} collection={} timeMs={}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI collection is redundant with MDC.
And you might want to put state and/or success into rsp.addToLog
where it may be seen in structured logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How MDC knows the collection if it is not passed to the inner LBSolrClient.request by the CloudSolrClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solr sets MDC at key spots; do a find-usages of org.apache.solr.logging.MDCLoggingContext
. SolrCore.open()
is a subtle one. CloudSolrClient/LBSolrClient isn't pertinent at all.
encryption/src/main/java/org/apache/solr/encryption/EncryptionRequestHandler.java
Show resolved
Hide resolved
encryption/src/main/java/org/apache/solr/encryption/EncryptionRequestHandler.java
Outdated
Show resolved
Hide resolved
private NamedList<Object> sendEncryptionRequest(Replica replica, ModifiableSolrParams params, Http2SolrClient httpSolrClient) | ||
throws SolrServerException, IOException { | ||
GenericSolrRequest request = new GenericSolrRequest(SolrRequest.METHOD.POST, getPath(), params); | ||
request.setBasePath(replica.getCoreUrl()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setBasePath is now gone in Solr main. Instead, you should use httpSolrClient.requestWithBaseUrl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not available in Solr 9.6, the Solr version used here. I propose to let this code for now, and replace it when I upgrade to 9.8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, Solr 9.8.0 was released during this long lasting PR, so I updated.
encryption/src/test/java/org/apache/solr/encryption/EncryptionTestUtil.java
Outdated
Show resolved
Hide resolved
encryption/src/test/java/org/apache/solr/encryption/EncryptionTestUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll find use of the Solr TimeOut class to be helpful here.
}); | ||
return encryptionStatus; | ||
} | ||
|
||
private EncryptionStatus encryptDistrib(SolrParams params) throws SolrServerException, IOException { | ||
params = SolrParams.wrapDefaults(params, new ModifiableSolrParams().set(DISTRIB, "true")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :-)
BTW set's 2nd arg value needn't be a string; it's overloaded to take the primitive boolean.
String collectionName = null; | ||
State collectionState = null; | ||
long timeAllowedMs = req.getParams().getLong(TIME_ALLOWED, 0); | ||
long maxTimeNs = timeAllowedMs <= 0 ? Long.MAX_VALUE : startTimeNs + timeAllowedMs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are adding milliseconds and nanoseconds. Maybe convert timeAllowsMs on fetch so that you don't have any millisecond vars
try { | ||
state = response.get(); | ||
} catch (ExecutionException e) { | ||
collectionState = State.ERROR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should log.error(e)
} catch (InterruptedException e) { | ||
collectionState = State.INTERRUPTED; | ||
} | ||
success = collectionState == null || collectionState.isSuccess(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is null a success?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null only happens if there is no distribution because there are no slices. I don't know if it can happen actually, but the request to encrypt all shards is complete.
ModifiableSolrParams params = new ModifiableSolrParams(); | ||
params.set(PARAM_KEY_ID, keyId == null ? NO_KEY_ID : keyId); | ||
return params; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this can be exactly one line since set
returns itself
return params; | ||
} | ||
|
||
boolean isTimeout(long maxTimeNs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method named confused me a little. If you name it "hasTimedOut", it'd be a bit clearer. Also "maxTimeNs" reads looks like it's probably a duration, but it's an instant in time. "timeoutAtNs" would be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following your advice, I replaced the code by using TimeOut.
Doing so I discovered a long overflow bug in TimeOut if we try to create a TimeOut with interval = Long.MAX_VALUE, or any large interval which overflows when added to TimeSource.getTimeNs(). I'll create an issue.
} | ||
|
||
protected String getPath() { | ||
return "/admin/encrypt"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of hard-coding this, should probably use the path of the current request (as we are calling ourselves), which is request.getPath()
CloudSolrClient cloudSolrClient = coreContainer.getSolrClientCache().getCloudSolrClient(coreContainer.getZkController().getZkClient().getZkServerAddress()); | ||
if (cloudSolrClient instanceof CloudHttp2SolrClient) { | ||
return new SolrClientHolder(((CloudHttp2SolrClient) cloudSolrClient).getHttpClient(), false); | ||
} | ||
return new SolrClientHolder(new Http2SolrClient.Builder().build(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed, nor is SolrClientHolder. Just use an existing client.
You can use coreContainer.getUpdateShardHandler().getUpdateOnlyHttpClient()
.
It's mildly debatable if it's the perfect client since an encryption request is sorta kinda an "update" but it's really not a big deal.
You shouldn't create new clients (which was your fallback here) as it's tricky to do it correctly inside Solr as there are some instrumentation & authentication matters to attend to that UpdateShardHandler works out for you.
…8. Fix EncryptionDirectoryTest.
Thank you for your thorough review David. I have updated to integrate your remarks. |
protected String getPath() { | ||
return "/admin/encrypt"; | ||
GenericSolrRequest distributedRequest = new GenericSolrRequest(SolrRequest.METHOD.POST, requestPath, params); | ||
return httpSolrClient.requestWithBaseUrl(replica.getCoreUrl(), replica.getCollection(), distributedRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use getCoreUrl, it's complete (solr (aka "base url") + core). Passing collection here would be a problem; concatenating that after the core url which doesn't make sense. A test had better fail here.
CommitUpdateCommand commitCmd = new CommitUpdateCommand(req, true); | ||
// Trigger EncryptionMergePolicy.findForcedMerges() to re-encrypt | ||
// each segment which is not encrypted with the latest active key id. | ||
commitCmd.maxOptimizeSegments = Integer.MAX_VALUE; | ||
req.getCore().getUpdateHandler().commit(commitCmd); | ||
log.info("Successfully triggered index encryption with commit in {}", elapsedTime(startTimeNs)); | ||
log.info("Successfully triggered index encryption with commit in {} ms", elapsedTimeMs(startTimeNs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks confusing / error-prone. Seeing elapsedTimeMs take a parameter with "Ns" suffix smells of a bug. I wonder if we should embrace Instant where we can and thus avoid time units in many places?
// Use the update-only http client, considering encryption is an update as we indeed create new Lucene segments. | ||
Http2SolrClient solrClient = req.getCoreContainer().getUpdateShardHandler().getUpdateOnlyHttpClient(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems you could move this to inside sendEncryptionRequestWithRetry, thus one fewer param as well. It doesn't matter but sendEncryptionRequestWithRetry has lots of params so maybe this move would be nicer. Up to you.
The client can call EncryptionRequestHandler and provide the additional parameter "distrib" = "true". In this case, this handler will distribute the encryption request to all the leader replicas of all the shards of the collection, ensuring they all encrypt their index shard. The response "encryptionState" will be "complete" only when all the shards return "complete". So the caller may repeatedly send the same encryption request with "distrib" = "true" to know when the whole collection encryption is complete. In addition, the caller can set the "timeAllowed" parameter to define a timeout for the request distribution, in milliseconds. This timeout is for the distribution itself, not the encryption process.