-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvs: return ENOSYS on unfinished requests #6049
kvs: return ENOSYS on unfinished requests #6049
Conversation
385be70
to
fd89e0f
Compare
src/common/libfluxutil/Makefile.am
Outdated
-I$(top_builddir)/src/common/libflux \ | ||
$(LIBUUID_CFLAGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On moving msg_hash out of librouter - librouter is already included in libflux-internal.la so is it necessary to move it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess it's not super necessary. I moved it out just b/c it is generically useful and it feels weird to include a convenience library header from librouter/msg_hash.h
instead of libutil/msg_hash.h
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prob not worth it IMHO. It's just one convenience library vs another. 🤷
fd89e0f
to
f020537
Compare
rebased & re-pushed, removing the move of msg_hash from librouter to libfluxutil. |
src/modules/kvs/kvs.c
Outdated
if (zhashx_size (ctx->requests) > 0) { | ||
/* anything that has not yet completed gets an ENOSYS */ | ||
const flux_msg_t *msg = zhashx_first (ctx->requests); | ||
while (msg) { | ||
if (flux_respond_error (ctx->h, msg, ENOSYS, NULL) < 0) | ||
flux_log_error (ctx->h, "%s: flux_respond_error", __FUNCTION__); | ||
msg = zhashx_next (ctx->requests); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: log something at LOG_ERR for each request like:failing pending <topic> request
.
Also, maybe in all the KVS tests, ensure that this message is not logged just to be sure we don't have a request_tracking_add()
without a corresponding request_tracking_remove()
somewhere, since the result of that could be quite bad - like terminating an unrelated RPC with ENOSYS and that will be hard to track down.
Could also add the size of this list to the flux module stats
output since that may be useful in debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, caught a corner case and possibly another bug unrelated to this PR
f020537
to
d3e7c64
Compare
re-pushed, fixing up things per comments above and adding some extra tests. The only real "gotcha" on the tests is that some "pending_requests" exist in one test at the end, so those are manually counted. This is related to the recently opened #6112 |
d3e7c64
to
085213b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM! Thanks!
Problem: There is inconsistent use of spaces vs tabs in t1001-kvs-internals.t. Use tabs everywhere.
Problem: In the near future the kvs ctx will be needed by callers to lookup_common(). Have callers to lookup_common() pass the kvs ctx as a parameter instead of the void * arg.
Problem: When the KVS module is unloaded, any unfinished requests (get, put, wait-version, etc.) should get an ENOSYS response. That is presently not handled. Solution: Track all requests sent to the kvs module that will not be returned immediately. Upon module exit, respond to all unfinished requests with ENOSYS. Fixes flux-framework#5979
Problem: There are no tests to ensure unfinished requests to the kvs module are returned ENOSYS when the module is unloaded. Add coverage in t1001-kvs-internals.t.
Problem: There is no way to know how many pending requests are currently being handled by the KVS. Add this statistic to the module stats.
Problem: There are no tests to ensure the new kvs "pending_requests" stat is working. Add some tests to t1001-kvs-internals.t.
Problem: Now that the KVS tracks pending requests, it would be easy to introduce an error in the future where pending requests are not tracked correctly. Ensure that pending requests are 0 (or the expected number) at the end of each kvs test file.
085213b
to
176eea4
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6049 +/- ##
==========================================
- Coverage 83.37% 83.36% -0.01%
==========================================
Files 521 521
Lines 84681 84719 +38
==========================================
+ Hits 70601 70626 +25
- Misses 14080 14093 +13
|
Problem: When the KVS module is unloaded, any unfinished requests (get, put, wait-version, etc.) should get an ENOSYS response. That is presently not handled.
Solution: Track all requests sent to the kvs module that will not be returned immediately. Upon module exit, respond to all unfinished requests with ENOSYS.
Fixes #5979
note, I noted in #5979 that some other modules might need this fix as well. Could tack onto this PR as well, but thought we'd deal with just this one first since it specifically has been hit.