Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: return ENOSYS on unfinished requests #6049

Merged
merged 7 commits into from
Aug 1, 2024

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Jun 19, 2024

Problem: When the KVS module is unloaded, any unfinished requests (get, put, wait-version, etc.) should get an ENOSYS response. That is presently not handled.

Solution: Track all requests sent to the kvs module that will not be returned immediately. Upon module exit, respond to all unfinished requests with ENOSYS.

Fixes #5979


note, I noted in #5979 that some other modules might need this fix as well. Could tack onto this PR as well, but thought we'd deal with just this one first since it specifically has been hit.

@chu11 chu11 force-pushed the issue5979_kvs_enosys_unload branch from 385be70 to fd89e0f Compare June 20, 2024 04:23
Comment on lines 12 to 13
-I$(top_builddir)/src/common/libflux \
$(LIBUUID_CFLAGS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On moving msg_hash out of librouter - librouter is already included in libflux-internal.la so is it necessary to move it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess it's not super necessary. I moved it out just b/c it is generically useful and it feels weird to include a convenience library header from librouter/msg_hash.h instead of libutil/msg_hash.h.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prob not worth it IMHO. It's just one convenience library vs another. 🤷

@chu11 chu11 force-pushed the issue5979_kvs_enosys_unload branch from fd89e0f to f020537 Compare July 12, 2024 22:57
@chu11
Copy link
Member Author

chu11 commented Jul 12, 2024

rebased & re-pushed, removing the move of msg_hash from librouter to libfluxutil.

Comment on lines 3073 to 3097
if (zhashx_size (ctx->requests) > 0) {
/* anything that has not yet completed gets an ENOSYS */
const flux_msg_t *msg = zhashx_first (ctx->requests);
while (msg) {
if (flux_respond_error (ctx->h, msg, ENOSYS, NULL) < 0)
flux_log_error (ctx->h, "%s: flux_respond_error", __FUNCTION__);
msg = zhashx_next (ctx->requests);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: log something at LOG_ERR for each request like:failing pending <topic> request.

Also, maybe in all the KVS tests, ensure that this message is not logged just to be sure we don't have a request_tracking_add() without a corresponding request_tracking_remove() somewhere, since the result of that could be quite bad - like terminating an unrelated RPC with ENOSYS and that will be hard to track down.

Could also add the size of this list to the flux module stats output since that may be useful in debugging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, caught a corner case and possibly another bug unrelated to this PR

@chu11
Copy link
Member Author

chu11 commented Jul 17, 2024

re-pushed, fixing up things per comments above and adding some extra tests. The only real "gotcha" on the tests is that some "pending_requests" exist in one test at the end, so those are manually counted. This is related to the recently opened #6112

@chu11 chu11 force-pushed the issue5979_kvs_enosys_unload branch from d3e7c64 to 085213b Compare July 17, 2024 22:14
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Thanks!

chu11 added 7 commits August 1, 2024 15:55
Problem: There is inconsistent use of spaces vs tabs in
t1001-kvs-internals.t.

Use tabs everywhere.
Problem: In the near future the kvs ctx will be needed by callers
to lookup_common().

Have callers to lookup_common() pass the kvs ctx as a parameter
instead of the void * arg.
Problem: When the KVS module is unloaded, any unfinished requests
(get, put, wait-version, etc.) should get an ENOSYS response.  That
is presently not handled.

Solution: Track all requests sent to the kvs module that will not be
returned immediately.  Upon module exit, respond to all unfinished
requests with ENOSYS.

Fixes flux-framework#5979
Problem: There are no tests to ensure unfinished requests to the
kvs module are returned ENOSYS when the module is unloaded.

Add coverage in t1001-kvs-internals.t.
Problem: There is no way to know how many pending requests are currently
being handled by the KVS.

Add this statistic to the module stats.
Problem: There are no tests to ensure the new kvs
"pending_requests" stat is working.

Add some tests to t1001-kvs-internals.t.
Problem: Now that the KVS tracks pending requests, it would be easy
to introduce an error in the future where pending requests are not
tracked correctly.

Ensure that pending requests are 0 (or the expected number) at the
end of each kvs test file.
@chu11 chu11 force-pushed the issue5979_kvs_enosys_unload branch from 085213b to 176eea4 Compare August 1, 2024 22:56
@mergify mergify bot merged commit 3d8207b into flux-framework:master Aug 1, 2024
31 checks passed
@chu11 chu11 deleted the issue5979_kvs_enosys_unload branch August 6, 2024 20:23
Copy link

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.36%. Comparing base (de5d907) to head (176eea4).
Report is 531 commits behind head on master.

Files with missing lines Patch % Lines
src/modules/kvs/kvs.c 93.75% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6049      +/-   ##
==========================================
- Coverage   83.37%   83.36%   -0.01%     
==========================================
  Files         521      521              
  Lines       84681    84719      +38     
==========================================
+ Hits        70601    70626      +25     
- Misses      14080    14093      +13     
Files with missing lines Coverage Δ
src/modules/kvs/kvs.c 71.87% <93.75%> (+0.51%) ⬆️

... and 10 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvs: uncompleted RPCs should receive ENOSYS response when module is unloaded
2 participants