kvs-watch: only fetch new data for appends #6444

chu11 · 2024-11-15T20:29:22Z

Problem: The FLUX_KVS_WATCH_APPEND flag is implemented inefficiently in kvs-watch. Everytime new data is appended, the entire contents of the watched key are retrieved and only new data calculated via an offset is sent to the watcher. This significantly slows down performance of large appended data (such as stdio).

Solution: Instead of retrieving the entire contents of the watched key, fetch the tree object for the key from the KVS. With access to the tree object's blobref array, we need only access the new appended blobrefs from the content store. This significantly cuts down on the amount of data transfered during a kvs-watch.

Fixes #6414

side notes:

so protocol wise
- before it was
  - first RPC get all KVS data, return it all, note how long the data is
  - later RPCs, get all KVS data, only send new data past offset of prior request
- now it is
  - RPCs now get KVS treeobj
    - keep track of how many references are in the array
    - get each blobref data from the content store
    - return that data to the user

So RPC wise the pros are:

no longer iterating / building larger RPCs than is necessary

the cons are

adding a level of RPC indirection (getting treeobj)
possibly sending more RPCs back to caller (first RPC does not "combine" data into a single return)

For large data, this is a significant performance improvement for use cases like this:

time flux run flux lptest 2000000 80

wallclock time is generally being cut down from 58 seconds to 24 seconds. It's a big win. (FWIW, this was on corona. On my laptop it was 15s before 11.7s afterwards, not as much as a win there, but still that's 22%)

For everything else (flux job attach JOBID for a finished job, job throughput), it appears that performance is a wash. That the wins balance out the losses. This is of course with limited testing in single node instances.

Edit: Also as an aside, there is more potential for support of #6292 with this implementation

garlick

This is really a nice improvement IMHO!

Before I go further into a review, could you fix the following

Don't break short function parameter lists to multiple lines
Don't log errors for user-induced problems. As a general rule, if sending an error response to the user, don't log it also
Make an effort to make log messages more helpful that "function name: strerror".

garlick · 2024-11-17T14:02:24Z

src/modules/kvs-watch/kvs-watch.c

+     * Ordering note: KVS lookups can be returned out of order due to
+     * what is presently cached.  KVS lookup futures are added to the


suggestion: instead of

due to what is presently cached

how about

because they are processed asynchronously

lemme tweak. Even though it's asynchronous, was trying to explain scenario KVS could return things out of order.

garlick · 2024-11-17T14:12:40Z

doc/man3/flux_kvs_lookup.rst

+   will only contain the additional appended data.  This flag will
+   only work with appends with the FLUX_KVS_APPEND.  Overwritten data
+   will result in an error.


Since the only way to append is with FLUX_KVS_APPEND, I think the first sentence added here could be dropped.

The second one might be slightly clearer if it read

If the value is overwritten, the lookup fails with EINVAL

(or whatever errno it fails wiith)

Aside: if we've detected that a watched key has been overwritten and we're watching it with this flag, a human readable error response like "value of key watched with WATCH_APPEND flag was overwritten" might be helpful.

garlick · 2024-11-17T14:17:25Z

src/modules/kvs-watch/kvs-watch.c

+            w->prev_end_index = treeobj_get_count (val) - 1;
+        }
+        else {
+            flux_log (h, LOG_ERR, "%s: treeobj not val or valref", __FUNCTION__);


Drop log message since error is being returned to the user and this is an expected situation as far as this service goes.

garlick · 2024-11-17T14:19:05Z

src/modules/kvs-watch/kvs-watch.c

+static void handle_load_response (flux_future_t *f,
+                                  struct watcher *w)
+{


Style: don't break short parameter lists to multiple lines

chu11 · 2024-11-18T22:50:53Z

re-pushed fixing up a lot of the minor things listed above. I went with the general rule of making more descriptive errors when an error would be half-likely, but kept "function name: strerror" log messages for things that should extremely unlikely.

garlick

Not going to get through this before meeting so here's what I have so far!

garlick · 2024-11-20T21:49:50Z

src/modules/kvs-watch/kvs-watch.c

@@ -21,6 +21,7 @@
 #include "src/common/libkvs/treeobj.h"


commit message: s/everytime/every time/

garlick · 2024-11-20T21:53:14Z

src/modules/kvs-watch/kvs-watch.c

+        if (flux_respond_pack (h, w->request, "{ s:o }", "val", val) < 0) {
+            flux_log_error (h, "%s: flux_respond_pack", __FUNCTION__);
+            json_decref (val);
+            goto error;
+        }


After the goto error, another error response is attempted. I think we've settled previously that this is not a good idea. Just log the (unlikely) error.

In new code I've been trying to log messages like "failed to respond to TOPIC" in these cases.

garlick · 2024-11-20T21:54:53Z

src/modules/kvs-watch/kvs-watch.c

+    saved_errno = errno;
+    flux_future_destroy (f);
+    errno = saved_errno;


flux_future_destroy already preserves errno

garlick · 2024-11-20T21:55:56Z

src/modules/kvs-watch/kvs-watch.c

+    if (!(f = content_load_byblobref (h, ref, 0))) {
+        flux_log_error (h, "%s: content_load_byblobref", __FUNCTION__);
+        goto error;
+    }
+    if (flux_future_then (f, -1., load_continuation, w) < 0) {
+        flux_log_error (h, "%s: flux_future_then", __FUNCTION__);
+        goto error;
+    }


Suggestion: combine error logs and make it more readable, e.g.

if (!(f = load()) || then() < 0) { flux_log_error ("failed to send TOPIC request") }

chu11 · 2024-11-21T23:24:31Z

re-pushed with fixes per comments above. Also added a few new cleanups. I could probably split those out at this point in time.

garlick

This seems good! I made a bunch of comments about error messages and some about readability, but I'll tentatively approve and leave up to you how you want to proceed. It's probably best to get this merged early in our release cycle so we have plenty of time to notice any issues.

Great work!

garlick · 2024-12-04T15:29:21Z

src/modules/kvs-watch/kvs-watch.c

+    if (content_load_get (f, &data, &size) < 0) {
+        flux_log_error (h, "failed to load content data");
+        goto error_respond;
+    }
+
+    if (!w->mute) {
+        json_t *val = treeobj_create_val (data, size);
+        if (!val) {
+            flux_log_error (h, "%s: treeobj_create_val", __FUNCTION__);
+            goto error_respond;
+        }
+        if (flux_respond_pack (h, w->request, "{ s:o }", "val", val) < 0) {
+            flux_log_error (h,
+                            "%s: failed to respond to kvs-watch.lookup",
+                            __FUNCTION__);
+            json_decref (val);
+            goto finished;
+        }
+        w->responded = true;
+    }
+
+    return;
+error_respond:
+    if (!w->mute) {
+        if (flux_respond_error (h, w->request, errno, NULL) < 0)
+            flux_log_error (h, "%s: flux_respond_error", __FUNCTION__);
+    }


In the two cases here where an error is being returned to the user, instead of logging detailed errors, maybe it would be better to return the detailed errors to the user?

garlick · 2024-12-04T15:30:57Z

src/modules/kvs-watch/kvs-watch.c

+static flux_future_t *load_ref (flux_t *h,
+                                struct watcher *w,
+                                const char *ref)
+{


Style: don't break short parameter list

garlick · 2024-12-04T15:33:45Z

src/modules/kvs-watch/kvs-watch.c

+
+    if (!(f = content_load_byblobref (h, ref, 0))
+        || flux_future_then (f, -1., load_continuation, w) < 0) {
+        flux_log_error (h, "failed to send content.load request");


An error is being returned to the user so no need to log this.

garlick · 2024-12-04T15:34:27Z

src/modules/kvs-watch/kvs-watch.c

+        flux_future_t *f;
+        const char *ref = treeobj_get_blobref (val, i);
+        if (!ref) {
+            flux_log_error (h, "%s: treeobj_get_blobref", __FUNCTION__);


Error is being returned to the user, don't log.

garlick · 2024-12-04T15:37:21Z

src/modules/kvs-watch/kvs-watch.c

+        if (load_range (h,
+                        w,
+                        w->prev_start_index,
+                        w->prev_end_index,
+                        val) < 0)
+            goto error_respond;


Set errmsg to something here in place of the logging above. Maybe "error sending request for content blobs [start:finish]". (It might be helpful to use flux_error_t instead of the const char **errmsg so that errprintf can be used)

garlick · 2024-12-04T15:55:40Z

src/modules/kvs-watch/kvs-watch.c

-        if (new_offset < w->append_offset) {
-            free (new_data);
+        else {
+            (*errmsg) = "value of key watched with WATCH_APPEND overwritten";


Hmmm, I think overwritten is correct here. Something that should be a valref is now not a valref, indicating the key was overwritten with something new.

garlick · 2024-12-04T15:56:13Z

src/modules/kvs-watch/kvs-watch.c

-        if (flux_rpc_get_unpack (f, "{ s:o s:i }",
+        if (flux_rpc_get_unpack (f, "{ s:o s:s s:i }",
                                 "val", &val,
+                                 "rootref", &root_ref,
                                 "rootseq", &root_seq) < 0) {
            /* It is worth mentioning ENOTSUP error conditions here.


fix formatting since you touched it :-)

garlick · 2024-12-04T15:57:29Z

src/modules/kvs-watch/kvs-watch.c

    int root_seq;
    json_t *val;
+    const char *errmsg = NULL;


Maybe it would be best to make this a flux_error_t error and just make sure it is filled in for all possible errors that are going to be sent to the user?

ahh, there's a rebase error. Due to the changes from 0047178, the errmsg actually does nothing. Lemme update in general.

garlick · 2024-12-04T15:58:01Z

src/modules/kvs-watch/kvs-watch.c

        if (flux_msg_pack (msg, "{s:s s:s s:i}",
                           "key", w->key,
                           "namespace", ns,
-                           "flags", w->flags) < 0)
+                           "flags", flags) < 0)


fix formatting

garlick · 2024-12-04T15:59:21Z

src/modules/kvs-watch/kvs-watch.c

+            /* N.B. It may not be obvious why we have to check w->index_set if we
+             * have not yet responded.  It is possible we have received a setroot
+             * response and an updated valref before loads from the content store
+             * have returned to the caller.
+             */


These comments exceed 80 cols.

chu11 · 2024-12-04T22:04:00Z

re-pushed with fixes based on comments above

there are two new commits, one to deal with parameters on different lines and one to add a treeobj_type_name() convenience function. So probably warrants another look @garlick

Problem: small amounts of eventlog data may not be flushed to the terminal when they are received. This can introduce complication in tests. Solution: call fflush() after events are processed with eventlog_entry_dumpf().

garlick · 2024-12-05T18:40:02Z

src/common/libkvs/treeobj.c

+        return "dir";
+    else if (treeobj_is_dirref (obj))
+        return "dirref";
+    return NULL;


Suggestion: return "unknown" instead of null if the main purpose is logging.

Problem: It would be convenient to know the treeobj type name for debugging. No treeobj type name conversion function exists. Add a new treeobj_type_name() function to libkvs treeobj API.

Problem: kvs-watch dos not conform to modern project coding style per RFC7. Break long parameter lists into one per line.

Problem: The FLUX_KVS_WATCH_APPEND flag is implemented inefficiently in kvs-watch. Every time new data is appended, the entire contents of the watched key are retrieved and only new data calculated via an offset is sent to the watcher. This significantly slows down performance of large appended data (such as stdio). Solution: Instead of retrieving the entire contents of the watched key, fetch the tree object for the key from the KVS. With access to the tree object's blobref array, we need only access the new appended blobrefs from the content store. This significantly cuts down on the amount of data transfered during a kvs-watch. Fixes flux-framework#6414

Problem: The FLUX_KVS_WATCH_APPEND description is a little out of date given recent updates. Tweak the description given recent updates.

Problem: With the recent behavior change to the kvs watch append several tests no longer work. Update kvs watch append tests in t1007-kvs-lookup-watch.t. - the new kvs watch append sends each append data separately, leading to a slight change in output for one test.A - the error output in one test changed - all tests that previously "fake append" by overwriting a value all fail now in the same way. Remove all tests except for one.

Problem: Additional coverage of the new kvs watch append behavior is needed. Add new tests in t1007-kvs-lookup-watch.t.

chu11 · 2024-12-05T19:04:44Z

@garlick thanks, fixed that minor thing, will set MWP

codecov · 2024-12-05T19:31:09Z

Codecov Report

Attention: Patch coverage is 77.24551% with 38 lines in your changes missing coverage. Please review.

Project coverage is 83.60%. Comparing base (72ae1f0) to head (acfad8d).
Report is 8 commits behind head on master.

Files with missing lines	Patch %	Lines
src/modules/kvs-watch/kvs-watch.c	75.94%	38 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6444      +/-   ##
==========================================
- Coverage   83.61%   83.60%   -0.01%     
==========================================
  Files         524      524              
  Lines       87493    87611     +118     
==========================================
+ Hits        73154    73248      +94     
- Misses      14339    14363      +24

Files with missing lines	Coverage Δ
src/cmd/job/eventlog.c	`89.41% <100.00%> (+0.11%)`	⬆️
src/common/libkvs/treeobj.c	`84.40% <100.00%> (+0.59%)`	⬆️
src/modules/kvs-watch/kvs-watch.c	`78.27% <75.94%> (+0.12%)`	⬆️

... and 8 files with indirect coverage changes

garlick reviewed Nov 17, 2024

View reviewed changes

chu11 force-pushed the issue6414_kvs_watch_optimization branch from f214e6c to c07a22d Compare November 18, 2024 22:47

garlick reviewed Nov 20, 2024

View reviewed changes

chu11 mentioned this pull request Nov 21, 2024

kvs-watch: retrieve content blobs in chunks rather than all at once #6456

Closed

chu11 force-pushed the issue6414_kvs_watch_optimization branch from c07a22d to b23189b Compare November 21, 2024 23:24

chu11 force-pushed the issue6414_kvs_watch_optimization branch from b23189b to 4029afa Compare November 21, 2024 23:25

chu11 mentioned this pull request Nov 21, 2024

kvs-watch: misc cleanup #6458

Merged

chu11 force-pushed the issue6414_kvs_watch_optimization branch 2 times, most recently from 2f8aec6 to 91c30de Compare November 22, 2024 20:19

chu11 force-pushed the issue6414_kvs_watch_optimization branch from 91c30de to 76ee0c0 Compare December 2, 2024 20:19

garlick approved these changes Dec 4, 2024

View reviewed changes

chu11 force-pushed the issue6414_kvs_watch_optimization branch 2 times, most recently from 9b34723 to ca56ea7 Compare December 4, 2024 22:03

flux-job: fflush eventlog entries

704fc9e

Problem: small amounts of eventlog data may not be flushed to the terminal when they are received. This can introduce complication in tests. Solution: call fflush() after events are processed with eventlog_entry_dumpf().

chu11 force-pushed the issue6414_kvs_watch_optimization branch from ca56ea7 to 8a21e50 Compare December 5, 2024 18:25

garlick approved these changes Dec 5, 2024

View reviewed changes

chu11 added 6 commits December 5, 2024 11:04

libkvs: support treeobj_type_name function

4519ea9

Problem: It would be convenient to know the treeobj type name for debugging. No treeobj type name conversion function exists. Add a new treeobj_type_name() function to libkvs treeobj API.

kvs-watch: break long parameter lists

111e397

Problem: kvs-watch dos not conform to modern project coding style per RFC7. Break long parameter lists into one per line.

doc: update FLUX_KVS_WATCH_APPEND description

7f037f7

Problem: The FLUX_KVS_WATCH_APPEND description is a little out of date given recent updates. Tweak the description given recent updates.

t: add new kvs watch append tests

acfad8d

Problem: Additional coverage of the new kvs watch append behavior is needed. Add new tests in t1007-kvs-lookup-watch.t.

chu11 force-pushed the issue6414_kvs_watch_optimization branch from 8a21e50 to acfad8d Compare December 5, 2024 19:04

chu11 added the merge-when-passing label Dec 5, 2024

mergify bot merged commit 0cfb291 into flux-framework:master Dec 5, 2024
33 of 34 checks passed

chu11 deleted the issue6414_kvs_watch_optimization branch December 5, 2024 20:02

chu11 mentioned this pull request Dec 16, 2024

job-info: stream appended data, do not return all data in 1 giant blob #6516

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvs-watch: only fetch new data for appends #6444

kvs-watch: only fetch new data for appends #6444

chu11 commented Nov 15, 2024 •

edited

Loading

garlick left a comment •

edited

Loading

garlick Nov 17, 2024

chu11 Nov 18, 2024

garlick Nov 17, 2024

garlick Nov 17, 2024

garlick Nov 17, 2024

garlick Nov 17, 2024

chu11 commented Nov 18, 2024

garlick left a comment

garlick Nov 20, 2024

garlick Nov 20, 2024

garlick Nov 20, 2024

garlick Nov 20, 2024

chu11 commented Nov 21, 2024

garlick left a comment

garlick Dec 4, 2024

garlick Dec 4, 2024

garlick Dec 4, 2024

garlick Dec 4, 2024

garlick Dec 4, 2024

garlick Dec 4, 2024

chu11 Dec 4, 2024

garlick Dec 4, 2024

garlick Dec 4, 2024

chu11 Dec 4, 2024

garlick Dec 4, 2024

garlick Dec 4, 2024

chu11 commented Dec 4, 2024

garlick Dec 5, 2024

chu11 commented Dec 5, 2024

codecov bot commented Dec 5, 2024

		* Ordering note: KVS lookups can be returned out of order due to
		* what is presently cached. KVS lookup futures are added to the

kvs-watch: only fetch new data for appends #6444

kvs-watch: only fetch new data for appends #6444

Conversation

chu11 commented Nov 15, 2024 • edited Loading

garlick left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chu11 commented Nov 18, 2024

garlick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chu11 commented Nov 21, 2024

garlick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chu11 commented Dec 4, 2024

Choose a reason for hiding this comment

chu11 commented Dec 5, 2024

codecov bot commented Dec 5, 2024

Codecov Report

chu11 commented Nov 15, 2024 •

edited

Loading

garlick left a comment •

edited

Loading