-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
truncate resource journal at configurable size #6633
Conversation
@@ -36,7 +36,7 @@ struct reslog { | |||
struct resource_ctx *ctx; | |||
zlist_t *pending; // list of pending futures | |||
zlist_t *watchers; | |||
json_t *eventlog; | |||
zlistx_t *eventlog; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit message typo "Swtich" and "reosurce"
@@ -217,8 +217,7 @@ int reslog_post_pack (struct reslog *reslog, | |||
va_end (ap); | |||
if (!event) | |||
return -1; | |||
if (json_array_append (reslog->eventlog, event) < 0) { | |||
json_decref (event); | |||
if (!zlistx_add_end (reslog->eventlog, json_incref (event))) { | |||
errno = ENOMEM; | |||
return -1; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if zlistx_add_end()
fails, I think you gotta call the json_decref()
b/c you json_incref()
ed above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be slightly simpler to just define an entry_duplicator()
.
src/modules/resource/reslog.c
Outdated
if (!zlistx_add_end (reslog->eventlog, json_incref (entry))) | ||
goto nomem; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotta decref entry if zlistx_add_end()
failed?
@@ -197,6 +197,43 @@ int reslog_sync (struct reslog *reslog) | |||
return 0; | |||
} | |||
|
|||
/* Truncate resource journal if needed to reslog->journal_max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit message "replace the most recent event" ... perhaps you mean "replace the oldest event"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant to be "replace the most recently dropped event" - not the oldest event in case more than one event is dropped at a time.
test_expect_success '1st event is a truncate event' ' | ||
head -1 eventlog.trunc > eventlog.trunc.1 && | ||
jq -e ".name == \"truncate\"" eventlog.trunc.1 | ||
' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: also check number of entries == limit, and likewise with 2 tests below this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I poked this a bit and LGTM.
bc88096
to
05a34be
Compare
I think I've addressed the comments above, rebased, and forced a push. There's also a slight change here that ensures the eventlog doesn't have |
Problem: The resource journal will need to truncated by dropping entries from the start, but a jansson json array is not conducive to this operation. Switch to a zlistx_t for the resource journal. This data structure allows the removal and addition of items to the front of the list (without moving the memory for the rest of the items on the list)
Problem: The in-memory resource journal grows without bound and there is no way to limit its size. In preparation for truncating the resource journal, add a new config key in the [resource] table, `journal_max`, which can be used to set a limit on the journal size. Default this limit to 100000 entries. Modify the reslog constructor to accept this journal_max limit, but don't do anything with it just yet.
Problem: The resource journal limit cannot be configured at runtime because configuration updates are ignored by the resource module. Capture any new resource.journal_max entry on config reload and send it to the reslog implementation.
Problem: The resource journal can grow without bound. Truncate the in-memory resource eventlog if configured and replace the most recently dropped event with an RFC 44 truncate event.
Problem: There's no testing of resource journal truncation in the testsuite. Add a set of very basic tests to t2355-resource-journal.t that ensure basic eventlog truncation works.
Problem: The resource.journal-max configuration key is not documented. Add it to flux-config-resource(5).
05a34be
to
6f79323
Compare
Uh, I guess I'll set MWP here now. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6633 +/- ##
=======================================
Coverage 83.87% 83.88%
=======================================
Files 533 533
Lines 88331 88372 +41
=======================================
+ Hits 74087 74127 +40
- Misses 14244 14245 +1
|
This is the simplified version of #6631. The in-memory resource eventlog is truncated at the configured limit, but the
truncate
event does not contain any context, it just marks the fact that events have been dropped.Fixes #6610