-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: truncate resource journal at a configurable size #6631
Conversation
Problem: It would be useful to create a struct drainset from a JSON serialized version of the structure. Add drainset_from_json() which can decode the result of drainset_to_json().
Problem: There are no unit tests for the drainset_from_json() function. Add them to the drainset unit test.
Problem: The drainset class cannot be used to capture state for drain events since it does not support the overwrite options nor undrain. Add drainset_drain_ex(), which takes an overwrite parameter and handles it as described in RFC 44. Add drainset_undrain(), which undrains a single rank.
Problem: The drainset unit tests do not exercise drainset_drain_ex() and drainset_undrain(). Update unit tests.
Problem: RFC 44 stipulates that a truncate event shall appear as the first event in the resource eventlog when it has been truncated, and that the context of this event shall contain a summary of truncated information, but the resource eventlog has no support for this event. Add a truncate class to the resource module which can accumulate required information from dropped events, providing the base functionality for a truncate event.
Problem: There are no unit tests for the resource module truncate class. Add a set of simple unit tests to test basic functionality.
Problem: The resource journal will need to truncated by dropping entries from the start, but a jansson json array is not conducive to this operation. Swtich to a zlistx_t for the reosurce journal. This data structure allows the removal and addition of items to the front of the list (without moving the memory for the rest of the items on the list)
Problem: The in-memory resource journal grows without bound and there is no way to limit its size. In preparation for truncating the resource journal, add a new config key in the [resource] table, `journal_max`, which can be used to set a limit on the journal size. Default this limit to 100000 entries. Modify the reslog constructor to accept this journal_max limit, but don't do anything with it just yet.
Problem: The resource journal limit cannot be configured at runtime because configuration updates are ignored by the resource module. Capture any new resource.journal_max entry on config reload and send it to the reslog implementation.
Problem: The resource journal can grow without bound. Truncate the in-memory resource eventlog if configured and replace the most recent event with an RFC 44 truncate event.
Problem: There's no testing of resource journal truncation in the testsuite. Add a set of very basic tests to t2355-resource-journal.t that ensure basic eventlog truncation works.
|
||
#include "truncate.h" | ||
#include "drainset.h" | ||
//#include "inventory.h" |
Check notice
Code scanning / CodeQL
Commented-out code Note
json_t *context) | ||
{ | ||
/* Add method and R to context */ | ||
// TODO: json_t *R = json_incref (inventory_get (reslog->ctx->inventory)); |
Check notice
Code scanning / CodeQL
Commented-out code Note
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6631 +/- ##
==========================================
- Coverage 83.86% 83.85% -0.02%
==========================================
Files 534 535 +1
Lines 88358 88608 +250
==========================================
+ Hits 74100 74298 +198
- Misses 14258 14310 +52
|
This PR adds support for truncation of the in-memory resource eventlog, including addition of the
truncate
event proposed for RFC 44 which holds a summary of information lost by the truncation.The current limit here is an arbitrarily chosen 100,000 entries. The implementation actually allows the eventlog to grow to
journal-max
+ 1% before truncating, at which point enough entries are dropped to bring the size back to the limit. This is to avoid truncating on every event appended to the log once it has reached the limit.The journal limit can be altered via the
resource.journal-max
configuration key. A live update immediately truncates the event if it exceeds the max + 1% water mark.This is a WIP because maintaining the truncated "state" ended a bit more complex than I originally thought, though it is perhaps 95% complete here. However, if there's no actual need for a resource journal consumer to reconstruct state at this point, it might be worth dropping the
truncate
event context which would make this PR trivial.Items that are still TODO if we keep the current approach:
truncate_info
object to allow R to be pushed into the context on aresource-define
event.resource-update
eventIf we decide to drop the truncate event context which summarizes lost state, most of the above TODO items could be dropped. The truncate event would just notify the consumer that the log has been truncated and record the timestamp of the most recently dropped event.
Maybe I'll work up a PR that does this 2nd part only, then make all the context keys in the RFC 44 proposal set to OPTIONAL. We could always tack the extra truncate event context handling on in another PR (I should have thought of this before, sorry!)
I'll still leave this here for consideration.
An example: