resource eventlog grows without bound in memory #6610

garlick · 2025-02-06T03:18:11Z

Problem: with #6586 merged, the memory used by the resource journal can grow without bound. This may be an issue on El Cap.

Before the next release we should think about how to mitigate that on large systems.

grondo · 2025-02-09T16:53:50Z

A simple way to address this issue would be to add a configurable maximum number of entries allowed in the in-memory resource journal.
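As a rough illustration of the entry cap, here is a minimal Python sketch. It is not flux-core code: the `BoundedJournal` class, the `max_entries` parameter, and the `dropped` counter are all hypothetical names chosen for the example.

```python
from collections import deque


class BoundedJournal:
    """Hypothetical in-memory journal that keeps at most max_entries events."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        # A deque with maxlen discards the oldest entry when a new one
        # is appended to a full deque.
        self.entries = deque(maxlen=max_entries)
        self.dropped = 0  # number of events discarded so far

    def append(self, event):
        # Count the event that is about to fall off the front.
        if len(self.entries) == self.entries.maxlen:
            self.dropped += 1
        self.entries.append(event)
```

With a cap like this, memory use is bounded by `max_entries`, but a consumer that starts reading after events have been dropped can no longer reconstruct the full state from the journal alone.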

To allow resource journal consumers to fully reconstruct state from a truncated journal, a new truncate or similar event could be added to RFC 44 which captures the state at the start of a truncated journal. This event would, at a minimum, contain the online and offline idsets at the time of truncation as well as the current drain state, possibly reusing the same format as the drain key in a resource.status response.

I'm not sure the entire state in the truncate event would immediately be necessary for the current use case, so the extra information could be marked as OPTIONAL for now. If and when it does become a requirement, it seems possible that the "truncate state" in the journal could be kept up to date by applying the newly truncated events to the previous "truncate state" as they are dropped.
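To make the idea of folding dropped events into a running "truncate state" concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `TruncatingJournal` class, the event names (`online`, `offline`, `drain`, `undrain`), the `ranks` and `reason` context keys, and the layout of the synthetic `truncate` event are simplified stand-ins, not the RFC 44 schema, and plain rank lists are used in place of idset strings.

```python
from collections import deque


class TruncatingJournal:
    """Hypothetical journal that summarizes dropped events in a truncate state."""

    def __init__(self, max_entries, size):
        self.max_entries = max_entries
        self.all_ranks = set(range(size))  # assumed ranks 0..size-1
        self.entries = deque()
        self.online = set()   # ranks online as of the truncation point
        self.drain = {}       # rank -> drain reason as of the truncation point
        self.truncated = False

    def _fold(self, event):
        # Apply one dropped event to the running truncate state.
        name, ctx = event["name"], event["context"]
        ranks = set(ctx["ranks"])
        if name == "online":
            self.online |= ranks
        elif name == "offline":
            self.online -= ranks
        elif name == "drain":
            for rank in ranks:
                self.drain[rank] = ctx.get("reason", "")
        elif name == "undrain":
            for rank in ranks:
                self.drain.pop(rank, None)

    def append(self, event):
        self.entries.append(event)
        # Drop the oldest events beyond the cap, folding each into the state.
        while len(self.entries) > self.max_entries:
            self._fold(self.entries.popleft())
            self.truncated = True

    def replay(self):
        # What a new consumer would read: a synthetic truncate event
        # (only if anything was dropped) followed by the retained events.
        out = []
        if self.truncated:
            out.append({
                "name": "truncate",
                "context": {
                    "online": sorted(self.online),
                    "offline": sorted(self.all_ranks - self.online),
                    "drain": dict(self.drain),
                },
            })
        out.extend(self.entries)
        return out
```

In this sketch the synthetic `truncate` event is generated on read and does not count against `max_entries`; whether it should is a design choice the proposal above leaves open.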
