resource eventlog grows without bound in memory #6610

Open
garlick opened this issue Feb 6, 2025 · 1 comment

Comments

@garlick
Member

garlick commented Feb 6, 2025

Problem: with #6586 merged, the memory used by the resource journal can grow without bound. This may be an issue on El Cap.

Before the next release we should think about how to mitigate that on large systems.

@grondo
Contributor

grondo commented Feb 9, 2025

A simple way to address this issue would be to add a configurable maximum number of entries allowed in the in-memory resource journal.
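A rough sketch of the bounded journal idea, in Python for brevity. The limit name `MAX_ENTRIES` and the event layout are made up for illustration; the real config key and default would still need to be decided.

```python
from collections import deque

# Hypothetical limit: stands in for the configurable maximum discussed above.
MAX_ENTRIES = 3

# A deque with maxlen discards the oldest entry on each append once full.
journal = deque(maxlen=MAX_ENTRIES)

for seq in range(5):
    # Simplified stand-in for a journal entry, not the actual RFC 44 layout.
    journal.append({"name": "online", "context": {"idset": str(seq)}})

print([entry["context"]["idset"] for entry in journal])  # ['2', '3', '4']
```

Memory is then bounded by the limit, but a consumer that starts reading after entries have been dropped no longer sees the full history.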

To allow resource journal consumers to fully reconstruct state from a truncated journal, a new truncate or similar event could be added to RFC 44 which captures the state at the start of a truncated journal. This event would, at a minimum, contain the online and offline idsets at the time of truncation as well as the current drain state, possibly reusing the same format as the drain key in a resource.status response.
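To illustrate the consumer side, here is a minimal Python sketch of how a reader might rebuild state from a journal that begins with such an event. Everything about the schema is an assumption: the `truncate` context keys (`online`, `offline`, `drain`), the `ranks` lists used in place of idset strings, and the rank-to-reason `drain` mapping are placeholders, not what RFC 44 or `resource.status` actually define.

```python
def reconstruct(events):
    """Replay journal events into (online, offline, drained) state."""
    online, offline, drained = set(), set(), {}
    for ev in events:
        name, ctx = ev["name"], ev["context"]
        ranks = set(ctx.get("ranks", ()))
        if name == "truncate":
            # Hypothetical snapshot event: replaces any state seen so far.
            online = set(ctx["online"])
            offline = set(ctx["offline"])
            drained = dict(ctx["drain"])
        elif name == "online":
            online |= ranks
            offline -= ranks
        elif name == "offline":
            offline |= ranks
            online -= ranks
        elif name == "drain":
            for rank in ranks:
                drained[rank] = ctx.get("reason", "")
        elif name == "undrain":
            for rank in ranks:
                drained.pop(rank, None)
    return online, offline, drained


full = [
    {"name": "online", "context": {"ranks": [0, 1, 2, 3]}},
    {"name": "drain", "context": {"ranks": [2], "reason": "bad dimm"}},
    {"name": "offline", "context": {"ranks": [3]}},
    {"name": "undrain", "context": {"ranks": [2]}},
]

# Same history with the first two events replaced by a snapshot of their effect.
snapshot = {
    "name": "truncate",
    "context": {"online": [0, 1, 2, 3], "offline": [], "drain": {2: "bad dimm"}},
}
truncated = [snapshot] + full[2:]

assert reconstruct(full) == reconstruct(truncated)
```

The point of the sketch is only that a consumer can treat the leading event as a starting snapshot and then apply the remaining events as usual.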

I'm not sure the entire state in the truncate event would immediately be necessary for the current use case, so the extra information could be marked as OPTIONAL for now. If and when it does become a requirement, it seems possible that the "truncate state" in the journal could be kept up to date by applying the newly truncated events to the previous "truncate state" as they are dropped.
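A sketch of that incremental update, again in Python and under assumed names: `TruncatingJournal`, `apply_event`, the `ranks` context key, and the shape of the generated `truncate` event are all hypothetical. Each event that falls off the front of the journal is folded into a running summary before it is discarded.

```python
from collections import deque


def apply_event(state, event):
    """Fold one dropped event into the summary state (assumed event schema)."""
    ctx = event["context"]
    ranks = set(ctx.get("ranks", ()))
    name = event["name"]
    if name == "online":
        state["online"] |= ranks
        state["offline"] -= ranks
    elif name == "offline":
        state["offline"] |= ranks
        state["online"] -= ranks
    elif name == "drain":
        reason = ctx.get("reason", "")
        state["drain"].update({rank: reason for rank in ranks})
    elif name == "undrain":
        for rank in ranks:
            state["drain"].pop(rank, None)


class TruncatingJournal:
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.events = deque()
        self.state = {"online": set(), "offline": set(), "drain": {}}
        self.dropped = 0

    def append(self, event):
        self.events.append(event)
        while len(self.events) > self.max_entries:
            # Update the "truncate state" before the event is forgotten.
            apply_event(self.state, self.events.popleft())
            self.dropped += 1

    def truncate_event(self):
        """Event that would head the journal, or None if nothing was dropped."""
        if self.dropped == 0:
            return None
        return {
            "name": "truncate",
            "context": {
                "online": sorted(self.state["online"]),
                "offline": sorted(self.state["offline"]),
                "drain": dict(self.state["drain"]),
            },
        }


journal = TruncatingJournal(max_entries=2)
for ev in [
    {"name": "online", "context": {"ranks": [0, 1, 2]}},
    {"name": "drain", "context": {"ranks": [1], "reason": "bad disk"}},
    {"name": "offline", "context": {"ranks": [2]}},
    {"name": "undrain", "context": {"ranks": [1]}},
]:
    journal.append(ev)

print(journal.truncate_event())
```

With this approach the cost per append stays constant, and the summary only ever reflects events that are no longer in the journal, so replaying the retained events on top of it should give the current state.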
