Currently the checkpoint is always saved to the same location, overwriting whatever was already there.
If something goes wrong with the final checkpoint, there is no way to recover from a prior checkpoint.
There should be a rolling history of checkpoints to fall back on in case the last checkpoint failed.
There is some complexity here: beyond keeping the rolling history, any mechanism that recovers from a checkpoint may need to "go back in history" to an earlier one.
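To make the idea concrete, here is a rough sketch of a bounded checkpoint history with fallback on recovery. It is only an illustration in Python, not a proposal for the actual implementation: the `CheckpointHistory` class, its methods, the `max_entries` limit, and the `ref-N` strings are all made up for this example, and the `is_valid` callback stands in for whatever check would decide that a checkpoint is usable.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class CheckpointHistory:
    """Hypothetical rolling history: keep the N newest checkpoints."""

    def __init__(self, max_entries: int = 5) -> None:
        # A deque with maxlen silently drops the oldest entry when full,
        # so a new checkpoint never overwrites the most recent good one.
        self._entries: Deque[str] = deque(maxlen=max_entries)

    def commit(self, rootref: str) -> None:
        """Record a new checkpoint as the newest entry."""
        self._entries.append(rootref)

    def entries(self) -> List[str]:
        """Return the retained checkpoints, oldest first."""
        return list(self._entries)

    def recover(self, is_valid: Callable[[str], bool]) -> Optional[str]:
        """Walk back in history and return the newest valid checkpoint."""
        for rootref in reversed(self._entries):
            if is_valid(rootref):
                return rootref
        return None  # nothing in the history is usable


history = CheckpointHistory(max_entries=3)
for ref in ["ref-1", "ref-2", "ref-3", "ref-4"]:
    history.commit(ref)

# Pretend the final checkpoint ("ref-4") turned out to be bad.
recovered = history.recover(lambda ref: ref != "ref-4")
print(history.entries(), recovered)
```

With a limit of three, the oldest entry is dropped on the fourth commit, and recovery falls back to the entry just before the bad one.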
One thought is to add a flux kvscheck command that runs in rc1 before the KVS module is loaded. It could walk the hash tree from the current checkpoint. If it finds any missing data, it could abort, causing the instance to abort. We could add options to fix the KVS offline, for example rolling back to a previous checkpoint or taking other measures. See also
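As a rough illustration of the check described above, the following Python sketch walks a hash tree from a checkpoint root and reports any references whose data is missing, then picks the newest checkpoint that is fully intact. Everything here is assumed for the example: the content store is modeled as a plain dict mapping a reference to the references it points at, and `find_missing`, `first_intact`, and the reference names are hypothetical, not part of any existing command or API.

```python
from typing import Dict, List, Optional, Set

# Assumed model: each stored object maps to the refs it points at.
# A ref that is absent from the dict represents missing data.
Store = Dict[str, List[str]]


def find_missing(store: Store, rootref: str) -> Set[str]:
    """Walk the tree under rootref and collect refs with no stored data."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [rootref]
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue  # shared subtrees only need to be checked once
        seen.add(ref)
        children = store.get(ref)
        if children is None:
            missing.add(ref)
        else:
            stack.extend(children)
    return missing


def first_intact(store: Store, newest_first: List[str]) -> Optional[str]:
    """Return the newest checkpoint whose whole tree is present."""
    for rootref in newest_first:
        if not find_missing(store, rootref):
            return rootref
    return None


store: Store = {
    "root-old": ["dir-a"],
    "dir-a": ["val-1"],
    "val-1": [],
    "root-new": ["dir-a", "dir-b"],
    "dir-b": ["val-2"],  # "val-2" was never stored
}

missing_new = find_missing(store, "root-new")
chosen = first_intact(store, ["root-new", "root-old"])
print(missing_new, chosen)
```

In this toy store the newest checkpoint references data that was never written, so the walk flags it and the older checkpoint is chosen instead.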