You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have various clusters in my network, they could be SQL clusters, or Kubernetes clusters, the thing specific to these clusters is that you generally cannot just patch and reboot all nodes at once otherwise you risk getting into some indeterminate state that likely requires a manual intervention to resolve. In normal operations most/all of these clusters can however survive one (or potentially more) nodes rebooting gracefully then returning to operations without any manual intervention required
Describe the Solution You Would Like
I would like for there to be a mechanism to somehow define a given cluster within the context of patching_as_code so that it knows to only act on one node at a time somehow or can automatically splay actual running of patching/reboot within a given window to minimise the risk of multiple overlapping cycles.
Describe Alternatives You've Considered
In the case of my MySQL/Galera based clusters what i have done is put the individual nodes into seperate patching windows that run on different weekends of the month and this largely works as advertised but requires (in my opinion) far too much manual co-ordination and management and is prone to errors if not managed carefully. there is also the fact that these clusters could be running mismatched versions for longer than might be considered ideal - this is also less feasible in larger clusters within defining a large number of patching windows
The text was updated successfully, but these errors were encountered:
Use Case
I have various clusters in my network, they could be SQL clusters, or Kubernetes clusters, the thing specific to these clusters is that you generally cannot just patch and reboot all nodes at once otherwise you risk getting into some indeterminate state that likely requires a manual intervention to resolve. In normal operations most/all of these clusters can however survive one (or potentially more) nodes rebooting gracefully then returning to operations without any manual intervention required
Describe the Solution You Would Like
I would like for there to be a mechanism to somehow define a given cluster within the context of patching_as_code so that it knows to only act on one node at a time somehow or can automatically splay actual running of patching/reboot within a given window to minimise the risk of multiple overlapping cycles.
Describe Alternatives You've Considered
In the case of my MySQL/Galera based clusters what i have done is put the individual nodes into seperate patching windows that run on different weekends of the month and this largely works as advertised but requires (in my opinion) far too much manual co-ordination and management and is prone to errors if not managed carefully. there is also the fact that these clusters could be running mismatched versions for longer than might be considered ideal - this is also less feasible in larger clusters within defining a large number of patching windows
The text was updated successfully, but these errors were encountered: