✨ Add support for checking Machine conditions in MachineHealthCheck (#12827)
* Add support for checking Machine conditions in MachineHealthCheck
MachineHealthCheck currently only checks Node conditions to determine
whether a machine is healthy. However, Machine conditions capture state
that does not exist on Nodes, for example control plane conditions such
as EtcdPodHealthy and SchedulerPodHealthy, which can indicate whether a
control plane machine has been created correctly.
Adding support for Machine conditions enables us to perform remediation
during control plane upgrades.
This PR introduces a new field as part of the MachineHealthCheckChecks:
- `UnhealthyMachineConditions`
This will mirror the behavior of `UnhealthyNodeConditions` but the
MachineHealthCheck controller will instead check the machine conditions.
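
A minimal sketch of what such a check could look like, assuming each entry names a condition type, an unhealthy status, and a timeout. The types `MachineCondition` and `UnhealthyMachineCondition`, the `SecondsInState` field, and the helper `matchUnhealthy` are simplified, hypothetical stand-ins for illustration and are not the Cluster API types or the controller code:

```go
package main

import "fmt"

// MachineCondition is a hypothetical, trimmed-down view of a condition
// reported on a Machine.
type MachineCondition struct {
	Type           string
	Status         string // "True", "False" or "Unknown"
	SecondsInState int32  // how long the condition has held this status
}

// UnhealthyMachineCondition is a hypothetical, simplified check entry:
// the condition is considered unhealthy once it has held Status for at
// least TimeoutSeconds.
type UnhealthyMachineCondition struct {
	Type           string
	Status         string
	TimeoutSeconds int32
}

// matchUnhealthy returns the types of all checks that match a condition
// on the machine and whose timeout has expired.
func matchUnhealthy(conds []MachineCondition, checks []UnhealthyMachineCondition) []string {
	var matched []string
	for _, chk := range checks {
		for _, c := range conds {
			if c.Type == chk.Type && c.Status == chk.Status && c.SecondsInState >= chk.TimeoutSeconds {
				matched = append(matched, chk.Type)
				break
			}
		}
	}
	return matched
}

func main() {
	conds := []MachineCondition{
		{Type: "EtcdPodHealthy", Status: "False", SecondsInState: 600},
		{Type: "SchedulerPodHealthy", Status: "True", SecondsInState: 900},
	}
	checks := []UnhealthyMachineCondition{
		{Type: "EtcdPodHealthy", Status: "False", TimeoutSeconds: 300},
		{Type: "SchedulerPodHealthy", Status: "False", TimeoutSeconds: 300},
	}
	fmt.Println(matchUnhealthy(conds, checks))
}
```

Only the etcd check matches in this example, because the scheduler condition reports a healthy status.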
This reimplements and extends earlier work originally proposed in PR 12275.
Co-authored-by: Justin Miron <[email protected]>
Signed-off-by: Furkat Gofurov <[email protected]>
* Fix PR check Markdown links CI
Signed-off-by: Furkat Gofurov <[email protected]>
* Address review comments
Signed-off-by: Furkat Gofurov <[email protected]>
* Address review comments: rework node and machine checks in needsRemediation() method
If both a node condition and machine condition are unhealthy, pick one reason but
combine all the messages
Signed-off-by: Furkat Gofurov <[email protected]>
* Address Stefan's comments (conversion)
Signed-off-by: Furkat Gofurov <[email protected]>
* Address Fabrizio's review comments (mhc target, mhc controller code)
Refactors `needsRemediation`. Specifically, the following changes were made:
- Move machine condition evaluation to always execute first, regardless of node state
- Ensure machine conditions are checked in ALL scenarios:
* When node is missing (t.nodeMissing)
* When node hasn't appeared yet (t.Node == nil)
* When node exists (t.Node != nil)
- Consistently merge node and machine condition messages in all failure scenarios
- Maintain backward compatibility with existing condition message formats
- Use appropriate condition reasons based on which conditions are unhealthy
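
The flow described in the list above might be sketched as follows. This is an illustration under stated assumptions, not the controller code: the `target` struct, its fields, the reason strings, and the `"; "` message separator are hypothetical, and node startup timeout handling is omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical, simplified health check target.
type target struct {
	nodeMissing      bool     // the node existed before but is now gone
	hasNode          bool     // the node currently exists
	unhealthyMachine []string // messages from failed machine condition checks
	unhealthyNode    []string // messages from failed node condition checks
}

// Reason names are placeholders chosen for this sketch.
const (
	reasonMachine     = "UnhealthyMachine"
	reasonNode        = "UnhealthyNode"
	reasonNodeDeleted = "NodeDeleted"
)

// needsRemediation evaluates machine checks first, regardless of node
// state, then node checks, and merges all failure messages. When both
// kinds fail, the machine reason is reported.
func needsRemediation(t target) (bool, string, string) {
	machineMsgs := t.unhealthyMachine

	var nodeMsgs []string
	nodeReason := ""
	switch {
	case t.nodeMissing:
		nodeMsgs = []string{"Node was deleted"}
		nodeReason = reasonNodeDeleted
	case t.hasNode && len(t.unhealthyNode) > 0:
		nodeMsgs = t.unhealthyNode
		nodeReason = reasonNode
	}

	msgs := append(append([]string{}, machineMsgs...), nodeMsgs...)
	if len(msgs) == 0 {
		return false, "", ""
	}
	reason := nodeReason
	if len(machineMsgs) > 0 {
		reason = reasonMachine
	}
	return true, reason, strings.Join(msgs, "; ")
}

func main() {
	t := target{
		nodeMissing:      true,
		unhealthyMachine: []string{"EtcdPodHealthy is False"},
	}
	fmt.Println(needsRemediation(t))
}
```

Keeping the merge and reason selection in one place means every scenario (node missing, node not yet present, node present) produces its message the same way.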
Signed-off-by: Furkat Gofurov <[email protected]>
* Fix event message to reflect both machine and node condition checking
Signed-off-by: Furkat Gofurov <[email protected]>
* Simplify the `needsRemediation` function further by using two sub-functions: one for machineChecks and the other for nodeChecks.
Another benefit of this code structure is that condition management is implemented in only one place.
Co-authored-by: Fabrizio Pandini
Signed-off-by: Furkat Gofurov <[email protected]>
* Add CEL validation to prevent disallowed UnhealthyMachineCondition types
Signed-off-by: Furkat Gofurov <[email protected]>
* Clarify `UnhealthyMachineConditionV1Beta1Reason` precedence over node reasons
Signed-off-by: Furkat Gofurov <[email protected]>
* Address review comments (Stefan)
Signed-off-by: Furkat Gofurov <[email protected]>
---------
Signed-off-by: Furkat Gofurov <[email protected]>
Co-authored-by: Justin Miron <[email protected]>
// +kubebuilder:validation:XValidation:rule="!(self in ['Ready','Available','HealthCheckSucceeded','OwnerRemediated','ExternallyRemediated'])",message="type must not be one of: Ready, Available, HealthCheckSucceeded, OwnerRemediated, ExternallyRemediated"
// +required
Type string `json:"type,omitempty"`

// status of the condition, one of True, False, Unknown.
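
For readers less familiar with CEL, the rule in the marker above can be expressed as an equivalent plain Go check. This is only an illustrative sketch: the names `disallowedTypes` and `validateType` are hypothetical, and the actual validation is the CEL rule evaluated at admission time, not Go code like this:

```go
package main

import "fmt"

// disallowedTypes mirrors the list in the CEL rule: condition types that
// may not be used as an unhealthy machine condition type.
var disallowedTypes = map[string]bool{
	"Ready":                true,
	"Available":            true,
	"HealthCheckSucceeded": true,
	"OwnerRemediated":      true,
	"ExternallyRemediated": true,
}

// validateType returns an error when t is one of the disallowed types,
// using the same message as the CEL rule.
func validateType(t string) error {
	if disallowedTypes[t] {
		return fmt.Errorf("type must not be one of: Ready, Available, HealthCheckSucceeded, OwnerRemediated, ExternallyRemediated")
	}
	return nil
}

func main() {
	for _, t := range []string{"Ready", "EtcdPodHealthy"} {
		fmt.Printf("%s: %v\n", t, validateType(t))
	}
}
```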