title | authors | reviewers | creation-date | last-updated | status |
---|---|---|---|---|---|
Proposal Template | | | 2021-03-31 | 2021-03-31 | implementable |
Provide a safety policy to protect Kubernetes resources from the cascading deletion mechanism.
Currently, the cascading deletion mechanism carries several risks:
- Deleting a CRD by mistake removes all of its CRs.
- Deleting a workload by mistake deletes all Pods that belong to it.
- Deleting workloads by mistake in batches makes all applications in the cluster unavailable.
- Deleting a namespace deletes all resources in it.
Kruise should provide a safety policy which could help users protect Kubernetes resources and applications' availability from the cascading deletion mechanism.
API definition:
- a new feature-gate named `ResourcesDeletionProtection`
- a label named `policy.kruise.io/delete-protection`, whose value can be:
  - `Always`: this object is always forbidden from being deleted, unless the label is removed
  - `Cascading`: this object is forbidden from being deleted if it still owns active resources
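As an illustration of the label contract above, here is a minimal Go sketch. The label key and the two values come from this proposal; the constant names and the `protectionPolicy` helper are hypothetical, and treating a missing or unrecognised value as "not protected" is an assumption rather than documented behaviour.

```go
package main

import "fmt"

// Label key and values taken from the proposal text.
// The Go identifiers themselves are illustrative names.
const (
	DeletionProtectionKey = "policy.kruise.io/delete-protection"
	ProtectionAlways      = "Always"
	ProtectionCascading   = "Cascading"
)

// protectionPolicy is a hypothetical helper. It returns the policy set on an
// object's labels and reports whether that value is one of the two known ones.
// Assumption: a missing or unrecognised value means "not protected".
func protectionPolicy(labels map[string]string) (string, bool) {
	switch v := labels[DeletionProtectionKey]; v {
	case ProtectionAlways, ProtectionCascading:
		return v, true
	default:
		return "", false
	}
}

func main() {
	labels := map[string]string{DeletionProtectionKey: ProtectionCascading}
	policy, ok := protectionPolicy(labels)
	fmt.Println(policy, ok)
}
```

The comparison is case-sensitive in this sketch, so a value such as `always` would not count as protection.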
If the feature-gate is enabled, deletion of the resources below is validated when they carry this label:
Kind | Group | Version | Cascading judgement |
---|---|---|---|
Namespace | core | v1 | whether there are active Pods in this namespace |
CustomResourceDefinition | apiextensions.k8s.io | v1beta1, v1 | whether there are existing CRs of this CRD |
Deployment | apps | v1 | whether the replicas is 0 |
StatefulSet | apps | v1 | whether the replicas is 0 |
ReplicaSet | apps | v1 | whether the replicas is 0 |
CloneSet | apps.kruise.io | v1alpha1 | whether the replicas is 0 |
StatefulSet | apps.kruise.io | v1alpha1, v1beta1 | whether the replicas is 0 |
UnitedDeployment | apps.kruise.io | v1alpha1 | whether the replicas is 0 |
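The "Cascading judgement" column can be read as a per-kind predicate. The sketch below is one possible encoding in Go, not the Kruise implementation: `objectState` and `hasActiveDependents` are hypothetical names, the counts are assumed to have been gathered already, and kinds are matched by name only (ignoring group and version) to keep the example short.

```go
package main

import "fmt"

// objectState is a hypothetical, simplified view of the object being deleted.
// Only the field relevant to the object's kind is read.
type objectState struct {
	Kind        string
	Replicas    int32 // workloads: desired replicas
	ActivePods  int   // Namespace: active Pods found in it
	ExistingCRs int   // CustomResourceDefinition: CRs found for it
}

// hasActiveDependents applies the "Cascading judgement" column of the table:
// it reports true when a Cascading-protected object should not be deleted.
func hasActiveDependents(s objectState) (bool, error) {
	switch s.Kind {
	case "Namespace":
		return s.ActivePods > 0, nil
	case "CustomResourceDefinition":
		return s.ExistingCRs > 0, nil
	case "Deployment", "StatefulSet", "ReplicaSet", "CloneSet", "UnitedDeployment":
		return s.Replicas != 0, nil
	default:
		return false, fmt.Errorf("kind %q is not covered by the table", s.Kind)
	}
}

func main() {
	for _, s := range []objectState{
		{Kind: "Deployment", Replicas: 3},
		{Kind: "Deployment", Replicas: 0},
		{Kind: "Namespace", ActivePods: 2},
	} {
		blocked, err := hasActiveDependents(s)
		fmt.Println(s.Kind, blocked, err)
	}
}
```

Under this reading, a workload has to be scaled down to 0 replicas before a `Cascading`-protected deletion is allowed.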
If users enable the `ResourcesDeletionProtection` feature-gate when installing or upgrading Kruise, Kruise will require additional permissions:
- A webhook for deletion operations on Namespace, CRD, Deployment, StatefulSet, ReplicaSet and the workloads in Kruise.
- By default, a ClusterRole for reading all resource types, because CRD validation needs to list the CRs of the CRD. Optionally, users can use the `manager.role.groupsAuthorization` helm parameter to limit which groups can be listed by Kruise.
Intercept deletion operations on resources labeled with `policy.kruise.io/delete-protection`:
- For a `Namespace`, list the active Pods in it
- For a workload, just decide by its replicas
- For a `CRD`, list the CRs of it using unstructured client
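The interception flow above could be sketched as follows. This is an assumption-laden outline, not the actual webhook handler: `validateDeletion` and `dependentCounter` are hypothetical names, the counter callback stands in for the real lookups (listing Pods, reading replicas, or listing CRs through an unstructured client), and rejecting the deletion when that lookup fails is a design choice made here for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

const deletionProtectionKey = "policy.kruise.io/delete-protection"

// dependentCounter is a hypothetical stand-in for the per-kind lookup:
// active Pods of a Namespace, replicas of a workload, or CRs of a CRD.
type dependentCounter func() (int, error)

// validateDeletion returns nil when the deletion may proceed and an error
// describing the reason when it should be rejected.
func validateDeletion(gateEnabled bool, labels map[string]string, count dependentCounter) error {
	if !gateEnabled {
		return nil // feature-gate off: nothing is validated
	}
	switch labels[deletionProtectionKey] {
	case "Always":
		return errors.New("deletion forbidden: remove the delete-protection label first")
	case "Cascading":
		n, err := count()
		if err != nil {
			// Assumption: a failed lookup rejects the deletion.
			return fmt.Errorf("checking dependents: %w", err)
		}
		if n > 0 {
			return fmt.Errorf("deletion forbidden: %d active dependents remain", n)
		}
	}
	return nil // unlabeled, unknown value, or no active dependents
}

func main() {
	activePods := func() (int, error) { return 3, nil }
	labels := map[string]string{deletionProtectionKey: "Cascading"}
	fmt.Println(validateDeletion(true, labels, activePods))
}
```

Passing the lookup as a function keeps the decision logic free of any client dependency, so it can be exercised without a cluster.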
If all kruise-manager Pods have crashed or are otherwise unhealthy, the deletion webhook will fail for the resources above, which means these resources cannot be deleted until kruise-manager recovers.
- 31/03/2021: Proposal submission