Commit a87307e

FxKu and mbegenau authored
Feat: enable owner references (#2688)

* feat(498): Add ownerReferences to managed entities
* empty owner reference for cross namespace secret and more tests
* update ownerReferences of existing resources
* removing ownerReference requires Update API call
* CR ownerReference on PVC blocks pvc retention policy of statefulset
* make ownerreferences optional and disabled by default
* update unit test to check len ownerReferences
* update codegen
* add owner references e2e test
* update unit test
* add block_owner_deletion field to test owner reference
* fix typos and update docs once more
* reflect code feedback

Co-authored-by: Max Begenau <[email protected]>
1 parent d5a88f5 · commit a87307e

28 files changed: +534 −205 lines

charts/postgres-operator/crds/operatorconfigurations.yaml

+5 −2

@@ -211,9 +211,9 @@ spec:
         enable_init_containers:
           type: boolean
           default: true
-        enable_secrets_deletion:
+        enable_owner_references:
           type: boolean
-          default: true
+          default: false
         enable_persistent_volume_claim_deletion:
           type: boolean
           default: true
@@ -226,6 +226,9 @@ spec:
         enable_readiness_probe:
           type: boolean
           default: false
+        enable_secrets_deletion:
+          type: boolean
+          default: true
         enable_sidecars:
           type: boolean
           default: true

charts/postgres-operator/templates/clusterrole.yaml

+2

@@ -120,6 +120,7 @@ rules:
   - create
   - delete
   - get
+  - patch
   - update
 # to check nodes for node readiness label
 - apiGroups:
@@ -196,6 +197,7 @@ rules:
   - get
   - list
   - patch
+  - update
 # to CRUD cron jobs for logical backups
 - apiGroups:
   - batch

charts/postgres-operator/values.yaml

+4 −2

@@ -129,8 +129,8 @@ configKubernetes:
   enable_finalizers: false
   # enables initContainers to run actions before Spilo is started
   enable_init_containers: true
-  # toggles if operator should delete secrets on cluster deletion
-  enable_secrets_deletion: true
+  # toggles if child resources should have an owner reference to the postgresql CR
+  enable_owner_references: false
   # toggles if operator should delete PVCs on cluster deletion
   enable_persistent_volume_claim_deletion: true
   # toggles pod anti affinity on the Postgres pods
@@ -139,6 +139,8 @@ configKubernetes:
   enable_pod_disruption_budget: true
   # toogles readiness probe for database pods
   enable_readiness_probe: false
+  # toggles if operator should delete secrets on cluster deletion
+  enable_secrets_deletion: true
   # enables sidecar containers to run alongside Spilo in the same pod
   enable_sidecars: true
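
For Helm-based installations, the same switch can be flipped through a values
override. A minimal sketch based on the `configKubernetes` keys shown in this
diff (the file name and release name are placeholders):

```yaml
# values-owner-refs.yaml (example override for the postgres-operator chart)
configKubernetes:
  # child resources point back to the postgresql custom resource
  enable_owner_references: true
  # finalizers remain at the chart default
  enable_finalizers: false
```

Applied, for example, with
`helm upgrade postgres-operator charts/postgres-operator -f values-owner-refs.yaml`.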

docs/administrator.md

+62 −8

@@ -223,9 +223,9 @@ configuration:
 
 Now, every cluster manifest must contain the configured annotation keys to
 trigger the delete process when running `kubectl delete pg`. Note, that the
-`Postgresql` resource would still get deleted as K8s' API server does not
-block it. Only the operator logs will tell, that the delete criteria wasn't
-met.
+`Postgresql` resource would still get deleted because the operator does not
+instruct K8s' API server to block it. Only the operator logs will tell that
+the delete criteria were not met.
 
 **cluster manifest**
 
@@ -243,11 +243,65 @@ spec:
 
 In case, the resource has been deleted accidentally or the annotations were
 simply forgotten, it's safe to recreate the cluster with `kubectl create`.
-Existing Postgres cluster are not replaced by the operator. But, as the
-original cluster still exists the status will show `CreateFailed` at first.
-On the next sync event it should change to `Running`. However, as it is in
-fact a new resource for K8s, the UID will differ which can trigger a rolling
-update of the pods because the UID is used as part of backup path to S3.
+Existing Postgres clusters are not replaced by the operator. But, when the
+original cluster still exists the status will be `CreateFailed` at first. On
+the next sync event it should change to `Running`. However, because it is in
+fact a new resource for K8s, the UID, and therefore the backup path to S3,
+will differ and trigger a rolling update of the pods.
+
+## Owner References and Finalizers
+
+The Postgres Operator can set [owner references](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/)
+on most of a cluster's child resources to improve monitoring with GitOps
+tools and enable cascading deletes. There are three exceptions:
+
+* Persistent Volume Claims, because they are handled by the [PV Reclaim Policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/) of the StatefulSet
+* The config endpoint + headless service resource, because it is managed by Patroni
+* Cross-namespace secrets, because owner references are not allowed across namespaces by design
+
+The operator would clean these resources up with its regular delete loop,
+provided they got synced correctly. If for some reason the initial cluster
+sync fails, e.g. after a cluster creation or operator restart, a deletion of
+the cluster manifest would leave orphaned resources behind which the user has
+to clean up manually.
+
+Another option is to enable finalizers which first ensures the deletion of all
+child resources before the cluster manifest gets removed. There is a trade-off
+though: The deletion is only performed after the next two operator SYNC cycles
+with the first one setting a `deletionTimestamp` and the latter reacting to it.
+The final removal of the custom resource will add a DELETE event to the worker
+queue but the child resources are already gone at this point. If you do not
+desire this behavior consider enabling owner references instead.
+
+**postgres-operator ConfigMap**
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: postgres-operator
+data:
+  enable_finalizers: "false"
+  enable_owner_references: "true"
+```
+
+**OperatorConfiguration**
+
+```yaml
+apiVersion: "acid.zalan.do/v1"
+kind: OperatorConfiguration
+metadata:
+  name: postgresql-operator-configuration
+configuration:
+  kubernetes:
+    enable_finalizers: false
+    enable_owner_references: true
+```
+
+:warning: Please note, both options are disabled by default. When enabling owner
+references the operator cannot block cascading deletes, even when the [delete protection annotations](administrator.md#delete-protection-via-annotations)
+are in place. You would need a K8s admission controller that blocks the actual
+`kubectl delete` API call, e.g. based on existing annotations.
 
 ## Role-based access control for the operator
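
For illustration, once `enable_owner_references` is active every supported
child resource carries a reference back to the `postgresql` custom resource;
the e2e test further below asserts `kind: postgresql` with `controller: true`
on the statefulset, services, endpoints, pod disruption budget and secrets. A
sketch of the resulting metadata on a credentials Secret (cluster name and UID
are placeholders, `blockOwnerDeletion` reflects the `block_owner_deletion`
field mentioned in the commit message):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres.acid-minimal-cluster.credentials.postgresql.acid.zalan.do
  namespace: default
  ownerReferences:
    - apiVersion: acid.zalan.do/v1
      kind: postgresql
      name: acid-minimal-cluster                 # the owning cluster manifest
      uid: 6a9d7203-0000-0000-0000-000000000000  # UID of the postgresql CR (placeholder)
      controller: true
      blockOwnerDeletion: true
```

With such a reference in place, deleting the `postgresql` resource lets the
Kubernetes garbage collector remove the Secret together with the statefulset
and pod disruption budget, without waiting for the operator's delete loop.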

docs/reference/operator_parameters.md

+25 −24

@@ -263,6 +263,31 @@ Parameters to configure cluster-related Kubernetes objects created by the
 operator, as well as some timeouts associated with them. In a CRD-based
 configuration they are grouped under the `kubernetes` key.
 
+* **enable_finalizers**
+  By default, a deletion of the Postgresql resource will trigger an event
+  that leads to a cleanup of all child resources. However, if the database
+  cluster is in a broken state (e.g. failed initialization) and the operator
+  cannot fully sync it, there can be leftovers. By enabling finalizers the
+  operator will ensure all managed resources are deleted prior to the
+  Postgresql resource. See also [admin docs](../administrator.md#owner-references-and-finalizers)
+  for more information. The default is `false`.
+
+* **enable_owner_references**
+  The operator can set owner references on its child resources (except PVCs,
+  Patroni config service/endpoint, cross-namespace secrets) to improve cluster
+  monitoring and enable cascading deletion. The default is `false`. Warning:
+  enabling this option disables the configured delete protection checks (see below).
+
+* **delete_annotation_date_key**
+  key name for annotation that compares manifest value with current date in the
+  YYYY-MM-DD format. Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`.
+  The default is empty which also disables this delete protection check.
+
+* **delete_annotation_name_key**
+  key name for annotation that compares manifest value with Postgres cluster name.
+  Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`. The default is
+  empty which also disables this delete protection check.
+
 * **pod_service_account_name**
   service account used by Patroni running on individual Pods to communicate
   with the operator. Required even if native Kubernetes support in Patroni is
@@ -293,16 +318,6 @@ configuration they are grouped under the `kubernetes` key.
   of a database created by the operator. If the annotation key is also provided
   by the database definition, the database definition value is used.
 
-* **delete_annotation_date_key**
-  key name for annotation that compares manifest value with current date in the
-  YYYY-MM-DD format. Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`.
-  The default is empty which also disables this delete protection check.
-
-* **delete_annotation_name_key**
-  key name for annotation that compares manifest value with Postgres cluster name.
-  Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`. The default is
-  empty which also disables this delete protection check.
-
 * **downscaler_annotations**
   An array of annotations that should be passed from Postgres CRD on to the
   statefulset and, if exists, to the connection pooler deployment as well.
@@ -332,20 +347,6 @@ configuration they are grouped under the `kubernetes` key.
   drained if the node_readiness_label is not used. If this option if set to
   `false` the `spilo-role=master` selector will not be added to the PDB.
 
-* **enable_finalizers**
-  By default, a deletion of the Postgresql resource will trigger an event
-  that leads to a cleanup of all child resources. However, if the database
-  cluster is in a broken state (e.g. failed initialization) and the operator
-  cannot fully sync it, there can be leftovers. By enabling finalizers the
-  operator will ensure all managed resources are deleted prior to the
-  Postgresql resource. There is a trade-off though: The deletion is only
-  performed after the next two SYNC cycles with the first one updating the
-  internal spec and the latter reacting on the `deletionTimestamp` while
-  processing the SYNC event. The final removal of the custom resource will
-  add a DELETE event to the worker queue but the child resources are already
-  gone at this point.
-  The default is `false`.
-
 * **persistent_volume_claim_retention_policy**
   The operator tries to protect volumes as much as possible. If somebody
   accidentally deletes the statefulset or scales in the `numberOfInstances` the
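
As a sketch of how the two delete protection keys interplay with a cluster
manifest: the key names `delete-date` and `delete-clustername` are examples
only, any annotation name matching the allowed pattern works, and the date
value is a placeholder for the current day.

```yaml
# operator ConfigMap (excerpt): define which annotation keys guard deletion
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  delete_annotation_date_key: "delete-date"
  delete_annotation_name_key: "delete-clustername"
---
# cluster manifest (excerpt): both annotations must be present and match,
# otherwise the operator skips its delete routine for child resources
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  annotations:
    delete-date: "2024-05-31"                  # must equal the current date (YYYY-MM-DD)
    delete-clustername: "acid-minimal-cluster" # must equal the cluster name
```

Note the warning under `enable_owner_references`: with owner references
enabled, the cascading delete happens at the Kubernetes level, so these
annotation checks no longer protect the child resources.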

e2e/tests/test_e2e.py

+100 −9

@@ -96,7 +96,7 @@ def setUpClass(cls):
             print("Failed to delete the 'standard' storage class: {0}".format(e))
 
         # operator deploys pod service account there on start up
-        # needed for test_multi_namespace_support()
+        # needed for test_multi_namespace_support and test_owner_references
         cls.test_namespace = "test"
         try:
             v1_namespace = client.V1Namespace(metadata=client.V1ObjectMeta(name=cls.test_namespace))
@@ -1419,17 +1419,11 @@ def test_multi_namespace_support(self):
             k8s.wait_for_pod_start("spilo-role=master", self.test_namespace)
             k8s.wait_for_pod_start("spilo-role=replica", self.test_namespace)
             self.assert_master_is_unique(self.test_namespace, "acid-test-cluster")
+            # acid-test-cluster will be deleted in test_owner_references test
 
         except timeout_decorator.TimeoutError:
             print('Operator log: {}'.format(k8s.get_operator_log()))
             raise
-        finally:
-            # delete the new cluster so that the k8s_api.get_operator_state works correctly in subsequent tests
-            # ideally we should delete the 'test' namespace here but
-            # the pods inside the namespace stuck in the Terminating state making the test time out
-            k8s.api.custom_objects_api.delete_namespaced_custom_object(
-                "acid.zalan.do", "v1", self.test_namespace, "postgresqls", "acid-test-cluster")
-            time.sleep(5)
 
     @timeout_decorator.timeout(TEST_TIMEOUT_SEC)
     @unittest.skip("Skipping this test until fixed")
@@ -1640,6 +1634,71 @@ def test_overwrite_pooler_deployment(self):
         self.eventuallyEqual(lambda: k8s.count_running_pods("connection-pooler="+pooler_name),
                              0, "Pooler pods not scaled down")
 
+    @timeout_decorator.timeout(TEST_TIMEOUT_SEC)
+    def test_owner_references(self):
+        '''
+        Enable owner references, test if resources get updated and test cascade deletion of test cluster.
+        '''
+        k8s = self.k8s
+        cluster_name = 'acid-test-cluster'
+        cluster_label = 'application=spilo,cluster-name={}'.format(cluster_name)
+        default_test_cluster = 'acid-minimal-cluster'
+
+        try:
+            # enable owner references in config
+            enable_owner_refs = {
+                "data": {
+                    "enable_owner_references": "true"
+                }
+            }
+            k8s.update_config(enable_owner_refs)
+            self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"}, "Operator does not get in sync")
+
+            time.sleep(5)  # wait for the operator to sync the cluster and update resources
+
+            # check if child resources were updated with owner references
+            self.assertTrue(self.check_cluster_child_resources_owner_references(cluster_name, self.test_namespace), "Owner references not set on all child resources of {}".format(cluster_name))
+            self.assertTrue(self.check_cluster_child_resources_owner_references(default_test_cluster), "Owner references not set on all child resources of {}".format(default_test_cluster))
+
+            # delete the new cluster to test owner references
+            # and also to make k8s_api.get_operator_state work better in subsequent tests
+            # ideally we should delete the 'test' namespace here but the pods
+            # inside the namespace stuck in the Terminating state making the test time out
+            k8s.api.custom_objects_api.delete_namespaced_custom_object(
+                "acid.zalan.do", "v1", self.test_namespace, "postgresqls", cluster_name)
+
+            # statefulset, pod disruption budget and secrets should be deleted via owner reference
+            self.eventuallyEqual(lambda: k8s.count_pods_with_label(cluster_label), 0, "Pods not deleted")
+            self.eventuallyEqual(lambda: k8s.count_statefulsets_with_label(cluster_label), 0, "Statefulset not deleted")
+            self.eventuallyEqual(lambda: k8s.count_pdbs_with_label(cluster_label), 0, "Pod disruption budget not deleted")
+            self.eventuallyEqual(lambda: k8s.count_secrets_with_label(cluster_label), 0, "Secrets were not deleted")
+
+            time.sleep(5)  # wait for the operator to also delete the leftovers
+
+            # pvcs and Patroni config service/endpoint should not be affected by owner reference
+            # but deleted by the operator almost immediately
+            self.eventuallyEqual(lambda: k8s.count_pvcs_with_label(cluster_label), 0, "PVCs not deleted")
+            self.eventuallyEqual(lambda: k8s.count_services_with_label(cluster_label), 0, "Patroni config service not deleted")
+            self.eventuallyEqual(lambda: k8s.count_endpoints_with_label(cluster_label), 0, "Patroni config endpoint not deleted")
+
+            # disable owner references in config
+            disable_owner_refs = {
+                "data": {
+                    "enable_owner_references": "false"
+                }
+            }
+            k8s.update_config(disable_owner_refs)
+            self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"}, "Operator does not get in sync")
+
+            time.sleep(5)  # wait for the operator to remove owner references
+
+            # check if child resources were updated without Postgresql owner references
+            self.assertTrue(self.check_cluster_child_resources_owner_references(default_test_cluster, "default", True), "Owner references still present on some child resources of {}".format(default_test_cluster))
+
+        except timeout_decorator.TimeoutError:
+            print('Operator log: {}'.format(k8s.get_operator_log()))
+            raise
+
     @timeout_decorator.timeout(TEST_TIMEOUT_SEC)
     def test_password_rotation(self):
         '''
@@ -1838,7 +1897,6 @@ def test_rolling_update_flag(self):
             replica = k8s.get_cluster_replica_pod()
             self.assertTrue(replica.metadata.creation_timestamp > old_creation_timestamp, "Old master pod was not recreated")
 
-
         except timeout_decorator.TimeoutError:
             print('Operator log: {}'.format(k8s.get_operator_log()))
             raise
@@ -2412,6 +2470,39 @@ def assert_distributed_pods(self, target_nodes, cluster_labels='cluster-name=aci
 
         return True
 
+    def check_cluster_child_resources_owner_references(self, cluster_name, cluster_namespace='default', inverse=False):
+        k8s = self.k8s
+
+        # check if child resources were updated with owner references
+        sset = k8s.api.apps_v1.read_namespaced_stateful_set(cluster_name, cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(sset.metadata.owner_references, inverse), "statefulset owner reference check failed")
+
+        svc = k8s.api.core_v1.read_namespaced_service(cluster_name, cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(svc.metadata.owner_references, inverse), "primary service owner reference check failed")
+        replica_svc = k8s.api.core_v1.read_namespaced_service(cluster_name + "-repl", cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(replica_svc.metadata.owner_references, inverse), "replica service owner reference check failed")
+
+        ep = k8s.api.core_v1.read_namespaced_endpoints(cluster_name, cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(ep.metadata.owner_references, inverse), "primary endpoint owner reference check failed")
+        replica_ep = k8s.api.core_v1.read_namespaced_endpoints(cluster_name + "-repl", cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(replica_ep.metadata.owner_references, inverse), "replica owner reference check failed")
+
+        pdb = k8s.api.policy_v1.read_namespaced_pod_disruption_budget("postgres-{}-pdb".format(cluster_name), cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(pdb.metadata.owner_references, inverse), "pod disruption owner reference check failed")
+
+        pg_secret = k8s.api.core_v1.read_namespaced_secret("postgres.{}.credentials.postgresql.acid.zalan.do".format(cluster_name), cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(pg_secret.metadata.owner_references, inverse), "postgres secret owner reference check failed")
+        standby_secret = k8s.api.core_v1.read_namespaced_secret("standby.{}.credentials.postgresql.acid.zalan.do".format(cluster_name), cluster_namespace)
+        self.assertTrue(self.has_postgresql_owner_reference(standby_secret.metadata.owner_references, inverse), "standby secret owner reference check failed")
+
+        return True
+
+    def has_postgresql_owner_reference(self, owner_references, inverse):
+        if inverse:
+            return owner_references is None or owner_references[0].kind != 'postgresql'
+
+        return owner_references is not None and owner_references[0].kind == 'postgresql' and owner_references[0].controller
+
     def list_databases(self, pod_name):
         '''
         Get list of databases we might want to iterate over

manifests/configmap.yaml

+2 −1

@@ -49,7 +49,7 @@ data:
   enable_master_pooler_load_balancer: "false"
   enable_password_rotation: "false"
   enable_patroni_failsafe_mode: "false"
-  enable_secrets_deletion: "true"
+  enable_owner_references: "false"
   enable_persistent_volume_claim_deletion: "true"
   enable_pgversion_env_var: "true"
   # enable_pod_antiaffinity: "false"
@@ -59,6 +59,7 @@ data:
   enable_readiness_probe: "false"
   enable_replica_load_balancer: "false"
   enable_replica_pooler_load_balancer: "false"
+  enable_secrets_deletion: "true"
   # enable_shm_volume: "true"
   # enable_sidecars: "true"
   enable_spilo_wal_path_compat: "true"

manifests/operator-service-account-rbac-openshift.yaml

+2

@@ -94,6 +94,7 @@ rules:
   - create
   - delete
   - get
+  - patch
   - update
 # to check nodes for node readiness label
 - apiGroups:
@@ -166,6 +167,7 @@ rules:
   - get
   - list
   - patch
+  - update
 # to CRUD cron jobs for logical backups
 - apiGroups:
   - batch
