✨ Add RootShard condition #31
Conversation
The concept of revision labels was born out of the desire to automatically restart Deployments whenever a mounted Secret changes. The idea originally came from another product which, unlike the kcp-operator, managed all Secrets entirely by itself (instead of relying on external components like cert-manager). However, implementing this meant that a Deployment could only be created once all Secrets already existed; otherwise an error was generated. This error was a problem because it was only logged by the operator and was therefore invisible to the actual admins (unless they checked the operator's logs).

To combat this, in kcp-dev#31 I tried to move the error into a condition on the original objects (like a RootShard). For this we introduced a new condition, "CertsReady", and set it according to that error. The problem with that approach is that it conflates Secrets: in additional PRs we mount further Secrets, like the audit-webhook-config-file, and from the somewhat generic "not all volumes are ready" error we cannot actually tell whether cert-manager isn't ready yet or the admin misconfigured a Secret. So the original condition suddenly becomes misleading. We could have extended the condition by renaming it from "CertsReady" to "VolumesReady", but we still could not say with certainty whether an error was temporary or permanent. If we kept the condition as "CertsReady", every controller would have to manually fetch and check all the certificate Secrets. This feels like the wrong approach: the original intent was to offload complexity onto Kubernetes, but if we start to "deep-check" each dependency and object, we are fighting the Kubernetes design.

We also considered not having a condition and instead writing empty values for the secret revisions into the labels. This, however, meant that while the PKI is being created by cert-manager (a process that can take up to a minute), the labels on the Deployment would be updated progressively, with each update creating a new Pod that cannot start (because some Secrets were still missing). This felt very noisy and spammy and not the right approach either; it would also hide actual issues in the Deployment.

Ultimately we could not think of a good, elegant way to handle this and decided it would be simplest to get rid of the whole concept altogether. This means the Deployments will be created immediately and will create the desired number of Pods, which will sit there until _Kubernetes_ decides they can progress. For other related Secrets, the controllers can actively check and create fitting conditions, as in kcp-dev#31, where a condition for the RootShardRef's status is generated explicitly.

On-behalf-of: @SAP [email protected]
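The condition upsert described above can be sketched roughly as follows. This is a simplified, self-contained stand-in for `metav1.Condition` and apimachinery's `meta.SetStatusCondition`, which the operator would use in practice; the condition name `RootShardRefValid` and its reason are illustrative placeholders, not the operator's actual API:

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors the shape of metav1.Condition from the Kubernetes API
// (simplified for illustration).
type Condition struct {
	Type               string
	Status             string // "True", "False", "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition upserts a condition by Type, bumping the transition time
// only when the status actually changes (the same semantics as
// meta.SetStatusCondition).
func setCondition(conds *[]Condition, c Condition) {
	for i, existing := range *conds {
		if existing.Type == c.Type {
			if existing.Status != c.Status {
				c.LastTransitionTime = time.Now()
			} else {
				c.LastTransitionTime = existing.LastTransitionTime
			}
			(*conds)[i] = c
			return
		}
	}
	c.LastTransitionTime = time.Now()
	*conds = append(*conds, c)
}

func main() {
	var conds []Condition
	// A reconcile loop would surface a validation failure as a condition
	// on the object's status instead of only logging it.
	setCondition(&conds, Condition{
		Type:    "RootShardRefValid", // hypothetical condition name
		Status:  "False",
		Reason:  "RootShardNotFound",
		Message: "referenced RootShard does not exist",
	})
	fmt.Println(conds[0].Type, conds[0].Status)
}
```

The key design point is that the error lands on the object's status, where admins inspecting the resource (e.g. via `kubectl describe`) can see it, rather than only in the operator's logs.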
/approve
LGTM label has been added. Git tree hash: cd0735aa3fddaa64d07664a513fad483b06a1ae7
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: embik
Summary
This started out small, with the goal of only introducing a condition to signal whether the RootShardRef is valid, so that we do not have to log errors all the time. But then it ballooned into a collection of many smaller improvements. Please see the list of commits for more info :)
Release Notes