Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add RootShard condition #31

Merged
merged 7 commits into from
Feb 1, 2025
Merged

Conversation

xrstf
Copy link
Contributor

@xrstf xrstf commented Jan 28, 2025

Summary

This started out small, with the goal of only introducing a condition to signal whether the RootShardRef is valid, so that we do not have to log errors all the time. But then it ballooned into a collection of many smaller improvements. Please see the list of commits for more info :)

Release Notes

- Improve kubectl output for Shards and FrontProxies
- Add `RootShard` condition; do not log errors regarding broken rootShardRefs

@kcp-ci-bot kcp-ci-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the DCO. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 28, 2025
xrstf added a commit to xrstf/kcp-operator that referenced this pull request Jan 29, 2025
The concept of revision labels was born out of the desire to automatically
restart Deployments whenever a mounted Secret changes. Originally the idea came
out of another product, which compared to the kcp-operator managed all Secrets
completely by itself (instead of relying on external components like
cert-manager).

However implementing this means that a Deployment can only be created if all
Secrets already exist, otherwise an error is generated. This error is a problem
because it was only logged by the operator and so was invisible to the actual
admins (unless they check the operator's logs).

To combat this, in kcp-dev#31 I tried to move the error into a condition on the original
objects (like a RootShard). For this we introduced a new condition "CertsReady"
and set it according to that error.

The problem with that approach is that it conflates Secrets. In additional PRs,
we mount additional Secrets, like the audit-webhook-config-file. However from
the somewhat generic "not all volumes are ready" we cannot actually tell whether
it's because cert-manager isn't ready yet, or if the admin misconfigured a
Secret. So the original condition suddenly becomes kind of misleading.

We could have extended the condition by renaming it from "CertsReady" to
"VolumesReady", but we still cannot with certainty say if an error is just
temporary or permanent.

If we were to keep the condition as "CertsReady", all controllers would have to
manually get and check all the certificate Secrets. This feels like the wrong
approach. The original intent was to offload complexity onto Kubernetes, but if
we start to "deep-check" each dependency and object, it feels like we're
fighting the Kubernetes design.

We also considered not having a condition and instead writing empty values for
the secret revisions into the labels. This however meant that while the PKI is
being created by cert-manager (a process that takes up to a minute), the labels
on the Deployment will progressively be updated, each update creating a new Pod
that cannot be started (because some secrets were still missing). This felt very
noisy and spammy and not the right approach either. Plus it would hide actual
issues in the Deployment.

Ultimately we could not think of a good, elegant way to handle this and decided
that it would be the simplest if we get rid of the whole concept alltogether.
This means the Deployments will be created immediately and they will create
the desired number of Pods which will sit there until _Kubernetes_ decides they
can progress.

For other related Secrets, the controllers can actively check and create fitting
conditions, like in kcp-dev#31, where a condition for the RootShardRef's status is
generated explicitly.

On-behalf-of: @SAP [email protected]
@xrstf
Copy link
Contributor Author

xrstf commented Jan 29, 2025

/hold

@kcp-ci-bot kcp-ci-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 29, 2025
@kcp-ci-bot kcp-ci-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 29, 2025
…ist of conditions from their

reconcile() functions

On-behalf-of: @SAP [email protected]
@xrstf xrstf force-pushed the gentle-reconciling branch from b4c61c6 to 37a5378 Compare January 29, 2025 16:22
@xrstf xrstf changed the title ✨ Add CertificatesReady and RootShard conditions ✨ Add CRootShard condition Jan 29, 2025
@kcp-ci-bot kcp-ci-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 29, 2025
@xrstf xrstf changed the title ✨ Add CRootShard condition ✨ Add RootShard condition Jan 29, 2025
@xrstf xrstf requested a review from embik January 30, 2025 08:57
@xrstf xrstf force-pushed the gentle-reconciling branch from 26fd958 to b874d81 Compare January 30, 2025 09:00
@xrstf
Copy link
Contributor Author

xrstf commented Jan 30, 2025

/unhold

@kcp-ci-bot kcp-ci-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2025
Copy link
Member

@embik embik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 1, 2025
@kcp-ci-bot
Copy link
Contributor

LGTM label has been added.

Git tree hash: cd0735aa3fddaa64d07664a513fad483b06a1ae7

@kcp-ci-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: embik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2025
@kcp-ci-bot kcp-ci-bot merged commit c578d4c into kcp-dev:main Feb 1, 2025
11 checks passed
@xrstf xrstf deleted the gentle-reconciling branch February 3, 2025 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has signed the DCO. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants