@vrutkovs
Contributor

@vrutkovs vrutkovs commented Jun 10, 2025

Move testcase name out of auto-regenerate-after-offline-expiry, add
refresh-period.

Follow-up for openshift/origin#29327
Tested in openshift/cluster-kube-apiserver-operator#1768 and openshift/cluster-authentication-operator#742

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jun 10, 2025
@openshift-ci-robot

@vrutkovs: This pull request references Jira Issue OCPBUGS-57049, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @wangke19

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Move testcase name out of auto-regenerate-after-offline-expiry, add
refresh-period.

Follow-up for openshift/origin#29327

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from deads2k, jsafrane and wangke19 June 10, 2025 06:31
@vrutkovs
Contributor Author

/hold

This makes

certificates.openshift.io/refresh-period: 70080h0m0s

instead of

certificates.openshift.io/refresh-period: 8y

Also 12h0m0s should be trimmed to 12h

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 10, 2025
@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch from 94c3a48 to 5337ee8 Compare June 17, 2025 16:25
@vrutkovs
Contributor Author

/hold cancel

Tested in openshift/cluster-kube-apiserver-operator#1768

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2025
@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch 2 times, most recently from d3b66d0 to b8e4d7e Compare July 9, 2025 09:54
@jsafrane
Contributor

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 11, 2025
@sjenning

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 11, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 11, 2025
@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch from b8e4d7e to 3f5ff62 Compare July 11, 2025 15:45
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 11, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 11, 2025
@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch from 3f5ff62 to 56a5a40 Compare July 14, 2025 06:24

annotations.NotBefore = certKeyPair.Certs[0].NotBefore.Format(time.RFC3339)
annotations.NotAfter = certKeyPair.Certs[0].NotAfter.Format(time.RFC3339)
annotations.RefreshPeriod = durationRound(refresh)
Contributor

Why is it okay to store an inaccurate duration? Won’t this value be used for debugging purposes?

@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch 2 times, most recently from 2537111 to 78719bd Compare July 23, 2025 07:47
// NotAfter contains the certificate validity date in RFC3339 format.
NotAfter string
// RefreshPeriod contains the interval at which the certificate should be refreshed.
RefreshPeriod string
Contributor

In what format will the date be stored? I think this might be useful for tools that process the raw TLS info.

Contributor Author

It's using time.Duration format: 8h, 30d or 10y

Contributor

I think that the duration will be changed by the durationRound function, which formats a duration into a human-readable string, so machines won't be able to process it (if needed).

Contributor

Where will this field be used?

Contributor Author

It's informational only. We don't limit or verify that the refresh indeed happens at the specified time (however, we do ensure that this certificate is being rotated at all with the ShortCertRotation feature gate).

Contributor

I think it would be better if all dates were in the same format for consistency (NotBefore, NotAfter).

This way, the processing tools (the TLS script or cluster-debug-tools) would know what to expect and could reformat them into a more human-readable form if needed during output.

How hard would it be to move durationToString to the TLS registry script?

Contributor Author

This isn't a date, it's a duration. It's not the time when the next refresh occurs; that would be notAfter+refreshTime.

We could store them as milliseconds, but that wouldn't be human-friendly, and if there is a need for a tool which reads this annotation, it might as well convert "5s" into milliseconds too. The format is no longer lossy anyway.

Contributor

Or just store duration.String(); at least it would be accepted by https://pkg.go.dev/time#ParseDuration

It would then be possible to convert it to a human-readable form on the receiving end, before displaying it to end users.

Contributor

Whereas 8y is not accepted by time.ParseDuration. xref: https://go.dev/play/p/6qztJoDUlJJ

Contributor Author

@vrutkovs vrutkovs Jul 23, 2025

Yeah, that would store 72h0m0s instead of 3d in annotations; this is why we need a custom Durations.ToBetterString()

// NotAfter contains the certificate validity date in RFC3339 format.
NotAfter string
// RefreshPeriod contains the interval at which the certificate should be refreshed.
RefreshPeriod string
Contributor

Also, at the moment we store an inaccurate value in that field. Is that OK?

Contributor Author

openshift/cluster-kube-apiserver-operator#1768 would set it for cluster-kube-apiserver-operator (see TLS artifacts in openshift/cluster-kube-apiserver-operator#1768):

            {
              "key": "certificates.openshift.io/refresh-period",
              "value": "8y"
            },

It also shows a 6h refresh in 4.20 now, as we're using faster refresh rotation in the dev branch.
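A tool consuming these TLS artifacts only needs to find the matching key. A minimal sketch, assuming the annotations arrive as a JSON array of the key/value objects in the excerpt above (the surrounding structure of the real artifact file is not shown here, so the input shape is an assumption):

```go
package main

import (
	"encoding/json"
	"fmt"
)

const refreshPeriodKey = "certificates.openshift.io/refresh-period"

// annotation mirrors the {"key": ..., "value": ...} objects in the excerpt.
type annotation struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// findRefreshPeriod returns the refresh-period value from a JSON array of
// annotations, and whether it was present.
func findRefreshPeriod(raw []byte) (string, bool, error) {
	var anns []annotation
	if err := json.Unmarshal(raw, &anns); err != nil {
		return "", false, err
	}
	for _, a := range anns {
		if a.Key == refreshPeriodKey {
			return a.Value, true, nil
		}
	}
	return "", false, nil
}

func main() {
	raw := []byte(`[{"key": "certificates.openshift.io/refresh-period", "value": "8y"}]`)
	v, ok, err := findRefreshPeriod(raw)
	fmt.Println(v, ok, err) // 8y true <nil>
}
```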

@vrutkovs
Contributor Author

See follow-up in #1981

@p0lyn0mial
Contributor

> See follow-up in #1981

LGTM

Let's rebase this PR and merge it, and then let's merge the follow-up PR.

@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch from 208caca to e6d67c1 Compare July 24, 2025 07:30
@vrutkovs
Contributor Author

Let's

/hold

it for now, as it needs to be tested in the CKAO PR first.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 24, 2025
}

// setTargetCertKeyPairSecret creates a new cert/key pair and sets them in the secret. Only one of client, serving, or signer rotation may be specified.
// setTargetCertKeyPairSecretAndTLSAnnotations creates a new cert/key pair and sets them in the secret. Only one of client, serving, or signer rotation may be specified.
Contributor

Let's open a new PR for refactoring the target. I can do that, just let me know.

Contributor Author

Sure, please do. I'll focus on e2e tests

Contributor

PTAL: #1982

@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch from e6d67c1 to 8b6b76a Compare July 24, 2025 08:03
Move testcase name out of auto-regenerate-after-offline-expiry, add
refresh-period
@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch from 8b6b76a to cc2229c Compare July 24, 2025 10:25
}
c.EventRecorder.Eventf("SignerUpdateRequired", "%q in %q requires a new signing cert/key pair: %v", c.Name, c.Namespace, reason)
c.AdditionalAnnotations.RefreshPeriod = c.Refresh.String()
if err = setSigningCertKeyPairSecretAndTLSAnnotations(signingCertKeyPairSecret, c.Validity, c.AdditionalAnnotations); err != nil {
Contributor

Let's pass c.Refresh to setSigningCertKeyPairSecretAndTLSAnnotations and then to setTLSAnnotationsOnSigningCertKeyPairSecret so that annotation assignments are in one place for the signer.

Contributor

actually, see #1971 (comment)

if reason := c.CertCreator.NeedNewTargetCertKeyPair(targetCertKeyPairSecret, signingCertKeyPair, caBundleCerts, c.Refresh, c.RefreshOnlyWhenExpired, creationRequired); len(reason) > 0 {
c.EventRecorder.Eventf("TargetUpdateRequired", "%q in %q requires a new target cert/key pair: %v", c.Name, c.Namespace, reason)
c.AdditionalAnnotations.RefreshPeriod = c.Refresh.String()
if err = setTargetCertKeyPairSecretAndTLSAnnotations(targetCertKeyPairSecret, c.Validity, signingCertKeyPair, c.CertCreator, c.AdditionalAnnotations); err != nil {
Contributor

I would also pass c.Refresh to setTargetCertKeyPairSecretAndTLSAnnotations and then to setTLSAnnotationsOnTargetCertKeyPairSecret so that annotation assignments are in one place for the target.

Contributor

actually, see #1971 (comment)

if len(a.RefreshPeriod) > 0 && meta.Annotations[CertificateRefreshPeriodAnnotation] != a.RefreshPeriod {
diff := cmp.Diff(meta.Annotations[CertificateRefreshPeriodAnnotation], a.RefreshPeriod)
klog.V(2).Infof("Updating %q annotation for %s/%s, diff: %s", CertificateRefreshPeriodAnnotation, meta.Name, meta.Namespace, diff)
meta.Annotations[CertificateRefreshPeriodAnnotation] = a.RefreshPeriod
Contributor

Do we need a unit test that checks the assignment of the new annotation?

Contributor Author

I don't think we do. Some certs are not being refreshed, and if it's incorrectly set by the controller, we'll see this in TLS registry violations.

Contributor

Ok.
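For reference, the assignment in the diff is small enough to check in isolation. A minimal sketch that reproduces only its condition on a plain map; `ensureRefreshPeriodAnnotation` and `tlsAnnotations` are hypothetical stand-ins (the klog and `cmp.Diff` calls are omitted), and the caller is assumed to pass a non-nil map:

```go
package main

import "fmt"

// The annotation key comes from the PR; the types below are stand-ins.
const certificateRefreshPeriodAnnotation = "certificates.openshift.io/refresh-period"

type tlsAnnotations struct {
	RefreshPeriod string
}

// ensureRefreshPeriodAnnotation writes the annotation only when a value is set
// and differs from the stored one, and reports whether it changed anything.
// annotations must be non-nil, since writing to a nil map panics.
func ensureRefreshPeriodAnnotation(annotations map[string]string, a tlsAnnotations) bool {
	if len(a.RefreshPeriod) == 0 || annotations[certificateRefreshPeriodAnnotation] == a.RefreshPeriod {
		return false
	}
	annotations[certificateRefreshPeriodAnnotation] = a.RefreshPeriod
	return true
}

func main() {
	m := map[string]string{}
	fmt.Println(ensureRefreshPeriodAnnotation(m, tlsAnnotations{RefreshPeriod: "6h0m0s"})) // true: newly set
	fmt.Println(ensureRefreshPeriodAnnotation(m, tlsAnnotations{RefreshPeriod: "6h0m0s"})) // false: unchanged
	fmt.Println(ensureRefreshPeriodAnnotation(m, tlsAnnotations{}))                        // false: empty value ignored
}
```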

reason = "secret doesn't exist"
}
c.EventRecorder.Eventf("SignerUpdateRequired", "%q in %q requires a new signing cert/key pair: %v", c.Name, c.Namespace, reason)
c.AdditionalAnnotations.RefreshPeriod = c.Refresh.String()
Contributor

Should this annotation also be set at line 83?

if !c.RefreshOnlyWhenExpired {
		needsMetadataUpdate := ensureOwnerRefAndTLSAnnotations(signingCertKeyPairSecret, c.Owner, c.AdditionalAnnotations)
		needsTypeChange := ensureSecretTLSTypeSet(signingCertKeyPairSecret)
		updateRequired = needsMetadataUpdate || needsTypeChange
	}

Contributor Author

For now, I'd like to set the refresh period only when the contents are being refreshed. Later on, we'll need a new PR that backfills this on existing certificates.

Contributor

What is the reason we don’t want to set the refresh annotation for existing certificates? Wouldn’t it be easier to set it now?

Contributor

I think we have at least three options:

  1. Update the callers to set the new annotation.
  2. Update the annotation at the beginning of the EnsureSigningCertKeyPair method.
  3. Modify the controller that calls EnsureSigningCertKeyPair to update the annotation.

Contributor

I think option 2 is the easiest.
Setting the annotation at the beginning of the function ensures it is applied to both new and existing secrets.

Contributor Author

I'd prefer a separate PR because this was not tested on existing PRs. If TRT comes and reverts it because it breaks upgrades, I'd rather have it already applied for new clusters at least.

I agree that option 2 seems like the easiest so far, but I haven't dug deep into this yet.

Contributor

ok

@vrutkovs vrutkovs force-pushed the cert-annotation-update-jun10 branch from cc2229c to 248e356 Compare July 24, 2025 10:48
@vrutkovs
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 24, 2025
@p0lyn0mial
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2025
@openshift-ci
Contributor

openshift-ci bot commented Jul 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, p0lyn0mial, sjenning, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vrutkovs
Contributor Author

/refresh

@openshift-merge-bot openshift-merge-bot bot merged commit 03d85c4 into openshift:master Jul 24, 2025
4 checks passed
@openshift-ci-robot

@vrutkovs: Jira Issue OCPBUGS-57049: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-57049 has not been moved to the MODIFIED state.

In response to this:

Move testcase name out of auto-regenerate-after-offline-expiry, add
refresh-period.

Follow-up for openshift/origin#29327
Tested in openshift/cluster-kube-apiserver-operator#1768 and openshift/cluster-authentication-operator#742

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

6 participants