OCPBUGS-49728: Adapt MCC to use LayeredNodeState and remove LayeredPoolState #4841

Open; wants to merge 1 commit into base: master

@djoshy djoshy commented Feb 10, 2025

- What I did

  • Refactored all node-related functions to work with the LayeredNodeState object
  • Removed all functions and tests related to layered pools, since images are no longer annotated on MachineConfigPools
  • Reworked some of the status update checks to use the LayeredNodeState struct
  • Updated a few unit tests to guard against https://issues.redhat.com/browse/OCPBUGS-43552

I have a couple of open questions, for which I'll leave comments below!

- How to verify it

  • Existing unit and e2e tests should pass.
  • Please stress test this PR for OCL rollout and status behaviors; much of that logic has been refactored and I may have missed something.

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Feb 10, 2025

@djoshy: This pull request references Jira Issue OCPBUGS-49728, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Feb 10, 2025

openshift-ci bot commented Feb 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 10, 2025
if mosb != nil && mosc != nil {
	mosbState := ctrlcommon.NewMachineOSBuildState(mosb)
	// It seems like pool image annotations are no longer being used, so node specific checks were required here
	if layered && mosbState.IsBuildSuccess() && mosb.Spec.MachineConfig.Name == pool.Spec.Configuration.Name && isNodeDoneAt(node, pool, layered) && lns.IsCurrentImageEqualToBuild(mosc) {

I've updated this check to remove the MachineOSBuild successful-condition comparison, since I think completion can be determined solely by comparing the node's annotations against the MOSB and MOSC objects in layered mode. Happy to add it back if I'm missing something here!
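
To make the idea concrete, here is a minimal sketch of deciding completion from the node's annotations alone. Every type, field, and function name in it (buildInfo, configInfo, nodeInfo, nodeDoneForBuild) is a hypothetical stand-in invented for illustration, not the real MachineOSBuild, MachineOSConfig, or LayeredNodeState API, and the exact set of comparisons is an assumption.

```go
package main

import "fmt"

// All types and field names below are hypothetical stand-ins for this sketch;
// they are not the real MachineOSBuild, MachineOSConfig, or LayeredNodeState APIs.
type buildInfo struct {
	renderedConfig string // rendered MachineConfig the build was produced for
	image          string // image pullspec the build produced
}

type configInfo struct {
	currentImage string // image the OS config currently points at
}

type nodeInfo struct {
	currentConfig string // value of the node's current-config annotation
	currentImage  string // value of the node's current-image annotation
}

// nodeDoneForBuild reports whether a layered node has finished updating,
// using only the node's annotations compared against the build and config.
// It does not consult any build status condition.
func nodeDoneForBuild(n nodeInfo, b buildInfo, c configInfo, poolConfig string) bool {
	if c.currentImage == "" || b.image != c.currentImage {
		// The config has no image yet, or it points at a different build.
		return false
	}
	return b.renderedConfig == poolConfig &&
		n.currentConfig == poolConfig &&
		n.currentImage == c.currentImage
}

func main() {
	b := buildInfo{renderedConfig: "rendered-worker-2", image: "registry.example/os@sha256:bbb"}
	c := configInfo{currentImage: "registry.example/os@sha256:bbb"}

	updated := nodeInfo{currentConfig: "rendered-worker-2", currentImage: "registry.example/os@sha256:bbb"}
	stale := nodeInfo{currentConfig: "rendered-worker-1", currentImage: "registry.example/os@sha256:aaa"}

	fmt.Println(nodeDoneForBuild(updated, b, c, "rendered-worker-2"))
	fmt.Println(nodeDoneForBuild(stale, b, c, "rendered-worker-2"))
}
```

In this sketch a node can only match an image that the build has already produced, which is the intuition behind treating the separate success-condition check as possibly redundant; whether that holds for the real objects is the open question.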

return val == "" || !ok
// If the MOSC does not have an image, but the node has an older image annotation, the image is still likely
// being built.
return false

@djoshy djoshy Feb 10, 2025

Wanted to double-check this thought: this function will now only be called when the node is layered. As a result, if the image annotation is missing for whatever reason, we should always return false. The old implementation was also used in non-layered mode, so it returned true when the annotation was missing.
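
A small sketch of the behavior change described here, under stated assumptions: the annotation key and both function names are hypothetical, and the final equality comparison is my guess at the surrounding logic rather than the actual implementation. Only the missing-annotation branches reflect what the comment above states.

```go
package main

import "fmt"

// imageAnnotation is a hypothetical annotation key used only in this sketch.
const imageAnnotation = "example.io/current-image"

// legacyImageDone models the old behavior: a missing or empty annotation
// counts as done, which suited nodes that were not layered.
func legacyImageDone(annotations map[string]string, want string) bool {
	val, ok := annotations[imageAnnotation]
	if val == "" || !ok {
		return true
	}
	return val == want
}

// layeredImageDone models the new behavior: it assumes the caller has already
// established that the node is layered, so a missing annotation is never done.
func layeredImageDone(annotations map[string]string, want string) bool {
	val, ok := annotations[imageAnnotation]
	if val == "" || !ok {
		return false
	}
	if want == "" {
		// No target image yet, but the node carries an older image
		// annotation: the image is likely still being built.
		return false
	}
	return val == want
}

func main() {
	missing := map[string]string{}
	fmt.Println(legacyImageDone(missing, "img-b"))
	fmt.Println(layeredImageDone(missing, "img-b"))
}
```

The two functions differ only in the missing-annotation branch and the empty-target branch, which makes the risk easy to see: any caller that still reaches the layered variant for a non-layered node would now report that node as not done.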

// we need to ensure the node controller is triggered at all the same times
// when using this new system
// we know the mosc+mosb can trigger one another and cause a build, but if the node controller
// can't set this anno, and subsequently cannot trigger the daemon to update, we need to rework.
lns.SetDesiredStateFromMachineOSConfig(mosc, mosb)

@djoshy djoshy Feb 10, 2025

I removed these checks because:

  1. The getAllCandidateMachines function already returns a list of viable candidate nodes; it builds this list by comparing each node's annotations against the MCP/OCL objects as required.
  2. This list is then passed to updateCandidateMachines, which truncates it based on the pool's capacity.
  3. Finally, the list is passed to the updateCandidateNode function above, which was essentially redoing the checks that getAllCandidateMachines had already performed.

The checks in step (3) seemed redundant, so I removed them from this function to simplify it and make it easier to read.
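
The three steps can be sketched roughly as follows. This is a simplified model, not the controller code: the node type, the function names (selectCandidates, truncateToCapacity, markDesired), and the use of a single image string as the eligibility test are all assumptions made for illustration.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a cluster node.
type node struct {
	name         string
	currentImage string
}

// selectCandidates plays the role of step 1: it is the only place where
// eligibility is checked (here, whether the node is already on the target).
func selectCandidates(nodes []node, target string) []node {
	var out []node
	for _, n := range nodes {
		if n.currentImage != target {
			out = append(out, n)
		}
	}
	return out
}

// truncateToCapacity plays the role of step 2: keep at most capacity nodes.
func truncateToCapacity(candidates []node, capacity int) []node {
	if capacity < 0 {
		capacity = 0
	}
	if len(candidates) > capacity {
		return candidates[:capacity]
	}
	return candidates
}

// markDesired plays the role of step 3: it records the target for each node
// and deliberately does not repeat the eligibility check from step 1.
func markDesired(candidates []node, target string) map[string]string {
	desired := make(map[string]string, len(candidates))
	for _, n := range candidates {
		desired[n.name] = target
	}
	return desired
}

func main() {
	nodes := []node{
		{name: "node-a", currentImage: "img-a"},
		{name: "node-b", currentImage: "img-b"},
		{name: "node-c", currentImage: "img-a"},
	}
	picked := truncateToCapacity(selectCandidates(nodes, "img-b"), 1)
	fmt.Println(markDesired(picked, "img-b"))
}
```

The simplification is only safe if nothing else produces the list handed to step 3; if another caller can reach it with an unfiltered list, the removed checks were doing real work.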

openshift-ci bot commented Feb 11, 2025

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | 9715a65 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/e2e-gcp-op-ocl | 9715a65 | link | false | /test e2e-gcp-op-ocl |
