Implement elemental-register upgrade #868

Draft: wants to merge 1 commit into main

anmazzotti
Contributor

@anmazzotti anmazzotti commented Oct 10, 2024

Part of rancher/elemental#1565

This PR introduces the elemental-register upgrade subcommand to replace the suc-upgrade script.

Most of the logic from suc-upgrade was kept, except for the isHigherVersion and FORCE=true logic, which has been removed.
Instead, the elemental-register upgrade command accepts a correlationID that is applied to the Elemental snapshot during the upgrade.

By default, the sha224 hash of the ManagedOSImage.Spec is used as the correlationID, with some quirks. For example, spec.Targets is always nullified before hashing, to avoid re-applying upgrades whenever the target clusters are updated.
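
As a rough illustration of the hashing described above, here is a minimal sketch. The `ImageSpec` and `Target` types are hypothetical, trimmed-down stand-ins (not the operator's real API types), and JSON is assumed as the serialization format; the PR only states that a sha224 hash of the spec is used and that the targets are nullified first.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Target and ImageSpec are hypothetical stand-ins for the real
// ManagedOSImage spec types; only two fields are kept for illustration.
type Target struct {
	ClusterName string `json:"clusterName"`
}

type ImageSpec struct {
	OSImage string   `json:"osImage"`
	Targets []Target `json:"clusterTargets,omitempty"`
}

// correlationID hashes a copy of the spec with Targets cleared, so that
// changing only the target clusters never changes the resulting ID.
func correlationID(spec ImageSpec) (string, error) {
	spec.Targets = nil // spec is passed by value; the caller's copy is untouched
	data, err := json.Marshal(spec)
	if err != nil {
		return "", fmt.Errorf("marshalling spec: %w", err)
	}
	sum := sha256.Sum224(data) // 28 bytes, i.e. 56 hex characters
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	spec := ImageSpec{
		OSImage: "172.18.0.2:30000/elemental-os:dev-next",
		Targets: []Target{{ClusterName: "volcano"}},
	}
	id, err := correlationID(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

With this approach, two specs that differ only in their targets yield the same ID, while changing `OSImage` yields a different one. The ID printed here will not match the one in the example below, since the real spec has more fields.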

All client-side logic to prevent concurrent upgrades (using a file-based lock) has also been removed. Instead, we rely on the system-upgrade-controller exclusive flag.

For example, this will be the resulting state after a successful upgrade:

apiVersion: elemental.cattle.io/v1beta1
kind: ManagedOSImage
metadata:
  name: dev-upgrade
  namespace: fleet-default
  labels:
    elemental.cattle.io/upgrade-correlation-id: 468c501b8597d483f9c5e24bad840c9473cdae750b9806c152ad0f46
spec:
  osImage: "172.18.0.2:30000/elemental-os:dev-next"
  clusterTargets:
    - clusterName: volcano
  upgradeContainer:
    envs:
      # Use UPGRADE_RECOVERY to upgrade both system and recovery partitions.
      - name: UPGRADE_RECOVERY
        value: "true"

Downstream, the snapshot will carry the correlationID:

test-e5331e3b-1e1b-4ce7-b080-235ed9a6d07c:~ # elemental state
date: "2024-10-25T13:28:20Z"
snapshotter:
    type: btrfs
    max-snaps: 2
    config: {}
efi:
    label: COS_GRUB
oem:
    label: COS_OEM
persistent:
    label: COS_PERSISTENT
recovery:
    label: COS_RECOVERY
    recovery:
        source: dir:///run/rootfsbase
        fs: squashfs
        labels:
            correlationID: 468c501b8597d483f9c5e24bad840c9473cdae750b9806c152ad0f46
            image: 172.18.0.2:30000/elemental-os:dev-next
            managedOSImage: dev-upgrade
        date: "2024-10-25T13:28:20Z"
        fromAction: upgrade
state:
    label: COS_STATE
    snapshots:
        1:
            source: dir:///run/rootfsbase
            date: "2024-10-25T13:25:03Z"
            fromAction: install
        2:
            source: dir:///
            active: true
            labels:
                correlationID: 468c501b8597d483f9c5e24bad840c9473cdae750b9806c152ad0f46
                image: 172.18.0.2:30000/elemental-os:dev-next
                managedOSImage: dev-upgrade
            date: "2024-10-25T13:28:20Z"
            fromAction: upgrade

The downstream Plan carries it as well:

test-e5331e3b-1e1b-4ce7-b080-235ed9a6d07c:~ # kubectl -n cattle-system get plans os-upgrader-dev-upgrade -o yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  annotations:
    meta.helm.sh/release-name: mos-dev-upgrade
    meta.helm.sh/release-namespace: default
    objectset.rio.cattle.io/id: default-mos-dev-upgrade
  creationTimestamp: "2024-10-25T13:27:34Z"
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    elemental.cattle.io/upgrade-correlation-id: 468c501b8597d483f9c5e24bad840c9473cdae750b9806c152ad0f46
    objectset.rio.cattle.io/hash: 3d9ab475f418bf210ef5034b761aaa5dfc14f587
  name: os-upgrader-dev-upgrade
  namespace: cattle-system
  resourceVersion: "4272"
  uid: 5cca77bc-238f-48aa-9d5f-ff5f46d92e6c
spec:
  concurrency: 1
  cordon: true
  drain:
    deleteLocalData: true
    force: true
    ignoreDaemonSets: true
    skipWaitForDeleteTimeout: 60
  exclusive: true
  nodeSelector: {}
  secrets:
  - name: os-upgrader-dev-upgrade
    path: /run/data
  serviceAccountName: os-upgrader-dev-upgrade
  tolerations:
  - operator: Exists
  upgrade:
    command:
    - /usr/sbin/suc-upgrade
    envs:
    - name: FORCE
      value: "false"
    - name: UPGRADE_RECOVERY
      value: "true"
    - name: UPGRADE_RECOVERY_ONLY
      value: "false"
    - name: ELEMENTAL_REGISTER_UPGRADE_SNAPSHOT_LABELS
      value: managedOSImage=dev-upgrade,image=172.18.0.2:30000/elemental-os:dev-next
    - name: ELEMENTAL_REGISTER_UPGRADE_CORRELATION_ID
      value: 468c501b8597d483f9c5e24bad840c9473cdae750b9806c152ad0f46
    image: 172.18.0.2:30000/elemental-os
  version: dev-next
status:
  conditions:
  - lastUpdateTime: "2024-10-25T13:36:18Z"
    reason: PlanIsValid
    status: "True"
    type: Validated
  - lastUpdateTime: "2024-10-25T13:36:18Z"
    reason: Version
    status: "True"
    type: LatestResolved
  - lastUpdateTime: "2024-10-25T13:29:54Z"
    status: "True"
    type: Complete
  latestHash: 52588095671b923da6784fd19a810deafc37a9b6d67b395b394173bc
  latestVersion: dev-next
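
The Plan above passes the snapshot labels as a comma-separated `key=value` string in ELEMENTAL_REGISTER_UPGRADE_SNAPSHOT_LABELS, and the snapshot ends up with those labels plus a `correlationID` entry. A minimal sketch of how such a string could be turned into the final label set follows; `parseLabels` and `snapshotLabels` are hypothetical helper names, and the exact parsing rules (whitespace trimming, rejecting malformed pairs) are assumptions rather than the behavior of elemental-register.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels splits a "key=value,key=value" string into a map.
// Only the first "=" of each pair separates key from value, so values
// such as image references may contain ":" or further "=" characters.
func parseLabels(raw string) (map[string]string, error) {
	labels := map[string]string{}
	if strings.TrimSpace(raw) == "" {
		return labels, nil
	}
	for _, pair := range strings.Split(raw, ",") {
		key, value, found := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !found || key == "" {
			return nil, fmt.Errorf("malformed label %q", pair)
		}
		labels[key] = strings.TrimSpace(value)
	}
	return labels, nil
}

// snapshotLabels merges the parsed labels with the correlation ID
// (hypothetical helper; the "correlationID" key matches the state output).
func snapshotLabels(raw, correlationID string) (map[string]string, error) {
	labels, err := parseLabels(raw)
	if err != nil {
		return nil, err
	}
	labels["correlationID"] = correlationID
	return labels, nil
}

func main() {
	labels, err := snapshotLabels(
		"managedOSImage=dev-upgrade,image=172.18.0.2:30000/elemental-os:dev-next",
		"468c501b8597d483f9c5e24bad840c9473cdae750b9806c152ad0f46",
	)
	if err != nil {
		panic(err)
	}
	for key, value := range labels {
		fmt.Printf("%s: %s\n", key, value)
	}
}
```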

@anmazzotti anmazzotti requested a review from a team as a code owner October 10, 2024 14:07
@anmazzotti anmazzotti marked this pull request as draft October 10, 2024 14:07
@github-actions github-actions bot added area/operator operator related changes area/register register related changes area/build build related changes labels Oct 10, 2024
@anmazzotti anmazzotti force-pushed the elemental-register_upgrade branch 3 times, most recently from da3e58f to 9a516f7 Compare October 11, 2024 09:02
@github-actions github-actions bot removed the area/build build related changes label Oct 11, 2024
@anmazzotti anmazzotti force-pushed the elemental-register_upgrade branch 2 times, most recently from 456636a to 0585484 Compare October 11, 2024 11:09
@anmazzotti anmazzotti force-pushed the elemental-register_upgrade branch from 72240cf to 3275679 Compare October 11, 2024 13:08
@github-actions github-actions bot added area/tests test related changes area/build build related changes labels Oct 11, 2024
@anmazzotti anmazzotti force-pushed the elemental-register_upgrade branch 3 times, most recently from 4a4dd72 to 2895237 Compare October 23, 2024 09:59
@anmazzotti anmazzotti force-pushed the elemental-register_upgrade branch from 6dbaf10 to 1ff3b58 Compare October 30, 2024 10:15
@anmazzotti anmazzotti force-pushed the elemental-register_upgrade branch from 1ff3b58 to 2ca6dd8 Compare October 30, 2024 10:20
@anmazzotti
Contributor Author

anmazzotti commented Oct 30, 2024

This is still in draft due to the heavy implications of introducing new upgrade logic.
Most notably, it's still not 100% clear to me what impact this will have on the lifecycle of existing ManagedOSImages.
The newly added logic will eventually recompute and update the downstream upgrade plan to add labels, environment variables, and the newly used exclusive flag.
So it is possible that, upon upgrading to a newer version of the elemental-operator, every ManagedOSImage will start an unnecessary mass upgrade on all targeted clusters. This would be ineffective, since the old suc-upgrade logic does nothing when upgrading to the same image, but it is still an undesirable outcome.

Summary of features:

  • The elemental-register upgrade command has been introduced to run upgrades through the elemental-register client. (In the future, this will make it possible to report the status of an upgrade for each MachineInventory.)
  • A consistent elemental.cattle.io/upgrade-correlation-id label has been added to correlate ManagedOSImages, downstream Plans, and machine snapshots.
  • It is possible to use the elemental state command to inspect the snapshots installed on a machine, and to correlate each snapshot with a particular correlationID, image URI, ManagedOSImage resource, and ManagedOSVersion.
  • Downstream system-upgrade-controller plans now use the exclusive option to avoid running concurrent upgrades on the same node.

Changed behavior:

  • Image version detection through /etc/os-release comparison has been dropped. The elemental-operator upgrade process will no longer try to determine whether an upgrade image is an equal, higher, or lower version than the one running on the machine. Instead, a deterministic ManagedOSImage.Spec hash is computed. The hash ignores the target clusters, so any other update to the ManagedOSImage.Spec (for example, updating the referenced ManagedOSVersion) will compute a new hash and trigger a new upgrade.
  • The FORCE upgrade mechanism has been dropped.
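
With version comparison and FORCE gone, one way a client could decide whether an upgrade still needs to run is to look for the correlationID among the snapshot labels reported by elemental state. The sketch below is an assumption about how such a check might look, not the logic in this PR, which only states that the ID is applied to the snapshot; the `Snapshot` type and `correlationIDApplied` function are hypothetical.

```go
package main

import "fmt"

// Snapshot is a hypothetical, reduced view of one snapshot entry
// from the `elemental state` output.
type Snapshot struct {
	Active bool
	Labels map[string]string
}

// correlationIDApplied reports whether any snapshot already carries the
// given correlation ID. An empty ID never matches.
func correlationIDApplied(snapshots map[int]Snapshot, id string) bool {
	if id == "" {
		return false
	}
	for _, snap := range snapshots {
		// Reading from a nil Labels map is safe and yields "".
		if snap.Labels["correlationID"] == id {
			return true
		}
	}
	return false
}

func main() {
	const id = "468c501b8597d483f9c5e24bad840c9473cdae750b9806c152ad0f46"
	snapshots := map[int]Snapshot{
		1: {}, // created by the install action, no labels
		2: {Active: true, Labels: map[string]string{"correlationID": id}},
	}
	fmt.Println(correlationIDApplied(snapshots, id))          // already applied
	fmt.Println(correlationIDApplied(snapshots, "other-id")) // new hash, upgrade needed
}
```

Whether the check should consider every snapshot or only the active one is a design choice this sketch leaves open; it scans all of them.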
