This section describes how to upgrade from the |previous_release| OpenStack release series to |current_release|. It is based on the :kayobe-doc:`upstream Kayobe documentation <upgrading>` with additional considerations for using StackHPC Kayobe Configuration.
A StackHPC OpenStack upgrade is broken down into several phases.
- Prerequisites
- Preparation
- Upgrading the Seed Hypervisor
- Upgrading the Seed
- Upgrading Wazuh Manager
- Upgrading Wazuh Agents
- Upgrading the Overcloud
- Cleaning up
After preparation is complete, the remaining phases may be completed in any order; however, the order specified above allows for completing as much as possible before the user-facing overcloud upgrade. It is not recommended to keep different parts of the system on different releases for extended periods, due to the need to maintain and use separate local Kayobe environments.
Notable changes in the |current_release| Release
There are many changes in the OpenStack |current_release| release described in the release notes for each project. Here are some notable ones.
RabbitMQ is being upgraded to 4.0 in Epoxy. Existing transient queues must be migrated on Caracal prior to upgrading.
The ``stackhpc.linux`` collection version has been bumped to 1.3.0. Note that this version uses systemd to activate virtual functions. This change is restricted to the ``stackhpc.linux.sriov`` role, which is not used by Kayobe. If a custom playbook uses this role, you can retain existing behaviour by setting ``sriov_numvfs_driver`` to ``udev``.
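If a custom playbook does use the role, the override might be sketched as follows (where exactly to set it depends on how the playbook loads its variables):

```yaml
# Retain the previous udev-based activation of virtual functions for the
# stackhpc.linux.sriov role. Only relevant to custom playbooks using this role.
sriov_numvfs_driver: udev
```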
The default Neutron ML2 type drivers and tenant network types now use ``geneve`` instead of ``vxlan`` when OVN is enabled. This affects the ``kolla_neutron_ml2_type_drivers`` and ``kolla_neutron_ml2_tenant_network_types`` variables.
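To retain the previous VXLAN-based behaviour, both variables can be pinned explicitly. The driver list below is illustrative only; match it to your deployment's existing configuration:

```yaml
# Pin the previous VXLAN defaults rather than the new Geneve ones.
# The full type driver list here is an example; keep any drivers you
# already use.
kolla_neutron_ml2_type_drivers:
  - flat
  - vlan
  - vxlan
kolla_neutron_ml2_tenant_network_types:
  - vxlan
```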
If you have customized ``inspector_keep_ports``, ensure it is set to one of: ``all``, ``present``, or ``added``. If you are relying on the previous behaviour you should set ``ironic_keep_ports`` to ``present``.
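The override for the previous behaviour would look like:

```yaml
# Retain the previous port-retention behaviour described above.
ironic_keep_ports: present
```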
The default boot firmware for Seed and Infra VMs has changed from ``bios`` to ``efi``. Set ``infra_vm_boot_firmware`` and ``seed_vm_boot_firmware`` to ``bios`` to retain existing behaviour.
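For example, to retain the previous firmware for both VM types:

```yaml
# Retain BIOS boot firmware for infra and seed VMs rather than the new
# EFI default.
infra_vm_boot_firmware: bios
seed_vm_boot_firmware: bios
```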
The ``prometheus-msteams`` integration in Kolla Ansible has been removed. Users should switch to the native Prometheus Teams integration.
Endpoints for the blackbox exporter are now templated in the kolla-ansible group vars for the cloud. This means that the ``prometheus_blackbox_exporter_endpoints`` variable can be removed from the environment's ``kolla/globals.yml`` file (if applicable) and the endpoints will fall back to the ones templated in the group vars. Additional endpoints may be added through the ``prometheus_blackbox_exporter_endpoints_kayobe`` variable.
For example:

.. code-block:: yaml

   prometheus_blackbox_exporter_endpoints_kayobe:
     - endpoints:
         - "pulp:http_2xx:{{ pulp_url }}/pulp/api/v3/status/"
       enabled: "{{ seed_pulp_container_enabled | bool }}"
- None so far!
As part of the Master release we are looking to improve the security baseline of StackHPC OpenStack deployments. If any of the following have not been done, they should be completed before the upgrade begins.
- Enable Center for Internet Security (CIS) compliance
- Enable TLS on the :kayobe-doc:`public API network <configuration/reference/kolla-ansible.html#tls-encryption-of-apis>`
- Enable TLS on the internal API network
- Configure walled garden networking
- Use LVM-based host images
- Deploy Wazuh
Before starting the upgrade, ensure any appropriate prerequisites are satisfied. These will be specific to each deployment, but here are some suggestions:
- If hypervisors will be rebooted, e.g. to pick up a new kernel, or reprovisioned, ensure that there is sufficient hypervisor capacity to drain at least one node.
- If using Ironic for bare metal compute, ensure that at least one node is available for testing provisioning.
- Ensure that expected test suites are passing, e.g. Tempest.
- Resolve any Prometheus alerts.
- Check for unexpected ``ERROR`` or ``CRITICAL`` messages in OpenSearch Dashboard.
- Check Grafana dashboards.
- Update the deployment to use the latest |previous_release| images and configuration.
Ubuntu Jammy support has been removed from the 2025.1 release onwards. Hosts must be migrated to Ubuntu 24.04 before upgrading OpenStack services. The upgrade process is currently a work in progress.

.. TODO: Add link to another page describing how to migrate
Preparation is crucial for a successful upgrade. It allows for a minimal maintenance/change window and ensures we are ready if unexpected issues arise.
The less you need to think on upgrade day, the better. Save your brain for solving any issues that arise. Write an upgrade plan detailing:
- the predicted schedule
- a checklist of prerequisites
- a set of smoke tests to perform after significant changes
- a list of steps to perform during the preparation phase
- a list of steps to perform during the upgrade maintenance/change window phase
- a list of steps to perform during the follow up phase
- a set of full system tests to perform after the upgrade is complete
- space to make notes of progress and any issues/solutions/workarounds that arise
Ideally all steps will include the exact commands to execute that can be copy/pasted, or links to appropriate CI/CD workflows to run.
Before you start, be sure to back up any local changes, configuration, and data.
See the :kayobe-doc:`Kayobe documentation <administration/overcloud.html#performing-database-backups>` for information on backing up the overcloud MariaDB database. It may be prudent to take backups at various stages of the upgrade since the database state will change over time.
If the deployment uses any source code forks (other than the StackHPC ones), update them to use the |current_release| release.
Kayobe configuration options may be changed between releases of Kayobe. Ensure that all site local configuration is migrated to the target version format. See the :skc-doc:`StackHPC Kayobe Configuration release notes <release-notes.html>`, :kayobe-renos:`Kayobe release notes <>` and :kolla-ansible-renos:`Kolla Ansible release notes <>`. In particular, the Upgrade Notes and Deprecation Notes sections provide information that might affect the configuration migration.
In the following example we assume a branch naming scheme of ``example/<release>``.
Create a branch for the new release:

.. parsed-literal::

   git fetch origin
   git checkout example/|previous_release|
   git checkout -b example/|current_release|
   git push origin example/|current_release|
Merge in the new branch of StackHPC Kayobe Configuration:

.. parsed-literal::

   git remote add stackhpc https://github.com/stackhpc/stackhpc-kayobe-config
   git fetch stackhpc
   git fetch origin
   git checkout -b example/|current_release|-sync origin/example/|current_release|
   git merge stackhpc/|current_release_git_branch_name|
There may be conflicts to resolve. The configuration should be manually inspected after the merge to ensure that it is correct. Once complete, push the branch and create a pull request with the changes:
.. parsed-literal::

   git push origin example/|current_release|-sync
Once approved and merged, update the configuration to adapt to the new release. This may involve e.g. adding, removing or renaming variables to allow for upstream changes. Note that configuration in the base environment (``etc/kayobe/``) will be merged with upstream changes, but anything in a deployment-specific environment directory (``etc/kayobe/environments/``) may require manual inspection.
If using the ``kayobe-env`` environment file in ``kayobe-config``, this should also be inspected for changes and modified to suit the local Ansible control host environment if necessary. When ready, source the environment file:

.. code-block:: console

   source kayobe-env
Create one or more pull requests with these changes.
Once the configuration has been migrated, it is possible to view the global variables for all hosts:
.. code-block:: console

   kayobe configuration dump
The output of this command is a JSON object mapping hosts to their configuration. The output of the command may be restricted using the ``--host``, ``--hosts``, ``--var-name`` and ``--dump-facts`` options.
The local Kayobe environment should be either recreated or upgraded to use the new release. It may be beneficial to keep a Kayobe environment for the old release in case it is needed before the upgrade begins.
In general it is safer to rebuild an environment than upgrade, but for completeness the following shows how to upgrade an existing local Kayobe environment.
Change to the Kayobe configuration directory:
.. code-block:: console

   cd /path/to/src/kayobe-config
Check the status:
.. code-block:: console

   git status
Pull down the new branch:
.. parsed-literal::

   git checkout example/|current_release|
   git pull origin example/|current_release|
Activate the Kayobe virtual environment:
.. code-block:: console

   source /path/to/venvs/kayobe/bin/activate
Reinstall Kayobe and other dependencies:
.. code-block:: console

   pip install --force-reinstall -r requirements.txt
Source the ``kayobe-env`` script:

.. code-block:: console

   source kayobe-env [--environment <env>]
Export the Ansible Vault password:
.. code-block:: console

   export KAYOBE_VAULT_PASSWORD=$(cat /path/to/vault/password/file)
Next we must upgrade the Ansible control host. Tasks performed here include:
- Install updated Ansible collection and role dependencies from Ansible Galaxy.
- Generate an SSH key if necessary and add it to the current user's authorised keys.
- Upgrade Kolla Ansible locally to the configured version.
To upgrade the Ansible control host:
.. code-block:: console

   kayobe control host upgrade
New :ref:`stackhpc-release-train` content should be synced to the local Pulp server. This includes host packages (Deb/RPM) and container images.
To sync host packages:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-sync.yml
   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-publish.yml
Once the host package content has been tested in a test/staging environment, it may be promoted to production:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-promote-production.yml
To sync container images:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-sync.yml
   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-publish.yml
.. note::

   The container images provided by the StackHPC Release Train are suitable for most deployments, in which case this step can be skipped.
In some cases it is necessary to build some or all images locally to apply customisations. In order to do this it is necessary to set ``stackhpc_pulp_sync_for_local_container_build`` to ``true`` before :ref:`syncing container images <sync-rt-package-repos>`.
To build the overcloud images locally and push them to the local Pulp server:
.. code-block:: console

   kayobe overcloud container image build --push
It is possible to build a specific set of images by supplying one or more image name regular expressions:
.. code-block:: console

   kayobe overcloud container image build --push ironic- nova-api
Pulling container images from the local Pulp server to the control plane hosts can take a considerable time, because images are only synced from Ark to the local Pulp on demand, and there is potentially a large fan-out. Pulling images in advance of the upgrade moves this step out of the maintenance/change window. Consider checking available disk space before pulling:
.. code-block:: console

   kayobe overcloud host command run --command "df -h" --show-output --limit controllers[0],compute[0],storage[0]
Then pull the images:
.. code-block:: console

   kayobe overcloud container image pull
Kayobe allows us to generate overcloud service configuration in advance, and compare it with the running configuration. This allows us to check for any unexpected changes.
This can take a significant time, and it may be advisable to limit these commands to one of each type of host (controller, compute, storage, etc.). The following commands use a limit including the first host in each of these groups.
Save the old configuration locally:

.. code-block:: console

   kayobe overcloud service configuration save --node-config-dir /etc/kolla --output-dir ~/kolla-diff/old --limit controllers[0],compute[0],storage[0] --exclude ironic-agent.initramfs,ironic-agent.kernel
Generate the new configuration to a temporary directory:

.. code-block:: console

   kayobe overcloud service configuration generate --node-config-dir /tmp/kolla --kolla-limit controllers[0],compute[0],storage[0]
Save the new configuration locally:

.. code-block:: console

   kayobe overcloud service configuration save --node-config-dir /tmp/kolla --output-dir ~/kolla-diff/new --limit controllers[0],compute[0],storage[0] --exclude ironic-agent.initramfs,ironic-agent.kernel
The old and new configuration will be saved to ``~/kolla-diff/old`` and ``~/kolla-diff/new`` respectively on the Ansible control host.
Fix up the paths:
.. code-block:: console

   cd ~/kolla-diff/new
   for i in *; do mv $i/tmp $i/etc; done
   cd -
Compare the old & new configuration:
.. code-block:: console

   diff -ru ~/kolla-diff/{old,new} > ~/kolla-diff.diff
   less ~/kolla-diff.diff
Currently, upgrading the seed hypervisor services is not supported. It may however be necessary to upgrade host packages and some host services.
Consider whether the seed hypervisor needs to be upgraded within or outside of a maintenance/change window.
.. note::

   In case of issues booting up, consider alternative access methods if the hypervisor is also used as the Ansible control host (or hosts it in a VM).
Prior to upgrading the seed hypervisor, it may be desirable to upgrade system packages on the seed hypervisor host.
To update all eligible packages, use ``*``, escaping if necessary:

.. code-block:: console

   kayobe seed hypervisor host package update --packages "*"
If the kernel has been upgraded, reboot the seed hypervisor to pick up the change:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l seed-hypervisor
It may be necessary to upgrade some host services:
.. code-block:: console

   kayobe seed hypervisor host upgrade
Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in "check mode" to see what changes would be applied:
.. code-block:: console

   kayobe seed hypervisor host configure --check --diff
When ready to apply the changes:
.. code-block:: console

   kayobe seed hypervisor host configure
Consider whether the seed needs to be upgraded within or outside of a maintenance/change window.
.. note::

   In case of issues booting up, consider alternative access methods if the seed is also used as the Ansible control host.
Prior to upgrading the seed, it may be desirable to upgrade system packages on the seed host.
Note that these commands do not affect packages installed in containers, only those installed on the host.
To update all eligible packages, use ``*``, escaping if necessary:

.. code-block:: console

   kayobe seed host package update --packages "*"
If the kernel has been upgraded, reboot the seed to pick up the change:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l seed
Verify that Bifrost, Ironic and Inspector are running as expected:
.. code-block:: console

   ssh stack@<seed>
   sudo docker exec -it bifrost_deploy bash
   systemctl
   export OS_CLOUD=bifrost
   baremetal node list
   baremetal introspection list
   exit
   exit
.. note::

   It is possible to use prebuilt deployment images. In this case, this step can be skipped.
It is possible to use prebuilt deployment images from the OpenStack hosted tarballs or another source. In some cases it may be necessary to build images locally, either to apply local image customisation or to use a downstream version of Ironic Python Agent (IPA). In order to build IPA images, the ``ipa_build_images`` variable should be set to ``True``. To build images locally:

.. code-block:: console

   kayobe seed deployment image build
To overwrite existing images, add the ``--force-rebuild`` argument.
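As a sketch, enabling local IPA image builds in ``etc/kayobe/ipa.yml`` would look like:

```yaml
# Build Ironic Python Agent (IPA) images locally rather than using
# prebuilt ones.
ipa_build_images: true
```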
It may be necessary to upgrade some host services:
.. code-block:: console

   kayobe seed host upgrade
Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in "check mode" to see what changes would be applied:
.. code-block:: console

   kayobe seed host configure --check --diff
When ready to apply the changes:
.. code-block:: console

   kayobe seed host configure
.. note::

   The container images provided by the StackHPC Release Train are suitable for most deployments, in which case this step can be skipped.
In some cases it is necessary to build some or all images locally to apply customisations. In order to do this it is necessary to set ``stackhpc_pulp_sync_for_local_container_build`` to ``true`` before :ref:`syncing container images <sync-rt-package-repos>`.
To build the seed images locally and push them to the local Pulp server:
.. code-block:: console

   kayobe seed container image build --push
Containerised seed services may be upgraded by replacing existing containers with new containers using updated images which have been pulled from the local Pulp registry.
To upgrade the containerised seed services:
.. code-block:: console

   kayobe seed service upgrade
Verify that Bifrost, Ironic and Inspector are running as expected:
.. code-block:: console

   ssh stack@<seed>
   sudo docker exec -it bifrost_deploy bash
   systemctl
   export OS_CLOUD=bifrost
   baremetal node list
   baremetal introspection list
   exit
   exit
Consider whether Wazuh Manager needs to be upgraded within or outside of a maintenance/change window.
Prior to upgrading the Wazuh manager services, it may be desirable to upgrade system packages on the Wazuh manager host.
To update all eligible packages, use ``*``, escaping if necessary:

.. code-block:: console

   kayobe infra vm host package update --packages "*" -l wazuh-manager
If the kernel has been upgraded, reboot the Wazuh Manager to pick up the change:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l wazuh-manager
Verify that Wazuh Manager is functioning correctly by :ref:`logging into the Wazuh UI <wazuh-verification>`.
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in "check mode" to see what changes would be applied:
.. code-block:: console

   kayobe infra vm host configure --check --diff -l wazuh-manager
When ready to apply the changes:
.. code-block:: console

   kayobe infra vm host configure -l wazuh-manager
Run the following playbook to update Wazuh Manager services and configuration:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-manager.yml
Verify that Wazuh Manager is functioning correctly by :ref:`logging into the Wazuh UI <wazuh-verification>`.
Consider whether Wazuh Agents need to be upgraded within or outside of a maintenance/change window.
Run the following playbook to update Wazuh Agent services and configuration:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/wazuh-agent.yml
Verify that the agents have connected to Wazuh Manager correctly by :ref:`logging into the Wazuh UI <wazuh-verification>`.
Consider which of the overcloud upgrade steps need to be performed within or outside of a maintenance/change window.
Prior to upgrading the OpenStack control plane, it may be desirable to upgrade system packages on the overcloud hosts.
Note that these commands do not affect packages installed in containers, only those installed on the host.
In order to avoid downtime, it is important to control how package updates are rolled out. In general, controllers and network hosts should be updated one by one, ideally updating the host with the Virtual IP (VIP) last. For hypervisors it may be possible to update packages in batches of hosts, provided there is sufficient capacity to migrate VMs to other hypervisors.
For each host or batch of hosts, perform the following steps.
If the host is a hypervisor, disable the Nova compute service and drain it of VMs using live migration. If any VMs fail to migrate, they may be cold migrated or powered off:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/nova-compute-{disable,drain}.yml --limit <host>
To update all eligible packages, use ``*``, escaping if necessary:

.. code-block:: console

   kayobe overcloud host package update --packages "*" --limit <host>
.. note::

   Due to a security-related change in the GRUB package on Rocky Linux 9, the operating system can become unbootable (boot will stop at a ``grub>`` prompt). Remove the ``--root-dev-only`` option from ``/boot/efi/EFI/rocky/grub.cfg`` after applying package updates. This will happen automatically as a post hook when running the ``kayobe overcloud host package update`` command.
If the kernel has been upgraded, reboot the host or batch of hosts to pick up the change:
.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l <host>
If the host is a hypervisor, enable the Nova compute service:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/nova-compute-enable.yml --limit <host>
If any VMs were powered off, they may now be powered back on.
Wait for Prometheus alerts and errors in OpenSearch Dashboard to resolve, or address them.
After updating controllers or network hosts, run any appropriate smoke tests.
Once happy that the system has been restored to full health, move on to the next host or batch of hosts.
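The rolling pattern above can be sketched as a simple loop. The host names are hypothetical, and each command is printed for review rather than executed; drop the ``echo`` prefix to run for real:

```shell
# Sketch of a one-host-at-a-time rolling update for hypervisors.
# Host names are hypothetical; commands are printed, not executed.
hosts="cpt0 cpt1 cpt2"
for host in $hosts; do
  # Disable and drain the Nova compute service on this host.
  echo "kayobe playbook run \$KAYOBE_CONFIG_PATH/ansible/nova-compute-{disable,drain}.yml --limit $host"
  # Update host packages, then reboot if the kernel changed.
  echo "kayobe overcloud host package update --packages '*' --limit $host"
  echo "kayobe playbook run \$KAYOBE_CONFIG_PATH/ansible/reboot.yml -l $host"
  # Re-enable the compute service before moving to the next host.
  echo "kayobe playbook run \$KAYOBE_CONFIG_PATH/ansible/nova-compute-enable.yml --limit $host"
done
```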
Prior to upgrading the OpenStack control plane, the overcloud host services should be upgraded:
.. code-block:: console

   kayobe overcloud host upgrade
Note that this will not perform full configuration of the host, and will instead perform a targeted upgrade of specific services where necessary.
Performing host configuration is not a formal part of the upgrade process, but it is possible for host configuration to drift over time as new features and other changes are added to Kayobe.
Host configuration, particularly around networking, can lead to loss of network connectivity and other issues if the configuration is not correct. For this reason it is sensible to first run Ansible in "check mode" to see what changes would be applied:
.. code-block:: console

   kayobe overcloud host configure --check --diff
When ready to apply the changes, it may be advisable to do so in batches, or at least start with a small number of hosts:
.. code-block:: console

   kayobe overcloud host configure --limit <host>
.. warning::

   Take extra care when configuring Ceph hosts. Set the hosts to maintenance mode before reconfiguring them, and unset it when done:

   .. code-block:: console

      kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-enter-maintenance.yml --limit <host>
      kayobe overcloud host configure --limit <host>
      kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/ceph-exit-maintenance.yml --limit <host>

   Always reconfigure hosts in small batches or one-by-one. Check the Ceph state after each host configuration. Ensure all warnings and errors are resolved before moving on.
.. note::

   It is possible to use prebuilt deployment images. In this case, this step can be skipped.
It is possible to use prebuilt deployment images from the OpenStack hosted tarballs or another source. In some cases it may be necessary to build images locally, either to apply local image customisation or to use a downstream version of Ironic Python Agent (IPA). In order to build IPA images, the ``ipa_build_images`` variable should be set to ``True``. To build images locally:

.. code-block:: console

   kayobe overcloud deployment image build
To overwrite existing images, add the ``--force-rebuild`` argument.
Prior to upgrading the OpenStack control plane you should upgrade the deployment images. If you are using prebuilt images, update the following variables in ``etc/kayobe/ipa.yml`` accordingly:

- ``ipa_kernel_upstream_url``
- ``ipa_kernel_checksum_url``
- ``ipa_kernel_checksum_algorithm``
- ``ipa_ramdisk_upstream_url``
- ``ipa_ramdisk_checksum_url``
- ``ipa_ramdisk_checksum_algorithm``
Alternatively, you can update the files that the URLs point to. If building the images locally, follow the process outlined in :ref:`building_ironic_deployment_images`.
To get Ironic to use an updated set of overcloud deployment images, you can run:
.. code-block:: console

   kayobe baremetal compute update deployment image
This will register the images in Glance and update the ``deploy_ramdisk`` and ``deploy_kernel`` properties of the Ironic nodes.
Before rolling out the update to all nodes, it can be useful to test the image on a limited subset. To do this, you can use the ``--baremetal-compute-limit`` option. The argument should take the form of an Ansible host pattern which is matched against the Ironic node name.
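For example, targeting a hypothetical set of nodes named ``rack1-*`` (the pattern is illustrative, and the command is printed rather than executed here):

```shell
# Hypothetical Ansible host pattern matched against Ironic node names.
pattern='rack1-*'
# Print the command for review; drop the echo to run it for real.
echo "kayobe baremetal compute update deployment image --baremetal-compute-limit $pattern"
```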
Containerised control plane services may be upgraded by replacing existing containers with new containers using updated images which have been pulled from a registry or built locally.
If using overcloud Ironic, check whether any Ironic nodes are in a wait state:

.. code-block:: console

   baremetal node list | grep wait
This will block the upgrade, but may be overridden by setting ``ironic_upgrade_skip_wait_check`` to ``true`` in ``etc/kayobe/kolla/globals.yml`` or ``etc/kayobe/environments/<env>/kolla/globals.yml``.
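The override would look like this in the relevant ``kolla/globals.yml`` file:

```yaml
# Proceed with the upgrade even if Ironic nodes are in a wait state.
# Use with care, as described above.
ironic_upgrade_skip_wait_check: true
```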
To upgrade the containerised control plane services:
.. code-block:: console

   kayobe overcloud service upgrade
It is possible to specify tags for Kayobe and/or kolla-ansible to restrict the scope of the upgrade:
.. code-block:: console

   kayobe overcloud service upgrade --tags config --kolla-tags keystone
If using Octavia with the Amphora driver, you should :ref:`build a new amphora image <Amphora image>`.
At this point it is recommended to perform a thorough test of the system to catch any unexpected issues. This may include:
- Check Prometheus, OpenSearch Dashboards and Grafana
- Smoke tests
- All applicable Tempest tests
- Horizon UI inspection
Prune unused container images:
.. code-block:: console

   kayobe overcloud host command run -b --command "docker image prune -a -f"