Add multi-cell tripleo setup for adoption devel #826

Merged

@bogdando bogdando commented May 10, 2024

In order to keep the HW requirements low for development
of multi-cell OSP 17.1 adoption for RHOSO 18, provide
a reduced multi-stack footprint
(which is supported in tripleo, yet not in OSP):

undercloud: 1 VM
overcloud: Controller0 (1 VM, no HA)
cell1: Compute0, Compute1 (CellController1) (2 VMs)
cell2: Compute2+CellController2 (AIO VM host)

No Ceph/HCI support yet, TBD.
Only a fixed number of cells (2 extra cells) is supported so far.

Add partial local dev envs support (given VMs preprovisioned):

  • Fix HOME paths to match the default user home from ssh opts
    (when it is 'zuul', the commands fail on the control node,
    when executed from another user name)
  • Add missing repo setup commands and notes from standalone.sh
  • Support RH registry auth env vars
  • Disable undercloud install validations (to allow deployments
    on low storage envs)
  • Fix tripleo network config to let it configure nodes in a local
    setup, where there are no pre-configured networks
  • Unplug External network from roles file as CI infra does not
    provide a gateway for it, so tripleo networking fails ping test

Add EDPM_CONFIGURE_NETWORKING to control tripleo networking
configuration. Disable it for CI jobs. Should be enabled for
local deployments.
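
A minimal sketch of how such a toggle could be consumed in a shell script. Only the variable name EDPM_CONFIGURE_NETWORKING comes from this PR; the default value, the accepted spellings, and the helper names `is_true` and `networking_mode` are assumptions made for illustration, not the actual code in tripleo.sh.

```shell
# Hypothetical helper: treat common truthy spellings as "enabled".
is_true() {
    case "$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')" in
        true|yes|on|1) return 0 ;;
        *) return 1 ;;
    esac
}

# Report what a deployment script would do for the current setting.
# The default of "false" is an assumption, not taken from the PR.
networking_mode() {
    if is_true "${EDPM_CONFIGURE_NETWORKING:-false}"; then
        echo "configure"   # local setup: let tripleo configure node networks
    else
        echo "skip"        # CI job: leave the node networks untouched
    fi
}
```

With these definitions, `EDPM_CONFIGURE_NETWORKING=true networking_mode` prints `configure`, while an unset or `false` value prints `skip`.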

Add j2 bool filter support to jinja_render common func.

Required-By: https://review.rdoproject.org/r/c/rdo-jobs/+/53192

JIRA OSPRH-6548

openshift-ci bot commented May 10, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@bogdando bogdando force-pushed the multi-cell branch 3 times, most recently from e716eef to a4156c6 Compare May 10, 2024 13:27

This change depends on a change that failed to merge.

Change https://review.rdoproject.org/r/c/rdo-jobs/+/53192 is needed.

@bogdando bogdando requested a review from marios May 13, 2024 13:06

@marios marios left a comment

please let's discuss this during tomorrow's data plane call before proceeding down this path

i want us all to agree on the direction.

i think you are proposing that we will have a 3 compute hci ceph job, and then a 1 control/2 compute job for cells (i.e. this job) but we have not had that conversation yet.

we need to be on the same page about the ci direction since we (ci team) are currently working here

cc @cescgina @frenzyfriday @jistr

This change depends on a change with an invalid configuration.

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging rdoproject.org/rdo-jobs for 53192,14

devsetup/scripts/standalone.sh
set -ex
sudo dnf install -y podman python3-tripleoclient util-linux lvm2

sudo hostnamectl set-hostname undercloud.localdomain
sudo hostnamectl set-hostname undercloud.localdomain --transient

cat >\$HOME/nova_noceph.yaml <<__EOF__

why don't we just add this file into the rest of the env files instead of creating like that
i mean, what is special about this one?

it is a recommended approach in general to provide separate files for different configuration areas, rather than mangling large files.

i meant more like, we are already carrying a bunch of environment files. Fine if you don't want to put this into an existing one but you can just add a new file under the tripleo directory.
With this we have some environments/parameter_defaults coming from files, and others like this one being created by heredoc. I think it just further adds to the complexity of this 'tool'.
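
The `\$HOME` escape in the quoted line suggests that this heredoc is itself nested inside an outer, unquoted heredoc that generates a script to be run later (an inference, not confirmed in this thread). A minimal sketch of that deferred expansion follows; the file names `generated.sh` and `example_env.yaml` and the YAML content are hypothetical placeholders, not the real contents of nova_noceph.yaml.

```shell
# Sketch: generate a script through an outer unquoted heredoc. The escaped
# "\$HOME" is written literally as "$HOME" into the generated script and is
# expanded only when that script runs.
workdir=$(mktemp -d)
cat > "${workdir}/generated.sh" <<OUTER
#!/bin/sh
cat >\$HOME/example_env.yaml <<__EOF__
parameter_defaults:
  ExampleParameter: example-value
__EOF__
OUTER

# Run the generated script with a different HOME to show that the path
# is resolved at run time, not at generation time.
mkdir -p "${workdir}/home"
HOME="${workdir}/home" sh "${workdir}/generated.sh"
cat "${workdir}/home/example_env.yaml"
```

The alternative raised in the review, a checked-in environment file under the tripleo directory, would avoid this escaping altogether at the cost of one more file to carry.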

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/cf175bde547947759a40abec7c227605

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 49m 13s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 26m 11s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 15m 10s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 2h 38m 46s
adoption-standalone-to-crc-no-ceph-provider FAILURE in 57m 00s

@bogdando bogdando force-pushed the multi-cell branch 2 times, most recently from 8fc0450 to f396d69 Compare September 9, 2024 15:36

@marios marios left a comment

waiting to hear that testing is good and will revisit

# We'll use the NTP_SERVER environmental variable to define the NTP server to use, e.g.:
# export NTP_SERVER=pool.ntp.org

if [ $EDPM_COMPUTE_CELLS -eq 2 ] || [ $EDPM_COMPUTE_CELLS -gt 3 ] || [ $EDPM_COMPUTE_CELLS -eq 0 ] ; then

like if EDPM_COMPUTE_CELLS != 3 (or whatever the allowed is i think 3?) then error

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a1a42f0b465843afabe4a2adb778d530

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 20m 31s
✔️ install-yamls-crc-podified-edpm-baremetal SUCCESS in 1h 33m 52s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 15m 25s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 2h 36m 41s
adoption-standalone-to-crc-no-ceph-provider FAILURE in 2h 09m 17s

@bogdando

the regression test has passed the multi-node check https://review.rdoproject.org/r/c/testproject/+/54199 ! Now please help test it downstream, for networker

Signed-off-by: Bohdan Dobrelia <[email protected]>
@bogdando

recheck

@bogdando

can we merge this now please?

@bogdando

@jistr PTAL

# We'll use the NTP_SERVER environmental variable to define the NTP server to use, e.g.:
# export NTP_SERVER=pool.ntp.org

if [ $EDPM_COMPUTE_CELLS -eq 2 ] || [ $EDPM_COMPUTE_CELLS -gt 3 ] || [ $EDPM_COMPUTE_CELLS -eq 0 ] ; then

if [ $EDPM_COMPUTE_CELLS -ne 1 -a $EDPM_COMPUTE_CELLS -ne 3 ]; then ...
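
Both forms above reject 0, 2, and anything greater than 3, so the supported values appear to be 1 and 3. A minimal sketch of the same check written as a `case` pattern match, which also handles an empty or non-numeric value without a `test` syntax error; the function name `validate_compute_cells` and the error message are hypothetical, not taken from the PR.

```shell
# Hypothetical helper: succeed only for the supported cell counts (1 or 3).
# Pattern matching avoids the "unary operator expected" failure that an
# unquoted, empty variable causes inside [ ... ].
validate_compute_cells() {
    case "${1:-}" in
        1|3)
            return 0
            ;;
        *)
            echo "Unsupported EDPM_COMPUTE_CELLS='${1:-}': expected 1 or 3" >&2
            return 1
            ;;
    esac
}

# Possible call site in a script:
#   validate_compute_cells "${EDPM_COMPUTE_CELLS}" || exit 1
```

One behavioral difference worth noting: the original three-clause condition lets negative numbers through, while the `-ne 1 -a -ne 3` form and this sketch both reject them.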

fao89 commented Sep 12, 2024

/approve

openshift-ci bot commented Sep 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bogdando, fao89, marios

fao89 commented Sep 12, 2024

/hold
just noticed @olliewalsh comments

@openshift-merge-bot openshift-merge-bot bot merged commit 316ec1e into openstack-k8s-operators:main Sep 12, 2024
5 checks passed

bogdando commented Sep 12, 2024

@olliewalsh I will address your comments in follow up, sorry.

done #909

@@ -15,11 +15,32 @@
# under the License.
set -ex

@bogdando please check slack when you get some time

this breaks the ceph job (your test ran multinode-no-ceph) so please have a look and if we cannot find a good solution today then we can revert and work it out before going again

karelyatin added a commit to karelyatin/data-plane-adoption that referenced this pull request Oct 17, 2024
The default NTP server pool.ntp.org does not work with the RH network.
Fix the doc so that the user sets it as per the environment. It was correct
until [1].

[1] openstack-k8s-operators/install_yamls#826