DNM - Cleanup kuttl tests #386

Open
fmount wants to merge 2 commits into main from kuttl-cleanup

Conversation

fmount
Collaborator

@fmount fmount commented Feb 4, 2025

The existing kuttl tests match the whole Manila CR spec (including the sub-resource specs), down to parameters that have no bearing on whether a test passes or fails.
This patch removes many fields that are not required in the main assertion. For example, matching the status is usually sufficient to assert the state of the reconciliation and its conditions, and to check that a resource has been properly deployed; container images are not tested via kuttl, and we already have jobs covering both container image injection and a minor update with custom images.
This is the first step towards reorganizing the kuttl tests into independent test suites.
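
Conceptually, a kuttl assert is a partial match: a step passes when every field present in the assert file is also present (recursively) in the live object, so dropping spec fields that don't influence the outcome does not weaken the check. A minimal sketch of that idea (illustrative only, not kuttl's implementation; is_subset and the sample objects are made up):

# Illustrative only; not kuttl's code. The assert file acts as a partial
# object: fields it does not mention are simply ignored in the live resource.
def is_subset(expected, actual):
    """Return True if `expected` is recursively contained in `actual`."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        return all(k in actual and is_subset(v, actual[k])
                   for k, v in expected.items())
    if isinstance(expected, list) and isinstance(actual, list):
        # simplistic list handling: each expected item must match some actual item
        return all(any(is_subset(e, a) for a in actual) for e in expected)
    return expected == actual

# A heavily trimmed "live" Manila CR against a status-only assertion.
live = {
    "spec": {"serviceUser": "manila", "manilaAPI": {"replicas": 1}},
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}
trimmed_assert = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
assert is_subset(trimmed_assert, live)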

I might need to hold the node for test purposes.

@openshift-ci openshift-ci bot requested a review from abays February 4, 2025 21:26
Contributor

openshift-ci bot commented Feb 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fmount

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fmount
Collaborator Author

fmount commented Feb 5, 2025

/test manila-operator-build-deploy-kuttl

1 similar comment
@fmount
Collaborator Author

fmount commented Feb 5, 2025

/test manila-operator-build-deploy-kuttl

@fmount
Collaborator Author

fmount commented Feb 5, 2025

This is interesting: from the existing events log we can see:

7m7s        Normal    Created                           pod/manila-db-sync-khhvz                              Created container manila-db-sync
7m6s        Normal    Started                           pod/manila-db-sync-khhvz                              Started container manila-db-sync
3m31s       Warning   BackOff                           pod/manila-db-sync-khhvz                              Back-off restarting failed container manila-db-sync in pod manila-db-sync-khhvz_manila-kuttl-tests(22196429-d178-46c0-93c9-978093da2d83)
2m49s       Warning   BackoffLimitExceeded              job/manila-db-sync                                    Job has reached the specified backoff limit
2m49s       Normal    SuccessfulDelete                  job/manila-db-sync                                    Deleted pod: manila-db-sync-khhvz
2m49s       Normal    SuccessfulCreate                  job/manila-db-sync                                    Created pod: manila-db-sync-qfqxj
2m48s       Normal    AddedInterface                    pod/manila-db-sync-qfqxj                              Add eth0 [10.128.1.26/23] from ovn-kubernetes
2m48s       Normal    Pulled                            pod/manila-db-sync-qfqxj                              Container image "quay.io/podified-antelope-centos9/openstack-manila-api@sha256:16c96f2d5e0ec6fc9472cf55ed20c8241fa0a7c5598c607412ce25999d863126" already present on machine
2m48s       Normal    Created                           pod/manila-db-sync-qfqxj                              Created container manila-db-sync
2m48s       Normal    Started                           pod/manila-db-sync-qfqxj                              Started container manila-db-sync

It succeeds at some point, but it's not clear why db-sync fails at the beginning (this seems to be consistent behavior).

@fmount fmount force-pushed the kuttl-cleanup branch 2 times, most recently from 2f432e3 to 73f4f48 on February 5, 2025 08:21
@fmount
Collaborator Author

fmount commented Feb 5, 2025

The bad news is that I'm not able to reproduce the same error locally, where everything passes as expected:

--- PASS: kuttl (223.55s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/manila-basic (105.73s)
        --- PASS: kuttl/harness/manila-tls (47.28s)
        --- PASS: kuttl/harness/manila-multibackend (70.53s)

@fmount fmount force-pushed the kuttl-cleanup branch 4 times, most recently from 47f4236 to a96c64b on February 5, 2025 16:24
The existing kuttl tests match the whole Manila CR spec, including
parameters that are not meaningful to make the test pass or not.
This patch removes a lot of fields that are not required in the main
assertion. For example, matching the status is often sufficient to
assert the status of the reconciliation, conditions, and to check if
a resource has been properly deployed.

Signed-off-by: Francesco Pantano <[email protected]>
@fmount fmount changed the title from "Cleanup kuttl tests" to "DNM - Cleanup kuttl tests" on Feb 5, 2025
@fmount
Collaborator Author

fmount commented Feb 6, 2025

/test manila-operator-build-deploy-kuttl

Signed-off-by: Francesco Pantano <[email protected]>
Contributor

openshift-ci bot commented Feb 6, 2025

@fmount: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/manila-operator-build-deploy-tempest e7c94f3 link true /test manila-operator-build-deploy-tempest
ci/prow/manila-operator-build-deploy-kuttl e7c94f3 link true /test manila-operator-build-deploy-kuttl

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@fmount
Collaborator Author

fmount commented Feb 6, 2025

I think I now have a clearer idea of what's going on.
From the events, we don't see a manila-db-sync pod because it gets deleted after 5 minutes (the time we need to hit the timeout), and must-gather cannot catch the logs of something that has already been removed, so we lose track of the generated error.
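
One way to still capture those logs is to pull them directly by the Job's pod label while the failing pod still exists; a rough sketch, assuming the kubernetes Python client and a usable kubeconfig:

# Sketch only: dump the logs of the db-sync pods before the Job's backoff
# handling deletes them. Assumes `pip install kubernetes` and a reachable cluster.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
ns = "manila-kuttl-tests"

for pod in v1.list_namespaced_pod(ns, label_selector="job-name=manila-db-sync").items:
    print(f"=== {pod.metadata.name} ===")
    # previous=True fetches the last terminated container's output, which is
    # what we want while the container sits in a restart back-off loop.
    print(v1.read_namespaced_pod_log(pod.metadata.name, ns, previous=True))
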
In any case, I jumped into a CI environment and manually triggered a db-sync job (where the combined-ca-bundle secret is mounted), and I see the following failure from the generated Pod:

[zuul@controller ~]$ oc logs manila-db-sync-ca-wshz6
Could not load 'http': [X509] PEM lib (_ssl.c:4311)
Could not load 'https': [X509] PEM lib (_ssl.c:4311)
2025-02-06 13:56:07.685 1 CRITICAL manila [-] Unhandled error: ssl.SSLError: [X509] PEM lib (_ssl.c:4311)
2025-02-06 13:56:07.685 1 ERROR manila Traceback (most recent call last):
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/bin/manila-manage", line 10, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     sys.exit(main())
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/manila/cmd/manage.py", line 544, in main
2025-02-06 13:56:07.685 1 ERROR manila     fn(*fn_args)
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/manila/cmd/manage.py", line 205, in sync
2025-02-06 13:56:07.685 1 ERROR manila     return migration.upgrade(version)
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/manila/db/migration.py", line 28, in upgrade
2025-02-06 13:56:07.685 1 ERROR manila     return IMPL.upgrade(version)
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/manila/utils.py", line 161, in __getattr__
2025-02-06 13:56:07.685 1 ERROR manila     backend = self.__get_backend()
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/manila/utils.py", line 156, in __get_backend
2025-02-06 13:56:07.685 1 ERROR manila     self.__backend = __import__(name, None, None, fromlist)
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/manila/db/migrations/alembic/migration.py", line 22, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from manila.db.sqlalchemy import api as db_api
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/manila/db/sqlalchemy/api.py", line 61, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     osprofiler_sqlalchemy = importutils.try_import('osprofiler.sqlalchemy')
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/oslo_utils/importutils.py", line 103, in try_import
2025-02-06 13:56:07.685 1 ERROR manila     return import_module(import_str)
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/oslo_utils/importutils.py", line 73, in import_module
2025-02-06 13:56:07.685 1 ERROR manila     __import__(import_str)
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/osprofiler/sqlalchemy.py", line 21, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from osprofiler import profiler
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/osprofiler/profiler.py", line 27, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from osprofiler import notifier
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/osprofiler/notifier.py", line 18, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from osprofiler.drivers import base
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/osprofiler/drivers/__init__.py", line 4, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from osprofiler.drivers import loginsight  # noqa
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/osprofiler/drivers/loginsight.py", line 26, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     import requests
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/requests/__init__.py", line 121, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from .api import request, get, head, post, patch, put, delete, options
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/requests/api.py", line 13, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from . import sessions
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/requests/sessions.py", line 28, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     from .adapters import HTTPAdapter
2025-02-06 13:56:07.685 1 ERROR manila   File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 60, in <module>
2025-02-06 13:56:07.685 1 ERROR manila     _preloaded_ssl_context.load_verify_locations(
2025-02-06 13:56:07.685 1 ERROR manila ssl.SSLError: [X509] PEM lib (_ssl.c:4311)
2025-02-06 13:56:07.685 1 ERROR manila

Clearly, after a few minutes (when we hit the backoff limit of 6 specified in the Job), the pod is deleted, and we get:

72s         Normal    Started                pod/keystone-cron-28980841-cv7nx        Started container keystone-cron
72s         Normal    Created                pod/keystone-cron-28980841-cv7nx        Created container keystone-cron
72s         Normal    Pulled                 pod/keystone-cron-28980841-cv7nx        Container image "quay.io/podified-antelope-centos9/openstack-keystone@sha256:288facda8739c3873ad3fa5f266fefec3718997fe3d498b1ad8778bc18fdca89" already present on machine
72s         Normal    AddedInterface         pod/keystone-cron-28980841-cv7nx        Add eth0 [10.217.0.180/23] from ovn-kubernetes
67s         Normal    SawCompletedJob        cronjob/keystone-cron                   Saw completed job: keystone-cron-28980841, status: Complete
67s         Normal    SuccessfulDelete       cronjob/keystone-cron                   Deleted job keystone-cron-28980661
67s         Normal    Completed              job/keystone-cron-28980841              Job completed
66s         Warning   BackOff                pod/manila-db-sync-ca-wshz6             Back-off restarting failed container manila-db-sync-ca in pod manila-db-sync-ca-wshz6_manila-kuttl-tests(f59c59a7-f308-445b-82a5-1272c4acb9bf)
14s         Normal    SuccessfulDelete       job/manila-db-sync-ca                   Deleted pod: manila-db-sync-ca-wshz6
14s         Warning   BackoffLimitExceeded   job/manila-db-sync-ca                   Job has reached the specified backoff limit

and we get no logs.
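
For reference, the failure is not specific to manila: per the traceback above, the newer python3-requests calls load_verify_locations() on a CA bundle at import time (requests/adapters.py), so a bundle it cannot parse as PEM breaks even a plain "import requests". A minimal sketch of that error class, assuming a malformed or truncated bundle (the temporary file below is made up for illustration):

# Reproduction sketch (assumption: the CA bundle being loaded exists but is not
# valid PEM). This raises the same X509/PEM family of ssl.SSLError as the
# traceback above, without manila or requests involved at all.
import ssl
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".pem", delete=False) as f:
    f.write("this is not a certificate\n")  # anything that fails PEM parsing
    bad_bundle = f.name

ctx = ssl.create_default_context()
try:
    # requests/adapters.py does the equivalent of this at module import time,
    # which is why db-sync dies before it ever talks to the database.
    ctx.load_verify_locations(cafile=bad_bundle)
except ssl.SSLError as exc:
    print(exc)  # e.g. [X509] PEM lib (_ssl.c:...) or a similar X509 error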

@fmount
Collaborator Author

fmount commented Feb 6, 2025

Looks like the problem is in the image used by the job:

Working image: quay.io/podified-antelope-centos9/openstack-manila-api:7ccfacec5b3ef5b693dba49c52db5ad08bcbbc5668f0b37bb03fd54147cbdd6d
Broken image: quay.io/podified-antelope-centos9/openstack-manila-api:current-podified

This explains why we had working tests until three days ago: in my local environment I get an image based on the working tag by default, which is why I wasn't able to reproduce the failure.

@fmount
Collaborator Author

fmount commented Feb 6, 2025

After doing a few more tests, we can see that db sync works again after downgrading python3-requests:

sh-5.1#  dnf downgrade python3-requests
CentOS Stream 9 - BaseOS                                                                                                                                                                                      2.8 MB/s | 8.4 MB     00:02
CentOS Stream 9 - AppStream                                                                                                                                                                                   2.8 MB/s |  22 MB     00:07
CentOS Stream 9 - Extras packages                                                                                                                                                                              40 kB/s |  20 kB     00:00
Dependencies resolved.
=============================================================================================================================================================================================================================================
 Package                                                        Architecture                                         Version                                                       Repository                                            Size
=============================================================================================================================================================================================================================================
Downgrading:
 python3-requests                                               noarch                                               2.25.1-8.el9                                                  baseos                                               125 k
...
...
Downgraded:
  python3-requests-2.25.1-8.el9.noarch
Complete!
sh-5.1# ls /etc/manila/
api-paste.ini  manila.conf    manila.conf.d/ rootwrap.conf
sh-5.1# ls /etc/manila/manila.conf.d/
00-config.conf  02-config.conf
sh-5.1# manila-manage --config-dir /etc/manila/manila.conf.d/ db sync
2025-02-06 15:49:16.479 69 DEBUG manila.utils [-] backend <module 'manila.db.migrations.alembic.migration' from '/usr/lib/python3.9/site-packages/manila/db/migrations/alembic/migration.py'> __get_backend /usr/lib/python3.9/site-packages/manila/utils.py:157
2025-02-06 15:49:16.503 69 DEBUG oslo_db.sqlalchemy.engines [-] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python3.9/site-packages/oslo_db/sqlalchemy/engines.py:335
2025-02-06 15:49:16.508 69 INFO alembic.runtime.migration [-] Context impl MySQLImpl.
2025-02-06 15:49:16.509 69 INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
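
A quick way to tell whether a given image ships the problematic python3-requests (a hypothetical spot check, not something the job runs):

# Hypothetical check: print the installed python3-requests version and whether
# importing it survives the import-time CA-bundle load from the traceback above.
import importlib.metadata

print("requests", importlib.metadata.version("requests"))
try:
    import requests  # noqa: F401 - this is the step that fails on the broken image
    print("import requests: OK")
except Exception as exc:
    print(f"import requests: FAILED ({exc!r})")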

@fmount
Collaborator Author

fmount commented Feb 7, 2025

Kuttl blocked by this bug: https://issues.redhat.com/browse/RHEL-78362
