Gather SOS reports #19

Merged
merged 3 commits into from
Nov 22, 2023

@Akrog Akrog commented Nov 17, 2023

This patch adds SOS gathering for controller nodes so we get all
information with the must-gather request and the reports are gathered
together instead of needing to go to 4 different places.

SOS gathering is done in parallel for all hosts, and the reports are
stored uncompressed in the must-gather report to avoid having nested
compression that makes it more difficult to use it.
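
The parallel pattern described above can be sketched with background jobs and `wait`. This is only an illustration: `gather_node` and `gather_all_nodes` are hypothetical names, and the body of `gather_node` is a stand-in for the real per-node collection, which may work differently in the patch.

```bash
# Hypothetical stand-in for the real per-node SOS collection.
gather_node() {
    local node="$1" dest="$2"
    mkdir -p "${dest}/sosreport-${node}"
    echo "report for ${node}" > "${dest}/sosreport-${node}/summary.txt"
}

# Start one background job per node, then wait for each and count failures.
gather_all_nodes() {
    local dest="$1"; shift
    local pids=() failed=0 node pid
    for node in "$@"; do
        gather_node "$node" "$dest" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done
    # Exit status is the number of nodes whose job failed (0 means all succeeded).
    return "$failed"
}
```

Waiting on each PID individually, rather than a bare `wait`, lets the caller see whether any single node failed.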

Since we may not always be interested in gathering SOS reports or we may
want just SOS reports of nodes for specific services we have the
SOS_SERVICES environmental variable that we can use to define the
services.

To speed things up we don't gather a full SOS report but instead limit
the plugins used to `block,cifs,crio,devicemapper,devices,iscsi,lvm2,memory,multipath,nfs,nis,nvme,podman,process,processor,selinux,scsi,udev`.
A user can change this using the SOS_ONLY_PLUGINS environmental
variable.
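
As a rough sketch of how the plugin override could feed into the `sos` command line: `build_sos_cmd` is a hypothetical helper, and treating an empty `SOS_ONLY_PLUGINS` the same as an unset one is an assumption here, not something the PR states. The `--batch` and `--only-plugins` options are standard `sos report` flags.

```bash
# Default plugin list as given in the PR description.
DEFAULT_SOS_PLUGINS="block,cifs,crio,devicemapper,devices,iscsi,lvm2,memory,multipath,nfs,nis,nvme,podman,process,processor,selinux,scsi,udev"

# Hypothetical helper: print the sos command line that would run on a node.
build_sos_cmd() {
    # Assumption: unset or empty SOS_ONLY_PLUGINS falls back to the default list.
    local plugins="${SOS_ONLY_PLUGINS:-$DEFAULT_SOS_PLUGINS}"
    echo "sos report --batch --only-plugins ${plugins}"
}
```

For example, `SOS_ONLY_PLUGINS=crio,podman build_sos_cmd` would print a command restricted to those two plugins.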

Ceilometer is missing from the list of OpenStack services, so we add it
in collection-scripts/common.sh
Trivial change from `/n` to `\n` in the debug instructions.

Akrog commented Nov 21, 2023

/hold
There seems to be something wrong when rsync tries to create directories locally

rsync: [generator] recv_generator: mkdir "/home/geguileo/ng/openstack-must-gather/must-gather.local.6336842555215595258/image-registry-openshift-image-registry-svc-5000-openshift-wtf-sha256-883a30abacecbdeb4ae92ec82e3759263bcf69eeb36a119f77cd976412c101fd/sos-reports/_all_nodes/sosreport-crc-8cf2w-master-0/proc/13652" failed: Permission denied (13)


Akrog commented Nov 21, 2023

/unhold

fmount left a comment

Thank you for the awesome work, I tried this locally and I'm able to properly see the generated sos reports.
The filtering mechanism is also very useful and it is a good approach driving it through env vars.
I only have a small comment and a suggested change on the chmod command which is still failing w/ permission denied (only for a subset of directories), otherwise looks good!


Some openstack-must-gather collectors can be configured via environmental
variables to behave differently. For example SOS gathering can be disabled
passing an empty `SOS_SERVICES` environmental variable.
Contributor

Ack, so we're going to enable sos-reports by default, and passing SOS_SERVICES= skips that step.
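
The "enabled by default, disabled when empty" behaviour depends on telling an unset variable apart from an empty one. A minimal sketch of that distinction follows. The default list, the function names, and the comma-separated format are all assumptions made for illustration and are not taken from the patch.

```bash
# Hypothetical default; the real list lives in the collection scripts.
DEFAULT_SOS_SERVICES="cinder,glance,nova"

# Succeeds when SOS gathering should run.
sos_enabled() {
    # ${VAR-default} (no colon) substitutes only when VAR is unset,
    # so an explicitly empty SOS_SERVICES stays empty and disables gathering.
    local services="${SOS_SERVICES-$DEFAULT_SOS_SERVICES}"
    [ -n "$services" ]
}

# Split the (assumed) comma-separated value into the SERVICES array.
parse_sos_services() {
    IFS=',' read -r -a SERVICES <<< "${SOS_SERVICES-$DEFAULT_SOS_SERVICES}"
}
```

Using `${SOS_SERVICES:-...}` (with the colon) instead would silently re-enable the default when the user passes `SOS_SERVICES=`, which defeats the opt-out.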

fi

# Other services have the component in the service label, eg: nova-api
for os_svc in "${SOS_SERVICES[@]}"; do
Contributor

ack, and hopefully long term we're going to have a more consistent usage of service vs component, which simplifies this part.

fi

# Ensure write access to the sos reports proc directory so must-gather rsync doesn't fail
chmod +w "${SOS_PATH_NODES}/sosreport-$node/*/"
Contributor

I still see a lot of errors like:

*** Skipping any contents from this failed directory ***
rsync: [generator] recv_generator: mkdir "/home/stack/must-gather/must-gather.local.3650222047466077379/quay-io-fpantano-must-gather-sha256-edcfdd8649f5def240d27a3a87899cb9bc3732684bb64eaf742219343b6653a6/sos-reports/_all_nodes/sosreport-crc-8cf2w-master-0/proc/9171" failed: Permission denied (13)

in my environment, that seems to go away if you chmod like this:

chmod +w -R "${SOS_PATH_NODES}/sosreport-$node/"
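
A likely explanation for the difference: in the original line the `*` sits inside double quotes, so the shell passes it to `chmod` literally instead of expanding it, and even an expanded glob would only reach first-level directories rather than nested ones such as `proc/<pid>`. The sketch below reproduces this in a temporary directory. The tree layout and the `make_report_tree` helper are hypothetical.

```bash
# Build a small read-only tree shaped like an extracted report (hypothetical layout).
make_report_tree() {
    local root="$1"
    mkdir -p "${root}/proc/13652"
    chmod -R a-w "${root}"
}

root="$(mktemp -d)/sosreport-demo"
make_report_tree "$root"

# Quoted glob: chmod receives a literal '*' path component, which does not exist.
chmod +w "${root}/*/" 2>/dev/null || echo "quoted glob matched nothing"

# Recursive form, as suggested in the review: every nested directory becomes writable.
chmod -R +w "${root}"
```

The sketch writes `-R` before the mode; GNU `chmod` also accepts `+w -R` as in the suggestion, but options-first is the more portable spelling.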

Contributor Author

Thanks, will change it.
I believe the root cause is an rsync bug, because it is not properly handling the permissions.


bogdando commented Nov 22, 2023

I tried this in the linked nova adoption PR, we need to get EDPM logs from containers. Unfortunately, CI apparently didn't do speculative merging this time?.. Could we have this merged sooner than later please?

update:
actually rdo-jobs CI did execute speculative merging, as I can see ZUUL_CHANGES=openstack-k8s-operators/openstack-must-gather:main:refs/changes/19/19/e8387efccb40fb8496e52bda331c219311291272 ... so this patch probably didn't work for me as adoption CI doesn't use mustgather yet


fmount commented Nov 22, 2023

> I tried this in the linked nova adoption PR, we need to get EDPM logs from containers. Unfortunately, CI apparently didn't do speculative merging this time?.. Could we have this merged sooner than later please?

If the adoption jobs are based on ci-framework, we should be able to execute must-gather as part of the execution [1], but I'm not sure that happens.
As I mentioned in my last comment, the PR looks good and I'm willing to merge it as soon as possible.

> update: actually rdo-jobs CI did execute speculative merging, as I can see ZUUL_CHANGES=openstack-k8s-operators/openstack-must-gather:main:refs/changes/19/19/e8387efccb40fb8496e52bda331c219311291272 ... so this patch probably didn't work for me as adoption CI doesn't use mustgather yet

I suspect the ci-framework execution uses [2], hence we need to merge/build before you can get the results you need there :/

[1] https://github.com/openstack-k8s-operators/ci-framework/blob/main/ci_framework/roles/os_must_gather/tasks/main.yml#L33
[2] https://github.com/openstack-k8s-operators/ci-framework/blob/main/ci_framework/roles/os_must_gather/defaults/main.yml#L20


Akrog commented Nov 22, 2023

> I tried this in the linked nova adoption PR, we need to get EDPM logs from containers. Unfortunately, CI apparently didn't do speculative merging this time?.. Could we have this merged sooner than later please?

I don't think this will help with EDPM, because this only gathers SOS reports from the control plane OCP nodes, not EDPM nodes.

fmount left a comment

/lgtm

@fmount fmount merged commit d86ed1f into openstack-k8s-operators:main Nov 22, 2023
2 checks passed