Gather SOS reports #19
Ceilometer is missing from the list of OpenStack services, so we add it in `collection-scripts/common.sh`.
Trivial change from `/n` to `\n` in the debug instructions.
/hold
/unhold
Thank you for the awesome work. I tried this locally and I'm able to properly see the generated sos reports.
The filtering mechanism is also very useful, and driving it through env vars is a good approach.
I only have a small comment and a suggested change on the `chmod` command, which is still failing with permission denied (only for a subset of directories). Otherwise looks good!
Some openstack-must-gather collectors can be configured via environmental
variables to behave differently. For example SOS gathering can be disabled
passing an empty `SOS_SERVICES` environmental variable.
Ack, so we're going to enable sos-reports by default, and passing `SOS_SERVICES=` skips that step.
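The behaviour agreed on here (gather by default, skip when `SOS_SERVICES` is set but empty) can be sketched in bash as follows. This is only an illustration, not the code in this PR: the `DEFAULT_SERVICES` value, the `resolve_sos_services` helper, and the comma-or-space separator are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical default list; the real list lives in collection-scripts/common.sh.
DEFAULT_SERVICES="cinder glance nova"

# resolve_sos_services prints the services to gather, one per line.
# "${SOS_SERVICES-...}" (no colon) falls back to the default only when the
# variable is unset, so an explicitly empty value survives and means "skip".
resolve_sos_services() {
    local raw="${SOS_SERVICES-$DEFAULT_SERVICES}"
    local -a services=()
    # Assumption: names are separated by commas and/or spaces.
    IFS=', ' read -r -a services <<< "$raw"
    if [ "${#services[@]}" -gt 0 ]; then
        printf '%s\n' "${services[@]}"
    fi
}
```

With this sketch, an unset variable yields the default list, `SOS_SERVICES=` yields nothing (so the caller can skip SOS gathering), and `SOS_SERVICES=nova,cinder` yields just those two names.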
fi

# Other services have the component in the service label, eg: nova-api
for os_svc in "${SOS_SERVICES[@]}"; do
Ack, and hopefully long term we're going to have a more consistent usage of `service` vs `component`, which simplifies this part.
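To make the `service` vs `component` point concrete, here is a minimal sketch of matching a label such as `nova-api` against a requested service name. The `label_matches_service` helper and its prefix rule are hypothetical, chosen only for illustration. They are not the logic used in `gather_sos`.

```bash
#!/usr/bin/env bash
# label_matches_service LABEL SERVICE
# Succeeds when LABEL equals SERVICE or starts with "SERVICE-"
# (for example "nova-api" for the service "nova").
# Assumption: this prefix rule is illustrative, not taken from the PR.
label_matches_service() {
    local label="$1" svc="$2"
    [[ "$label" == "$svc" || "$label" == "$svc"-* ]]
}

# Example loop in the style of the hunk above, over a made-up service list.
SOS_SERVICES=(nova cinder)
for os_svc in "${SOS_SERVICES[@]}"; do
    if label_matches_service "nova-api" "$os_svc"; then
        echo "nova-api belongs to ${os_svc}"
    fi
done
```

Requiring the `-` separator keeps a label like `novajoin` from being treated as part of `nova`.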
collection-scripts/gather_sos
Outdated
fi

# Ensure write access to the sos reports proc directory so must-gather rsync doesn't fail
chmod +w "${SOS_PATH_NODES}/sosreport-$node/*/"
I still see a lot of errors like this in my environment:
*** Skipping any contents from this failed directory ***
rsync: [generator] recv_generator: mkdir "/home/stack/must-gather/must-gather.local.3650222047466077379/quay-io-fpantano-must-gather-sha256-edcfdd8649f5def240d27a3a87899cb9bc3732684bb64eaf742219343b6653a6/sos-reports/_all_nodes/sosreport-crc-8cf2w-master-0/proc/9171" failed: Permission denied (13)
They seem to go away if you `chmod` like this:
chmod +w -R "${SOS_PATH_NODES}/sosreport-$node/"
Thanks, will change it.
I believe the root cause is an `rsync` bug, because it is not properly handling the permissions.
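The difference between the two `chmod` invocations can be reproduced on a throwaway tree. In the original line the `*` sits inside double quotes, so bash passes it to `chmod` literally instead of expanding it, while the recursive form reaches every nested directory. The sketch below uses made-up paths and helper names, and uses `u+w` rather than `+w` so the result does not depend on the umask.

```bash
#!/usr/bin/env bash
# make_tree_writable DIR: add the owner write bit to DIR and everything below it.
make_tree_writable() {
    chmod -R u+w "$1"
}

# count_readonly_dirs DIR: print how many directories under DIR lack the
# owner write bit (-perm -200 means "owner write bit is set").
count_readonly_dirs() {
    find "$1" -type d ! -perm -200 | wc -l | tr -d ' '
}

# Throwaway tree that mimics read-only proc entries (paths are hypothetical).
demo_root="$(mktemp -d)"
report_dir="${demo_root}/sosreport-node0"
mkdir -p "${report_dir}/proc/9171"
chmod -R a-w "${report_dir}"

# A quoted glob is not expanded, so chmod looks for a directory literally named "*".
chmod u+w "${report_dir}/*/" 2>/dev/null || echo "quoted glob matched nothing"

before="$(count_readonly_dirs "${report_dir}")"
make_tree_writable "${report_dir}"
after="$(count_readonly_dirs "${report_dir}")"
echo "read-only directories: before=${before} after=${after}"

rm -rf "${demo_root}"
```

Even with the glob unquoted, the non-recursive form would only touch the first level below the report directory, which would leave entries such as `proc/9171` read-only.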
I tried this in the linked nova adoption PR, since we need to get EDPM logs from containers. Unfortunately, CI apparently didn't do speculative merging this time? Could we have this merged sooner rather than later, please? update:
If the adoption jobs are based on ci-framework, we should be able to execute must-gather as part of the execution [1], but I'm not sure that happens.
I suspect the [1] https://github.com/openstack-k8s-operators/ci-framework/blob/main/ci_framework/roles/os_must_gather/tasks/main.yml#L33
I don't think this will help with EDPM, because this only gathers SOS reports from the control plane OCP nodes, not EDPM nodes.
This patch adds SOS gathering for controller nodes, so we get all the information with the must-gather request and the reports are gathered together instead of needing to go to 4 different places.

SOS gathering is done in parallel for all hosts, and the reports are stored uncompressed in the must-gather report to avoid nested compression, which would make them more difficult to use.

Since we may not always be interested in gathering SOS reports, or we may want SOS reports only from nodes running specific services, we have the `SOS_SERVICES` environmental variable that we can use to define the services.

To speed things up we don't gather a full SOS report but instead limit the plugins used to `block,cifs,crio,devicemapper,devices,iscsi,lvm2,memory,multipath,nfs,nis,nvme,podman,process,processor,selinux,scsi,udev`. A user can change this using the `SOS_ONLY_PLUGINS` environmental variable.
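The parallel gathering and the overridable plugin list described above can be sketched in bash with background jobs and `wait`. This is a simplified illustration under stated assumptions, not the contents of `gather_sos`: `gather_node_sos` is a hypothetical stand-in that only writes a marker file instead of running a real SOS report, and the node names are invented.

```bash
#!/usr/bin/env bash
# Default plugin list quoted from the PR description; SOS_ONLY_PLUGINS overrides it.
# Assumption: ":-" is used here, so an empty value also falls back to the default.
DEFAULT_PLUGINS="block,cifs,crio,devicemapper,devices,iscsi,lvm2,memory,multipath,nfs,nis,nvme,podman,process,processor,selinux,scsi,udev"
SOS_ONLY_PLUGINS="${SOS_ONLY_PLUGINS:-$DEFAULT_PLUGINS}"

# gather_node_sos NODE OUT_DIR
# Hypothetical stand-in for the per-node collection: records which plugins
# would be used instead of actually generating a report.
gather_node_sos() {
    local node="$1" out_dir="$2"
    mkdir -p "${out_dir}/sosreport-${node}"
    echo "plugins=${SOS_ONLY_PLUGINS}" > "${out_dir}/sosreport-${node}/summary.txt"
}

# gather_all_nodes OUT_DIR NODE...
# Starts one background job per node, then waits for each of them.
# Returns non-zero if any node failed.
gather_all_nodes() {
    local out_dir="$1"
    shift
    local -a pids=()
    local node pid failed=0
    for node in "$@"; do
        gather_node_sos "$node" "$out_dir" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

A call such as `gather_all_nodes /tmp/sos-demo master-0 master-1` would leave one uncompressed `sosreport-<node>` directory per node. Waiting on each PID individually lets the caller notice a failure on one node without aborting the others.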
/lgtm