
Commit c29c9fb

Parallelization (#21)
* Extract shellcheck configuration from Makefile

  This patch extracts the shellcheck configuration from the Makefile, where it was passed on the command line, and moves it to the .shellcheckrc file.

  This patch also adds the location of the sources to the configuration so we no longer need to add the "shellcheck disable=SC1091" directive before sourcing files to prevent errors.

  Some editors that do not use the .shellcheckrc may still report the SC1091 error on those lines, but `make check` will not, since it uses the .shellcheckrc file.

* New parallel processing mechanism

  This patch adds a new parallel processing mechanism with 2 functions:
  - run_bg: Run commands in the background
  - wait_bg: Wait for background commands to complete

  The number of concurrent processes is controlled by the environmental variable CONCURRENCY, which defaults to 5.

  The patch moves the SOS report gathering, which was already running things in the background, to use this new mechanism.

  To ensure that the concurrency is honored we need to source the gather_sos file instead of calling it from the command line, so gather_sos has been modified to support both the sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_services_status

  Move gather_services_status to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_network

  Move gather_network to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_nodes

  Move gather_nodes to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_apiservices

  Move gather_apiservices to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_crds

  Move gather_crds to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_crs

  Move gather_crs to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_webhooks

  Move gather_webhooks to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_trigger_gmr

  Move gather_trigger_gmr to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_services_cm

  Move gather_services_cm to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_secrets

  Move gather_secrets to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_sub

  Move gather_sub to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

* Parallelize gather_ctlplane_resources

  Move gather_ctlplane_resources to the new parallel processing mechanism by sourcing the file from gather_run instead of calling it and adapting its code to support both sourcing and calling mechanisms. The calling mechanism is useful when debugging things.

  This patch also removes the nested loop that made multiple `oc` requests; it now makes a single request to gather the pod-node-previouslog tuple of information.

  It also fixes a bug that I introduced in an earlier patch regarding the gathering of previous container logs.

* Reorder sos gathering to speed things up

  Gather the SOS reports at the beginning of the must-gather so they run in parallel with more of the remaining tasks.

* Update README.md with the parallelization bits

  This patch updates the README.md to mention the `CONCURRENCY` environmental variable that can be used to control the new parallelization functionality.

  Before parallelizing the work, a must-gather in my single node CRC deployment took:
  - 13m35s without SOS reports
  - 15m49s with SOS reports

  After this parallelization effort it takes:
  - 7m9s without SOS reports
  - 7m40s with SOS reports
1 parent d86ed1f commit c29c9fb

19 files changed: +375 -216 lines changed

.shellcheckrc

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Look for 'source'd files relative to the checked script
+source-path=SCRIPTDIR
+external-sources=SCRIPTDIR
+
+disable=SC2016,SC2006,SC2140,SC2086

Makefile

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ endif
 build: check-image podman-build podman-push ## Build and push the must-gather image

 check: ## Run sanity check against the script collection
-	shellcheck -e SC2016 -e SC2006 -e SC2140 -e SC2086 collection-scripts/*
+	shellcheck collection-scripts/*

 pytest: ## Run sanity check against python scripts in pydir
 	tox -c pyscripts/tox.ini

README.md

Lines changed: 3 additions & 0 deletions
@@ -38,6 +38,9 @@ oc adm must-gather --image=quay.io/openstack-k8s-operators/openstack-must-gather

 This is the list of available environmental variables:

+- `CONCURRENCY`: Must gather runs many operations, so to speed things up we run
+  them in parallel with a concurrency of 5 by default. Users can change this
+  environmental variable to adjust to its needs.
 - `SOS_SERVICES`: Comma separated list of services to gather SOS reports from.
   Empty string skips sos report gathering. Eg: `cinder,glance`. Defaults to all
   of them.
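The default of 5 comes from a `${CONCURRENCY:-5}` expansion in bg.sh. As a minimal sketch of how that expansion behaves (the `effective_concurrency` helper is a hypothetical name used only for this illustration, not part of the commit):

```shell
#!/bin/bash
# Sketch: how a ${VAR:-default} expansion picks the effective concurrency.
# effective_concurrency is a hypothetical helper, not part of the commit.
effective_concurrency() {
    # Prints the caller-provided CONCURRENCY, or 5 when it is unset or empty.
    echo "${CONCURRENCY:-5}"
}

unset CONCURRENCY
unset_value=$(effective_concurrency)        # variable not set at all

CONCURRENCY=10
override_value=$(effective_concurrency)     # user override

CONCURRENCY=""
empty_value=$(effective_concurrency)        # empty string also falls back

echo "unset=${unset_value} override=${override_value} empty=${empty_value}"
```

With the `:-` form, both an unset and an empty variable fall back to the default, so an empty `CONCURRENCY` does not disable the throttling.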

collection-scripts/bg.sh

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
#!/bin/bash
2+
3+
CONCURRENCY=${CONCURRENCY:-5}
4+
5+
# Function to run commands in background without exceeding $CONCURRENCY
6+
# processes in parallel.
7+
# The recommendation is to use this function at the deepest level that can be
8+
# parallelized and not at the highest.
9+
# For normal commands:
10+
# run_bg echo hola
11+
# For commands that run multiple commands we can play with strings:
12+
# run_bg 'sleep 10 && echo hola'
13+
# run_bg sleep 10 '&& echo hola'
14+
# run_bg sleep 10 '&&' echo hola
15+
# run_bg echo hola '>' myfile.txt
16+
#
17+
# For now these methods ignore errors on the calls that are made in the
18+
# background.
19+
20+
function run_bg {
21+
while [[ $(jobs -r | wc -l) -ge $CONCURRENCY ]]; do
22+
wait -n
23+
done
24+
25+
# Cannot use the alternative suggested by SC2294 which is just "$@"&
26+
# because that doesn't accomplish what we want, as it executes the first
27+
# element as the command and the rest as its parameters, so it cannot run
28+
# multiple commands, use pipes, redirect...
29+
# shellcheck disable=SC2294
30+
eval "$@"&
31+
# Return the new process' PID
32+
return $!
33+
}
34+
35+
36+
# Waits for all background tasks to complete or just for a list of PIDs
37+
# Disable SC2120 in this to prevent SC2119 when called without the optional PIDs
38+
# shellcheck disable=SC2120
39+
function wait_bg {
40+
# When we receive a list of PIDs those may be already finished, and we'll
41+
# get an error complaining those are not children
42+
wait -f "$@" 2>/dev/null
43+
return 0
44+
}
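To see the throttling in action outside the must-gather, here is a small self-contained sketch. It reuses the `jobs -r` / `wait -n` loop from the diff but drops the `return $!` line; the concurrency of 2, the job names and the temporary log file are assumptions made for this illustration only.

```shell
#!/bin/bash
# Sketch of the throttling pattern: never more than $CONCURRENCY running jobs.
# The value 2 and the job names below are illustrative assumptions.
CONCURRENCY=${CONCURRENCY:-2}

function run_bg {
    # Block until a slot frees up; 'wait -n' returns when any one job ends.
    while [[ $(jobs -r | wc -l) -ge $CONCURRENCY ]]; do
        wait -n
    done
    # eval joins the arguments and re-parses them, so quoted operators work.
    eval "$@" &
}

logfile=$(mktemp)
for i in 1 2 3 4 5; do
    run_bg "sleep 0.2; echo job-$i" '>>' "$logfile"
done

wait   # wait for every remaining background job
finished=$(wc -l < "$logfile")
echo "finished jobs: $((finished))"
```

Two caveats, offered as observations rather than as part of the commit: `wait -n` and `wait -f` are only available in reasonably recent bash releases, and bash's `return` carries an exit status in the 0-255 range, so a PID passed through `return $!` would reach the caller truncated. Reading `$!` right after calling the function is one way a caller could get the full PID.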

collection-scripts/common.sh

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,7 @@
 #!/bin/bash

+source "${DIR_NAME}/bg.sh"
+
 export BASE_COLLECTION_PATH="${BASE_COLLECTION_PATH:-/must-gather}"
 declare -a DEFAULT_NAMESPACES=(
     "openstack"
@@ -81,7 +83,7 @@ function get_resources {
     mkdir -p "${NAMESPACE_PATH}"/"$NS"/"$resource"
     for res in $(oc -n "$NS" get "$resource" -o custom-columns=":metadata.name"); do
         echo "Dump $resource: $res";
-        /usr/bin/oc -n "$NS" get "$resource" "$res" -o yaml > "${NAMESPACE_PATH}"/"$NS"/"$resource"/"$res".yaml
+        run_bg /usr/bin/oc -n "$NS" get "$resource" "$res" -o yaml '>' "${NAMESPACE_PATH}/${NS}/${resource}/${res}.yaml"
     done
 }
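The quoted `'>'` in the new `run_bg` call is passed as a plain word and only becomes a redirection when `eval` re-parses the command inside the background job. The sketch below illustrates that with a stripped-down stand-in for `run_bg` (no throttling); the file names and the `printf '%q'` escaping are assumptions for this example, not something the commit does.

```shell
#!/bin/bash
# Minimal stand-in for run_bg (no throttling), only to show argument handling.
function run_bg { eval "$@" & }

workdir=$(mktemp -d)

# '&&' and '>' are quoted, so they reach eval as plain words and the whole
# chain (sleep, then echo with its redirection) runs in the background job.
run_bg sleep 0.1 '&&' echo chained '>' "${workdir}/chained.txt"

# eval re-parses its arguments, so a path containing spaces has to be
# escaped again; printf '%q' is one way to do that.
spaced="${workdir}/my file.txt"
run_bg echo spaced '>' "$(printf '%q' "$spaced")"

wait
chained_content=$(cat "${workdir}/chained.txt")
spaced_content=$(cat "$spaced")
echo "${chained_content} / ${spaced_content}"
```

Because `eval` splits the joined string again, resource names or paths containing whitespace or shell metacharacters would need this kind of escaping; names without such characters are unaffected.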

collection-scripts/gather_apiservices

Lines changed: 7 additions & 6 deletions
@@ -1,9 +1,10 @@
 #!/bin/bash

-# load shared functions and data
-DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
-# shellcheck disable=SC1091
-source "${DIR_NAME}/common.sh"
+if [[ -z "$DIR_NAME" ]]; then
+    CALLED=1
+    DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+    source "${DIR_NAME}/common.sh"
+fi

 # Resource list
 resources=()
@@ -16,7 +17,7 @@ done

 for resource in "${resources[@]}"; do
     mkdir -p "$BASE_COLLECTION_PATH"/apiservices/
-    /usr/bin/oc get apiservice "${resource}" -o yaml > "${BASE_COLLECTION_PATH}/apiservices/${resource}.yaml"
+    run_bg /usr/bin/oc get apiservice "${resource}" -o yaml '>' "${BASE_COLLECTION_PATH}/apiservices/${resource}.yaml"
 done

-exit 0
+[[ $CALLED -eq 1 ]] && wait_bg
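The `DIR_NAME` / `CALLED` guard is what lets the same file be either executed on its own or sourced by a parent script. A rough sketch of both paths, using a throwaway script written to a temporary file (its contents and the `/hypothetical/dir` value are stand-ins invented for this example):

```shell
#!/bin/bash
# Hypothetical stand-in for one gather_* script, written to a temp file so it
# can be both executed and sourced from this example.
script=$(mktemp)
cat > "$script" <<'EOF'
if [[ -z "$DIR_NAME" ]]; then
    # Nobody set DIR_NAME for us, so we are running standalone.
    CALLED=1
    DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
fi

sleep 0.1 &    # stands in for a run_bg call

if [[ $CALLED -eq 1 ]]; then
    wait       # standalone: wait for our own background work
    echo "called"
else
    echo "sourced"   # the sourcing script is expected to wait later
fi
EOF

# Executed as a separate process: DIR_NAME is empty there, so CALLED=1.
called_out=$(DIR_NAME="" CALLED="" bash "$script")

# Sourced from a shell that already defined DIR_NAME, as a parent script would.
sourced_out=$(DIR_NAME="/hypothetical/dir"; CALLED=""; source "$script"; wait)

rm -f "$script"
echo "${called_out} ${sourced_out}"
```

When sourced, the background jobs stay children of the parent shell, which is what allows a single `CONCURRENCY` limit to apply across all gather scripts.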

collection-scripts/gather_crds

Lines changed: 10 additions & 2 deletions
@@ -1,5 +1,12 @@
 #!/bin/bash

+if [[ -z "$DIR_NAME" ]]; then
+    CALLED=1
+    DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+    source "${DIR_NAME}/bg.sh"
+fi
+
+
 # Resource list
 resources=()

@@ -8,9 +15,10 @@ do
     resources+=("crd/$i")
 done

+echo "Gathering CRDs"
 # Run the collection of resources using must-gather
 for resource in "${resources[@]}"; do
-    /usr/bin/oc adm inspect --dest-dir must-gather "${resource}"
+    run_bg /usr/bin/oc adm inspect --dest-dir must-gather "${resource}"
 done

-exit 0
+[[ $CALLED -eq 1 ]] && wait_bg

collection-scripts/gather_crs

Lines changed: 10 additions & 6 deletions
@@ -1,8 +1,11 @@
 #!/bin/bash

-DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
-# shellcheck disable=SC1091
-source "${DIR_NAME}/common.sh"
+# When called from the shell directly
+if [[ -z "$DIR_NAME" ]]; then
+    CALLED=1
+    DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+    source "${DIR_NAME}/common.sh"
+fi

 # Resource list
 resources=()
@@ -18,6 +21,7 @@ done
 resources+=("baremetalhosts.metal3.io")

 # we use nested loops to nicely output objects partitioned per namespace, kind
+echo "Gathering CRs"
 for resource in "${resources[@]}"; do
     /usr/bin/oc get "${resource}" --all-namespaces -o custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace --no-headers 2> /dev/null | \
     while read -r ocresource; do
@@ -26,13 +30,13 @@ for resource in "${resources[@]}"; do
         if [ -z "${ocproject}" ]||[ "${ocproject}" == "<none>" ]; then
             object_collection_path=${BASE_COLLECTION_PATH}/cluster-scoped-resources/${resource}
             mkdir -p "${object_collection_path}"
-            /usr/bin/oc get "${resource}" -o yaml "${ocobject}" > "${object_collection_path}/${ocobject}.yaml"
+            run_bg /usr/bin/oc get "${resource}" -o yaml "${ocobject}" '>' "${object_collection_path}/${ocobject}.yaml"
         else
             object_collection_path=${BASE_COLLECTION_PATH}/namespaces/${ocproject}/crs/${resource}
             mkdir -p "${object_collection_path}"
-            /usr/bin/oc get "${resource}" -n "${ocproject}" -o yaml "${ocobject}" > "${object_collection_path}/${ocobject}.yaml"
+            run_bg /usr/bin/oc get "${resource}" -n "${ocproject}" -o yaml "${ocobject}" '>' "${object_collection_path}/${ocobject}.yaml"
         fi
     done
 done

-exit 0
+[[ $CALLED -eq 1 ]] && wait_bg
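One detail that may be worth checking in this file: the `run_bg` calls sit inside a `while read` loop that is fed by a pipe. In bash, unless the `lastpipe` option is in effect, such a loop runs in a subshell, so the jobs it starts are children of that subshell and a later `wait` in the main shell does not see them. The following sketch only illustrates that general bash behaviour with `sleep` placeholders; it is not taken from the commit.

```shell
#!/bin/bash
# Loop fed by a pipe: the loop body runs in a subshell, so the jobs it starts
# are not children of this shell.
printf 'a\nb\n' | while read -r item; do
    sleep 0.5 > /dev/null &
done
piped_jobs=$(jobs -p | wc -l)

# Loop fed by a here-string: the loop runs in this shell, which keeps the jobs.
while read -r item; do
    sleep 0.5 > /dev/null &
done <<< $'a\nb'
herestring_jobs=$(jobs -p | wc -l)

wait
echo "piped=$((piped_jobs)) here-string=$((herestring_jobs))"
```

Feeding the loop from a here-string or from process substitution keeps the jobs in the current shell, which is the shape the `gather_ctlplane_resources` function in this same commit uses.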
Lines changed: 51 additions & 42 deletions
@@ -1,50 +1,59 @@
 #!/bin/bash

-# load shared functions and data
-DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
-# shellcheck disable=SC1091
-source "${DIR_NAME}/common.sh"
-
-NS="$1"
-if [ -z "$NS" ]; then
-    echo "No namespace passed, using the default one"
-    NS=openstack
+# load shared functions and data when not sourced
+if [[ -z "$DIR_NAME" ]]; then
+    CALLED=1
+    DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+    source "${DIR_NAME}/common.sh"
 fi

-# Only get resources if the namespace exists
-if ! check_namespace "${NS}"; then
-    exit 0
-fi

-# Get the view of the current namespace related resources, including pods
-mkdir -p "${NAMESPACE_PATH}"/"${NS}"
-/usr/bin/oc -n "${NS}" get all > "${NAMESPACE_PATH}"/"${NS}"/all_resources.log
-/usr/bin/oc -n "${NS}" get events > "${NAMESPACE_PATH}"/"${NS}"/events.log
-/usr/bin/oc -n "${NS}" get pvc > "${NAMESPACE_PATH}"/"${NS}"/pvc.log
-/usr/bin/oc -n "${NS}" get nad -o yaml > "${NAMESPACE_PATH}"/"${NS}"/nad.log
-
-# Get pods and the associated logs
-for p in $(oc -n "$NS" get pods -o custom-columns=":metadata.name"); do
-    echo "Dump logs for pod: $p";
-    mkdir -p "${NAMESPACE_PATH}"/"$NS"/pods/"$p"/logs
-    # describe pod
-    /usr/bin/oc -n "$NS" describe pod "$p" > "${NAMESPACE_PATH}/${NS}/pods/${p}/${p}-describe"
-    # get logs for each of the individual containers
-    containers=`/usr/bin/oc -n $NS get pod $p -o jsonpath='{.spec.containers[*].name}'`
-    for c in $containers; do
-        /usr/bin/oc -n "$NS" logs "$p" -c "$c" > "${NAMESPACE_PATH}/${NS}/pods/${p}/logs/${c}.log";
-    done
-    # dump --previous logs for all the terminated containers of the pod
-    cur=$(/usr/bin/oc -n "$NS" get pods "$p" -o jsonpath="{.status.containerStatuses[*].lastState.terminated}")
-    for c in $cur; do
-        /usr/bin/oc -n "$NS" logs "$p" -c "$c" > "${NAMESPACE_PATH}/${NS}/pods/${p}/logs/${c}-previous.log";
+function gather_ctlplane_resources {
+    local NS="$1"
+    # Only get resources if the namespace exists
+    if ! check_namespace "${NS}"; then
+        return
+    fi
+
+    # Get the view of the current namespace related resources, including pods
+    mkdir -p "${NAMESPACE_PATH}"/"${NS}"
+    run_bg /usr/bin/oc -n "${NS}" get all '>' "${NAMESPACE_PATH}/${NS}/all_resources.log"
+    run_bg /usr/bin/oc -n "${NS}" get events '>' "${NAMESPACE_PATH}/${NS}/events.log"
+    run_bg /usr/bin/oc -n "${NS}" get pvc '>' "${NAMESPACE_PATH}/${NS}/pvc.log"
+    run_bg /usr/bin/oc -n "${NS}" get nad -o yaml '>' "${NAMESPACE_PATH}/${NS}/nad.log"
+
+    # We make a single request to get lines in the form <pod> <container> <crash_status>
+    data=$(oc -n "$NS" get pod -o go-template='{{range $indexp,$pod := .items}}{{range $index,$element := $pod.status.containerStatuses}}{{printf "%s %s" $pod.metadata.name $element.name}} {{ if ne $element.lastState.terminated nil }}{{ printf "%s" $element.lastState.terminated }}{{ end }}{{ printf "\n"}}{{end}}{{end}}')
+    while read -r pod container crash_status; do
+        echo "Dump logs for ${container} from ${pod} pod";
+        pod_dir="${NAMESPACE_PATH}/${NS}/pods/${pod}"
+        log_dir="${pod_dir}/logs"
+        if [ ! -d "$log_dir" ]; then
+            mkdir -p "$log_dir"
+            # describe pod
+            run_bg oc -n "$NS" describe pod "$pod" '>' "${pod_dir}/${pod}-describe"
+        fi
+        run_bg oc -n "$NS" logs "$pod" -c "$container" '>' "${log_dir}/${container}.log"
+        if [[ -n "$crash_status" ]]; then
+            run_bg oc -n "$NS" logs "$pod" -c "$container" --previous '>' "${log_dir}/${container}-previous.log";
+        fi
+    done <<< "$data"
+
+    # get the required resources
+    # shellcheck disable=SC2154
+    for r in "${resources[@]}"; do
+        get_resources "$r" "$NS"
     done
-done
+}
+
+if [[ $CALLED -eq 1 ]]; then
+    NS="$1"
+    if [ -z "$NS" ]; then
+        echo "No namespace passed, using the default one"
+        NS=openstack
+    fi

-# get the required resources
-# shellcheck disable=SC2154
-for r in "${resources[@]}"; do
-    get_resources "$r" "$NS"
-done
+    gather_ctlplane_resources "$NS"

-exit 0
+    wait_bg
+fi
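The new loop reads one `<pod> <container> <crash_status>` line per container and only fetches `--previous` logs when the third field is non-empty. A small sketch of that parsing step, using made-up sample data (the pod names, container names and the `map[...]` text are invented for illustration and are not real `oc` output):

```shell
#!/bin/bash
# Made-up sample in the "<pod> <container> <crash_status>" shape.
data='pod-a api
pod-a log map[exitCode:1 reason:Error]
pod-b db'

count=0
crashed=()
# 'read' puts the first two words in pod and container; everything left on
# the line, spaces included, lands in crash_status.
while read -r pod container crash_status; do
    count=$((count + 1))
    if [[ -n "$crash_status" ]]; then
        crashed+=("${pod}/${container}")
    fi
done <<< "$data"

echo "containers: ${count}, with previous logs: ${crashed[*]}"
```

Because the loop is fed by a here-string it runs in the current shell, so `count` and `crashed` are still set after the loop. Note that a here-string built from an empty variable still yields one empty line, so a guard such as skipping iterations where `pod` is empty could be useful when a namespace has no pods.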

collection-scripts/gather_network

Lines changed: 11 additions & 8 deletions
@@ -1,28 +1,31 @@
 #!/bin/bash

-DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
-# shellcheck disable=SC1091
-source "${DIR_NAME}/common.sh"
+# When called from the shell directly
+if [[ -z "$DIR_NAME" ]]; then
+    CALLED=1
+    DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+    source "${DIR_NAME}/common.sh"
+fi

 # get nncp
 mkdir -p "${BASE_COLLECTION_PATH}/network/nncp"
 for iface in $(oc get nncp -o custom-columns=":metadata.name"); do
-    /usr/bin/oc get nncp "$iface" -o yaml > "${BASE_COLLECTION_PATH}/network/nncp/$iface.log";
+    run_bg /usr/bin/oc get nncp "$iface" -o yaml '>' "${BASE_COLLECTION_PATH}/network/nncp/$iface.log";
 done

 # get nnce
 mkdir -p "${BASE_COLLECTION_PATH}/network/nnce"
 for iface in $(oc get nnce -o custom-columns=":metadata.name"); do
-    /usr/bin/oc get nnce "$iface" -o yaml > "${BASE_COLLECTION_PATH}/network/nnce/$iface.log";
+    run_bg /usr/bin/oc get nnce "$iface" -o yaml '>' "${BASE_COLLECTION_PATH}/network/nnce/$iface.log";
 done

 # get ipaddresspools
 mkdir -p "${BASE_COLLECTION_PATH}/network/ipaddresspools"
 for ipadd in $(oc -n "${METALLB_NAMESPACE}" get ipaddresspools -o custom-columns=":metadata.name"); do
-    /usr/bin/oc -n "${METALLB_NAMESPACE}" get ipaddresspools "$ipadd" -o yaml > "${BASE_COLLECTION_PATH}/network/ipaddresspools/$ipadd.log";
+    run_bg /usr/bin/oc -n "${METALLB_NAMESPACE}" get ipaddresspools "$ipadd" -o yaml '>' "${BASE_COLLECTION_PATH}/network/ipaddresspools/$ipadd.log";
 done

 # get l2advertisement
-/usr/bin/oc -n "${METALLB_NAMESPACE}" get l2advertisement -o yaml >> "${BASE_COLLECTION_PATH}/network/l2advertisement.log"
+run_bg /usr/bin/oc -n "${METALLB_NAMESPACE}" get l2advertisement -o yaml '>>' "${BASE_COLLECTION_PATH}/network/l2advertisement.log"

-exit 0
+[[ $CALLED -eq 1 ]] && wait_bg

collection-scripts/gather_nodes

Lines changed: 8 additions & 2 deletions
@@ -1,8 +1,14 @@
 #!/bin/bash
+#
+if [[ -z "$DIR_NAME" ]]; then
+    CALLED=1
+    DIR_NAME=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+    source "${DIR_NAME}/bg.sh"
+fi

 mkdir -p "${NODES_COLLECTION_PATH}"
 for node in $(/usr/bin/oc get nodes -o custom-columns=NAME:.metadata.name --no-headers); do
-    /usr/bin/oc get nodes "${node}" -o yaml > "${NODES_COLLECTION_PATH}/${node}.yaml"
+    run_bg /usr/bin/oc get nodes "${node}" -o yaml '>' "${NODES_COLLECTION_PATH}/${node}.yaml"
 done

-exit 0
+[[ $CALLED -eq 1 ]] && wait_bg
