Commit a285a1c

Caspar van Leeuwen committed
Account for the fact that nvidia-smi might be installed on a CPU node. The command will exist, but return a non-zero exit code when run with e.g. --version, because there are no GPU drivers.
1 parent 8564e42 commit a285a1c
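To see the failure mode described in the commit message in isolation, the sketch below puts a hypothetical stub named `nvidia-smi` on `PATH` that always exits non-zero (the value 9 is arbitrary, chosen only for this sketch). The stub stands in for an nvidia-smi binary on a node without GPU drivers. A presence check with `command -v` succeeds; we assume this roughly matches what the script's `command_exists` helper checks. Actually running `nvidia-smi --version` still fails, and wrapping the call in `set +e` / `set -e` lets the script record the exit code instead of aborting under errexit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stub standing in for nvidia-smi on a CPU-only node:
# it is on PATH but fails, as if no GPU driver were loaded.
stub_dir="$(mktemp -d)"
trap 'rm -rf "${stub_dir}"' EXIT
cat > "${stub_dir}/nvidia-smi" <<'EOF'
#!/bin/sh
echo "stub: no NVIDIA driver loaded" >&2
exit 9
EOF
chmod +x "${stub_dir}/nvidia-smi"
PATH="${stub_dir}:${PATH}"

# A presence check alone succeeds...
if command -v nvidia-smi >/dev/null 2>&1; then
    found="yes"
else
    found="no"
fi

# ...but running the command fails. Temporarily disable errexit so the
# non-zero status is captured in ec instead of terminating the script.
set +e
nvidia-smi --version 2>/dev/null
ec=$?
set -e

echo "found=${found} exit_code=${ec}"
```

Run as a script, it prints `found=yes exit_code=9`. This shows that "the command exists" and "the command works" are separate checks.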

1 file changed: 16 additions, 2 deletions

bot/build.sh

Lines changed: 16 additions & 2 deletions
@@ -243,14 +243,28 @@ mkdir -p ${TARBALL_TMP_BUILD_STEP_DIR}
 # prepare arguments to eessi_container.sh specific to build step
 BUILD_STEP_ARGS+=("--save" "${TARBALL_TMP_BUILD_STEP_DIR}")
 BUILD_STEP_ARGS+=("--storage" "${STORAGE}")
+
 # add options required to handle NVIDIA support
 if command_exists "nvidia-smi"; then
-    echo "Command 'nvidia-smi' found, using available GPU"
-    BUILD_STEP_ARGS+=("--nvidia" "all")
+    # Accept that this may fail
+    set +e
+    nvidia-smi --version
+    ec=$?
+    set -e
+    if [ ${ec} -eq 0 ]; then
+        echo "Command 'nvidia-smi' found, using available GPU"
+        BUILD_STEP_ARGS+=("--nvidia" "all")
+    else
+        echo "Warning: command 'nvidia-smi' found, but 'nvidia-smi --version' did not run succesfully."
+        echo "This script now assumes this is NOT a GPU node."
+        echo "If, and only if, the current node actually does contain Nvidia GPUs, this should be considered an error."
+        BUILD_STEP_ARGS+=("--nvidia" "install")
+    fi
 else
     echo "No 'nvidia-smi' found, no available GPU but allowing overriding this check"
     BUILD_STEP_ARGS+=("--nvidia" "install")
 fi
+
 # Retain location for host injections so we don't reinstall CUDA
 # (Always need to run the driver installation as available driver may change)
 if [[ ! -z ${SHARED_FS_PATH} ]]; then
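A design note on the same check: in bash, a command used as the condition of an `if` or `elif` does not trigger errexit (`set -e`). That means the `set +e` / `ec=$?` / `set -e` sequence can be folded into the condition itself. The sketch below is one way to do that. It assumes a hypothetical helper named `nvidia_mode`, which is not part of build.sh, that prints the value to pass to `--nvidia`. The command to probe is a parameter only so the logic can be exercised without a GPU. Unlike the committed code, it discards nvidia-smi's output and does not print the warning messages.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in bot/build.sh): print the value to pass to
# eessi_container.sh's --nvidia option.
# $1: command to probe; defaults to nvidia-smi, overridable for testing.
nvidia_mode() {
    local probe="${1:-nvidia-smi}"
    if ! command -v "${probe}" >/dev/null 2>&1; then
        # Command not on PATH: no usable GPU
        echo "install"
    elif "${probe}" --version >/dev/null 2>&1; then
        # Command runs successfully: assume a driver and GPU are available.
        # As an elif condition, a failure here does not trip set -e.
        echo "all"
    else
        # Command exists but fails (e.g. CPU node without GPU drivers)
        echo "install"
    fi
}

# Example use (commented out so the sketch does not depend on the host):
# BUILD_STEP_ARGS+=("--nvidia" "$(nvidia_mode)")
```

Keeping the probe inside the condition avoids toggling global shell state. It also ensures that `set -e` cannot be left disabled if the code between `set +e` and `set -e` is later extended.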
