Fix behavior for present but failing nvidia-smi #910
… The command will exist but return a non-zero exit code when run with, e.g., `--version`, because there are no GPU drivers.
lgtm
PR merged! Moved
Lots of copy-pasting; can we consider moving this check into a utility function so we don't have duplicate code?
We have some CPU nodes on which the `nvidia-smi` command is installed but fails. That is, simply checking for the presence of `nvidia-smi` and then concluding that GPUs are available (as we did prior to this PR) is not very robust. Since `nvidia-smi` does return a non-zero exit code in this case, it's pretty easy to make the check more robust, which is what this PR does.
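A minimal sketch of the more robust check, written in Python as an assumption (the PR's actual code and language aren't shown here). It follows the review suggestion of a single utility function instead of copy-pasted checks. The helper name `nvidia_smi_works` and the 10-second timeout are hypothetical; the `--version` probe comes from the PR description. The idea is to require both that the command is on `PATH` and that it exits with status 0.

```python
import shutil
import subprocess


def nvidia_smi_works(cmd: str = "nvidia-smi", timeout: float = 10.0) -> bool:
    """Hypothetical helper: True only if `cmd` exists AND exits with status 0.

    Presence alone is not enough: on CPU-only nodes nvidia-smi may be
    installed but fail because no GPU driver is loaded.
    """
    path = shutil.which(cmd)
    if path is None:
        # Command is not installed (or not on PATH): treat as no usable GPUs.
        return False
    try:
        # Probe with --version (per the PR description); discard all output,
        # only the exit status matters. The timeout value is an assumption.
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Could not execute it, or it hung: treat as not working.
        return False
    # A non-zero exit code means nvidia-smi is present but failing.
    return result.returncode == 0
```

Callers that previously only tested for the presence of `nvidia-smi` would call this one helper instead, so any future change to the check (a different probe flag, a different timeout) happens in one place.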