Handle multiple GPUs in CDI spec generation from CSV #1461
base: main
Signed-off-by: Evan Lezar <[email protected]>
This change makes the construction of a discoverer for Tegra systems more flexible in how the sources of the mount specs can be specified. This allows for subsequent changes such as adding (or removing) mount specs at the point of construction.
Signed-off-by: Evan Lezar <[email protected]>
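One way to picture "flexible sources of mount specs" is a constructor that accepts any number of sources and merges them at construction time. The following is a minimal sketch under that assumption only; `mountSpecSource`, `staticSource`, `withMountSpecs`, `collectMountSpecs`, and the example paths are hypothetical names made up for illustration and are not taken from the toolkit's code.

```go
package main

import "fmt"

// mountSpecSource is a hypothetical interface for anything that can supply
// mount spec paths grouped by type (for example "lib" or "dev").
type mountSpecSource interface {
	mountSpecPaths() map[string][]string
}

// staticSource is a hypothetical in-memory source, standing in for specs
// that would otherwise be parsed from CSV files.
type staticSource map[string][]string

func (s staticSource) mountSpecPaths() map[string][]string { return s }

// options collects the sources supplied at construction.
type options struct {
	sources []mountSpecSource
}

type option func(*options)

// withMountSpecs appends one or more sources; callers can add sources
// (or leave them out) at the point of construction.
func withMountSpecs(sources ...mountSpecSource) option {
	return func(o *options) {
		o.sources = append(o.sources, sources...)
	}
}

// collectMountSpecs applies the options and merges all sources by type,
// keeping the order in which the sources were supplied.
func collectMountSpecs(opts ...option) map[string][]string {
	o := &options{}
	for _, opt := range opts {
		opt(o)
	}
	merged := make(map[string][]string)
	for _, src := range o.sources {
		for kind, paths := range src.mountSpecPaths() {
			merged[kind] = append(merged[kind], paths...)
		}
	}
	return merged
}

func main() {
	// The paths below are made-up examples.
	fromCSV := staticSource{"lib": {"/usr/lib/example-a.so"}}
	extra := staticSource{"lib": {"/usr/lib/example-b.so"}, "dev": {"/dev/example0"}}

	merged := collectMountSpecs(withMountSpecs(fromCSV, extra))
	fmt.Println(merged["lib"], merged["dev"])
}
```

The design point is that the constructor no longer needs to know where the mount specs come from, so a later change can add or drop a source without touching the merge logic.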
This change allows CDI specs to be generated for multiple devices when using CSV mode. This can be used in cases where a Tegra-based system consists of an iGPU and a dGPU. This behavior can be opted out of using the disable-multiple-csv-devices feature flag. The flag can be specified either by adding the --feature-flags=disable-multiple-csv-devices command line option to the nvidia-ctk cdi generate command or, for automatic CDI spec generation, by adding NVIDIA_CTK_CDI_GENERATE_FEATURE_FLAGS=disable-multiple-csv-devices to the /etc/nvidia-container-toolkit/nvidia-cdi-refresh.env file.
Signed-off-by: Evan Lezar <[email protected]>
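To illustrate how the environment-file setting could translate into a set of enabled flags, here is a minimal sketch. It assumes the value is a comma-separated list and that the file uses plain KEY=VALUE lines with `#` comments; both are assumptions, and `featureFlagsFromEnvFile` is a hypothetical helper, not a function from the toolkit.

```go
package main

import (
	"fmt"
	"strings"
)

// envKey is the variable name given in the description above.
const envKey = "NVIDIA_CTK_CDI_GENERATE_FEATURE_FLAGS"

// featureFlagsFromEnvFile is a hypothetical helper. It scans KEY=VALUE lines
// and returns the set of flags listed for envKey. The comma-separated value
// format is an assumption made for this sketch.
func featureFlagsFromEnvFile(content string) map[string]bool {
	flags := make(map[string]bool)
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok || strings.TrimSpace(key) != envKey {
			continue // not the variable we are looking for
		}
		for _, f := range strings.Split(value, ",") {
			if f = strings.TrimSpace(f); f != "" {
				flags[f] = true
			}
		}
	}
	return flags
}

func main() {
	content := "# example env file\n" +
		"NVIDIA_CTK_CDI_GENERATE_FEATURE_FLAGS=disable-multiple-csv-devices\n"
	flags := featureFlagsFromEnvFile(content)
	fmt.Println(flags["disable-multiple-csv-devices"])
}
```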
2f9fcb8 to 57ef289
ArangoGutierrez left a comment
LGTM, just 2 non-blocking nits
func (o tegraOptions) newDiscovererFromCSVFiles() (discover.Discover, error) {
	if len(o.csvFiles) == 0 {
		o.logger.Warningf("No CSV files specified")
func (o options) newDiscovererFromMountSpecs() (discover.Discover, error) {
Instead of deleting the func comment, could we update it?
if l.featureFlags[FeatureDisableMultipleCSVDevices] {
	return l.purecsvDeviceSpecGenerators(ids...)
}
hasNVML, _ := l.infolib.HasNvml()
if !hasNVML {
	return l.purecsvDeviceSpecGenerators(ids...)
}
Nit: this should be in the order that is documented above
// If NVML is not available or the disable-multiple-csv-devices feature flag is
// enabled, a single device is assumed.
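The reordering suggested in this nit can be sketched as follows: check NVML availability first and the feature flag second, matching the order of the doc comment. This is a standalone illustration, not the code under review; `selectMode`, `generatorMode`, and the two mode constants are hypothetical names, and only the flag string comes from the PR.

```go
package main

import "fmt"

// featureFlag mirrors the idea of the flag map in the diff, but this type is
// defined here for the sketch and is not the toolkit's own type.
type featureFlag string

const disableMultipleCSVDevices featureFlag = "disable-multiple-csv-devices"

// generatorMode is a hypothetical label for the two code paths.
type generatorMode string

const (
	singleDevice generatorMode = "single-device"
	multiDevice  generatorMode = "multi-device"
)

// selectMode checks the conditions in the documented order:
// NVML availability first, then the feature flag.
func selectMode(hasNVML bool, flags map[featureFlag]bool) generatorMode {
	if !hasNVML {
		return singleDevice
	}
	if flags[disableMultipleCSVDevices] {
		return singleDevice
	}
	return multiDevice
}

func main() {
	optOut := map[featureFlag]bool{disableMultipleCSVDevices: true}

	fmt.Println(selectMode(false, nil))   // NVML unavailable
	fmt.Println(selectMode(true, optOut)) // flag enabled
	fmt.Println(selectMode(true, nil))    // NVML available, flag not set
}
```

Both orderings return the same result for every input; the reordering only makes the code read in the same sequence as its comment.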
This change allows CDI specs to be generated for multiple
devices when using CSV mode. This can be used in cases where
a Tegra-based system consists of an iGPU and a dGPU.
This behavior can be opted out of using the
disable-multiple-csv-devices feature flag. This can be specified by adding the
--feature-flags=disable-multiple-csv-devices
command line option to the nvidia-ctk cdi generate command or to the
automatic CDI spec generation by adding
NVIDIA_CTK_CDI_GENERATE_FEATURE_FLAGS=disable-multiple-csv-devices
to the /etc/nvidia-container-toolkit/nvidia-cdi-refresh.env file.