Load NVIDIA Kernel Modules for JIT-CDI mode #975

Closed
elezar wants to merge 1 commit into NVIDIA:main from elezar:load-kmods


@elezar
Member

@elezar elezar commented Mar 9, 2025

This change attempts to load the nvidia, nvidia-uvm, and nvidia-modeset kernel modules (NVIDIA kernel-mode GPU drivers) before generating the automatic (jit) CDI specification. This aligns the behaviour of the JIT-CDI mode with that of the nvidia-container-cli.

The kernel modules can be controlled by the

nvidia-container-runtime.modes.jit-cdi.load-kernel-modules

config option. If this is set to the empty list, then no kernel modules are loaded.

Errors in loading the kernel modules are logged, but ignored.

@elezar elezar added this to the Disable legacy code path by default milestone Mar 9, 2025
This change attempts to load the nvidia, nvidia-uvm, and nvidia-modeset
kernel modules before generating the automatic (jit) CDI specification.

The kernel modules can be controlled by the

nvidia-container-runtime.modes.jit-cdi.load-kernel-modules

config option. If this is set to the empty list, then no kernel modules
are loaded.

Errors in loading the kernel modules are logged, but ignored.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@jgehrcke
Collaborator

This change attempts to load the nvidia, nvidia-uvm, and nvidia-modeset kernel modules before generating the automatic (jit) CDI specification.

What is this good for?

mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime.modes.jit-cdi]
load-kernel-modules = ["nvidia", "nvidia-uvm", "nvidia-modeset"]
Collaborator

How did we end up making this fine selection? 🍷 🍇 🧀

Member Author

@elezar elezar Apr 2, 2025

This is actually a typo. The module names should be nvidia, nvidia_uvm and nvidia_modeset.

Collaborator

@jgehrcke jgehrcke left a comment

Thank you, Evan.

@elezar
Member Author

elezar commented Apr 2, 2025

This change attempts to load the nvidia, nvidia-uvm, and nvidia-modeset kernel modules before generating the automatic (jit) CDI specification.

What is this good for?

I have updated the description with more motivation.

}

// TODO: Consider moving this into the nvcdi API.
if err := driver.LoadKernelModules(cfg.NVIDIAContainerRuntimeConfig.Modes.JitCDI.LoadKernelModules...); err != nil {
Member Author

@klueska are there any cases where we DON'T want to load (or try to load) the kernel modules? Note that we also skip this when running in a user namespace in libnvidia-container.
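For illustration, one common way to detect a user namespace is to inspect `/proc/self/uid_map`: per user_namespaces(7), the initial namespace shows the single identity mapping `0 0 4294967295`. The sketch below assumes that heuristic. The thread does not say how libnvidia-container performs its check, and `uidMapIndicatesUserNS` and `inUserNamespace` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// uidMapIndicatesUserNS reports whether the contents of a uid_map file
// describe anything other than the identity mapping of the initial user
// namespace ("0 0 4294967295"). An empty map (a freshly created user
// namespace with no mapping written yet) is treated as a user namespace.
func uidMapIndicatesUserNS(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		return true
	}
	fields := strings.Fields(lines[0])
	if len(fields) != 3 {
		return true
	}
	return !(fields[0] == "0" && fields[1] == "0" && fields[2] == "4294967295")
}

// inUserNamespace applies the heuristic to the current process.
func inUserNamespace() bool {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		// No readable uid_map (for example, a kernel without user namespace
		// support): assume the initial namespace.
		return false
	}
	return uidMapIndicatesUserNS(string(data))
}

func main() {
	if inUserNamespace() {
		fmt.Println("user namespace detected: skipping kernel module loading")
		return
	}
	fmt.Println("initial user namespace: kernel modules may be loaded")
}
```

A guard like this would sit in front of the module-loading call, so an unprivileged container runtime does not attempt an operation that is expected to fail.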

return []string{"/usr/local/share", "/usr/share"}
}

// LoadKmods loads the specified kernel modules in the driver root.
Collaborator

Suggested change
- // LoadKmods loads the specified kernel modules in the driver root.
+ // LoadKernelModules loads the specified kernel modules in the driver root.

@elezar elezar modified the milestones: Disable legacy code path by default, v1.18.0 May 9, 2025
@elezar elezar marked this pull request as draft June 2, 2025 12:13
@elezar
Member Author

elezar commented Jul 3, 2025

Closing this one. Will reopen if required.

@elezar elezar closed this Jul 3, 2025
4 participants