Skip to content

Add support for gated modifications jit-cdi mode #1230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

elezar
Copy link
Member

@elezar elezar commented Aug 7, 2025

This change ensures that NVIDIA_GDS=enabled, NVIDIA_MOFED=enabled, and NVIDIA_GDRCOPY=enabled are honored by jit-cdi mode.

@coveralls
Copy link

coveralls commented Aug 7, 2025

Pull Request Test Coverage Report for Build 16831546401

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 88 (0.0%) changed or added relevant lines in 5 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.07%) to 35.663%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/nvcdi/lib.go 0 3 0.0%
internal/oci/spec.go 0 8 0.0%
pkg/nvcdi/mode.go 0 8 0.0%
pkg/nvcdi/gated.go 0 18 0.0%
internal/modifier/cdi.go 0 51 0.0%
Files with Coverage Reduction New Missed Lines %
pkg/nvcdi/mode.go 1 7.84%
Totals Coverage Status
Change from base Build 16830975215: -0.07%
Covered Lines: 4583
Relevant Lines: 12851

💛 - Coveralls

@elezar elezar force-pushed the fix-gdrcopy-in-jit-cdi branch from 0af2c9e to 086955b Compare August 7, 2025 12:00
@elezar elezar added this to the v1.18.0 milestone Aug 7, 2025
@ArangoGutierrez ArangoGutierrez requested a review from Copilot August 7, 2025 12:44
Copilot

This comment was marked as outdated.

elezar added 5 commits August 8, 2025 15:23
This change ensures that NVIDIA_GDS=enabled, NVIDIA_MOFED=enabled,
NVIDIA_GDRCOPY=enabled, and NVIDIA_NVSWITCH=enabled are honored by
jit-cdi mode.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the fix-gdrcopy-in-jit-cdi branch from 086955b to 3fcc351 Compare August 8, 2025 13:23
@ArangoGutierrez ArangoGutierrez requested a review from Copilot August 8, 2025 13:54
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for gated modifications in jit-cdi mode by allowing NVIDIA_GDS=enabled, NVIDIA_MOFED=enabled, and NVIDIA_GDRCOPY=enabled environment variables to be honored. The changes refactor existing mode-specific libraries into a unified gated approach and extend automatic CDI device generation to support multiple modes.

  • Consolidates GDS, MOFED, and GDRCOPY modes into a unified gated library approach
  • Extends automatic CDI device generation to support multiple modes simultaneously
  • Adds support for NVSWITCH devices through the new gated system

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/nvcdi/mode.go Adds new mode constants (ModeGdrcopy, ModeNvswitch), reorders mode list, and fixes mode assignment in resolveMode
pkg/nvcdi/lib.go Consolidates gated modes (GDS, MOFED, GDRCOPY) to use unified gatedlib factory
pkg/nvcdi/gds.go Removes GDS-specific library implementation (deleted file)
pkg/nvcdi/gated.go Renames mofedlib to gatedlib and adds support for multiple gated modes via getModeDiscoverer
internal/oci/spec.go Adds SpecModifiers type to handle collections of spec modifiers
internal/modifier/cdi.go Extends automatic CDI generation to support gated devices and multiple modes per request

case ModeNvswitch:
return discover.NewNvSwitchDiscoverer(l.logger, l.devRoot)
default:
return nil, fmt.Errorf("unrecognized mode")
Copy link
Preview

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message "unrecognized mode" is not helpful as it doesn't specify which mode was unrecognized. Consider including the actual mode value in the error message.

Suggested change
return nil, fmt.Errorf("unrecognized mode")
return nil, fmt.Errorf("unrecognized mode %q", l.mode)

Copilot uses AI. Check for mistakes.

nvcdi.WithNVIDIACDIHookPath(cfg.NVIDIACTKConfig.Path),
nvcdi.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
nvcdi.WithVendor(automaticDeviceVendor),
nvcdi.WithClass(perModeDeviceClass[mode]),
Copy link
Preview

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing perModeDeviceClass[mode] without checking if the key exists could cause a panic or unexpected behavior. The map only contains "auto" key, but modes can include "gds", "mofed", "gdrcopy", "nvswitch".

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants