Skip to content

chore: AUT-673 Update Docker image version to 26.06-py3#5622

Draft
svcnemo-autobot wants to merge 6 commits into
NVIDIA:mainfrom
svcnemo-autobot:ci/implement-f806ad04a816
Draft

chore: AUT-673 Update Docker image version to 26.06-py3#5622
svcnemo-autobot wants to merge 6 commits into
NVIDIA:mainfrom
svcnemo-autobot:ci/implement-f806ad04a816

Conversation

@svcnemo-autobot

Copy link
Copy Markdown
Collaborator

Bumped the dev NGC PyTorch base image from 26.04 to 26.06 in both CI pin sites — docker/.ngc_version.dev and the two IMAGE_TYPE:dev BASE_IMAGE rows (amd64+arm64) in .gitlab/stages/01.build.yml; left the LTS pin (25.09) untouched per the bump-base-image skill. Assumed dev-only scope since the request didn't mention LTS; note golden-value drift may require a follow-up refresh once functional CI runs.

Bump the dev NGC PyTorch base image to 26.06 in both CI pin sites
(GitHub docker/.ngc_version.dev and the GitLab dev BASE_IMAGE rows).
@svcnemo-autobot svcnemo-autobot requested a review from a team as a code owner July 2, 2026 10:55
@copy-pr-bot

copy-pr-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft July 2, 2026 10:55
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@svcnemo-autobot

Copy link
Copy Markdown
Collaborator Author

/ok to test e12c03f

Pin the transformer-engine git source to the v2.15 release tag per
review feedback on the 26.06 base-image bump.
Match the vendored uv dependency-metadata version to the v2.15 pin.
@svcnemo-autobot

Copy link
Copy Markdown
Collaborator Author

/ok to test b4994eb

NGC PyTorch 26.06 ships torch 2.13.0a0, whose DTensor sharding
propagation no longer supports the in-place fused `aten._foreach_lerp_`
(torch.optim.Adam moment update) on Replicate-placed DTensors, raising
"in-place operations that require placement changes are not supported".
This is a torch-side regression from the base-image bump, not a
Megatron-FSDP bug; skip the affected combinatorial test until upstream
fixes it. Tracking issue to be filed by maintainers.
Skip the combinatorial test_fully_shard on torch 2.13+ where the DTensor
in-place _foreach_lerp_ regression breaks the Adam optimizer step.
@svcnemo-autobot

Copy link
Copy Markdown
Collaborator Author

/ok to test 28b82ef

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants