chore: nightly sync main into dev (02_07_2026) #5627

Draft
svcnvidia-nemo-ci wants to merge 74 commits into dev from main2dev/02_07_2026

@svcnvidia-nemo-ci

Summary

Nightly sync of main into dev for 02_07_2026.

  • 73 commits from main merged into dev.
  • Python lines: +10208 / -1247 across 92 files (excludes golden-value JSON, uv.lock, etc.).
  • 61 files had merge conflicts, resolved surgically (see below).

Merge strategy

Started from origin/dev and ran git merge origin/main --no-edit, then resolved
each conflict preserving recent dev-only additions while incorporating main's
incoming changes. The dev-feature-preservation pre-push guard passes (no dev-only
code silently dropped).

Per the nightly-sync policy, the sync is dev-preferring for shared code that both
branches evolved: where main refactored/reorganized code dev also touched, dev's
version is kept to protect in-flight dev work; main's genuinely-new files, symbols and
features are brought in. This keeps the dev-feature-preservation guard green.
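
The dev-preferring rule described above can be sketched at file granularity. This is a simplified illustration, not the sync tooling itself: `Side`, `pick_side` and the two "changed since the merge base" sets are hypothetical names, and the actual resolution in this PR was done per conflict hunk and per symbol rather than per file.

```python
from enum import Enum


class Side(Enum):
    DEV = "dev"
    MAIN = "main"


def pick_side(path: str, changed_on_dev: set, changed_on_main: set) -> Side:
    """Choose which branch's version of `path` a dev-preferring sync keeps."""
    if path in changed_on_dev:
        # Both branches evolved it, or only dev did: keep dev's version.
        return Side.DEV
    if path in changed_on_main:
        # Only main touched it (including genuinely new files): take main's.
        return Side.MAIN
    # Unchanged on both sides since the merge base, so the versions agree.
    return Side.DEV
```

An explicit override list, such as the one in the next section, would be consulted before this default rule.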

Files taken from main (override list)

These files reference dev-only args/APIs that were reconciled by taking main's version:

  • megatron/training/training.py — took main's version; renamed
    args.hybrid_context_parallel → args.dynamic_context_parallel to match the merged
    args namespace (dev's arg name; hybrid_context_parallel is a deprecated config alias).
  • megatron/training/initialize.py — main's version; already uses
    dynamic_context_parallel / min_dynamic_context_parallel_size, consistent with the
    merged parallel_state.initialize_model_parallel signature.
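
One way a deprecated alias can coexist with the new name is to have both spellings write to the same destination. The sketch below uses `argparse` from the standard library. The flag spellings and the `_DeprecatedAlias` and `build_parser` helpers are assumptions made for illustration; the PR only states that `hybrid_context_parallel` is a deprecated config alias and does not show how the alias is implemented.

```python
import argparse
import warnings


class _DeprecatedAlias(argparse.Action):
    """Boolean flag that warns when the old spelling is used (hypothetical)."""

    def __init__(self, option_strings, dest, **kwargs):
        # nargs=0 makes this behave like a store_true flag.
        super().__init__(option_strings, dest, nargs=0, default=False, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use --dynamic-context-parallel",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, True)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dynamic-context-parallel", action="store_true")
    # The old flag writes to the new destination, so downstream code only
    # ever reads args.dynamic_context_parallel.
    parser.add_argument(
        "--hybrid-context-parallel",
        dest="dynamic_context_parallel",
        action=_DeprecatedAlias,
        help=argparse.SUPPRESS,
    )
    return parser
```

With this shape, call sites never need to know the old name existed, which is the property the rename in `training.py` relies on.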

Dependency triple + CODEOWNERS

Kept dev's pyproject.toml, uv.lock, docker/Dockerfile.ci.dev, and .github/CODEOWNERS
verbatim (verified byte-identical to origin/dev). Git-source reconciliation: main adds a
mamba-ssm git source and bumps flash_mla/transformer-engine/emerging_optimizers;
dev pins mamba-ssm~=2.2 from PyPI and is ahead on emerging_optimizers (v0.3.0). No
merged main code was found to require a symbol only present at main's git revisions, so
dev's triple is kept unchanged.
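
A byte-identical check of this kind can be sketched by comparing blobs read from two refs. The helpers `git_blob` and `differing_paths` are hypothetical names; only `git show <ref>:<path>` is a real Git invocation, and this is not necessarily how the verification was performed for this PR.

```python
import subprocess


def git_blob(ref: str, path: str) -> bytes:
    """Return the raw bytes of `path` as stored at `ref` (runs `git show`)."""
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"], check=True, capture_output=True
    )
    return result.stdout


def differing_paths(merged: dict, reference: dict) -> list:
    """List paths whose merged bytes differ from the reference bytes."""
    return sorted(
        path for path, blob in reference.items() if merged.get(path) != blob
    )
```

Inside a checkout, one could build `reference` from `git_blob("origin/dev", p)` and `merged` from `git_blob("HEAD", p)` for the four kept files, then expect `differing_paths` to return an empty list.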

API-mismatch fixes

  • megatron/training/checkpointing.py (kept dev's) — added dp_group, expt_dp_group,
    and rng_state_key_prefix compatibility parameters to save_checkpoint / load_checkpoint
    so main's training.py process-group-threaded call sites work against dev's checkpoint path.
  • .github/copy-pr-bot.yaml — merged the trustees list as a union (kept dev-only
    wplf plus main's newly-added trustees).
  • Restored dev's test_fused_mla_training_hooks_use_fused_down_projection unit test
    (verified it passes against the merged FusedMLASelfAttention.backward_dw →
    _backward_output_proj → linear_proj.backward_dw call chain).
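
The compatibility-parameter approach in the first bullet can be illustrated with keyword-only arguments that default to `None`, so older call sites keep working while newer ones may pass extra context. Everything below is a toy: `_legacy_save` and `save_checkpoint_sketch` are hypothetical stand-ins, the real `save_checkpoint` signature is not reproduced, and the PR does not say whether dev's checkpoint path uses the new arguments or only accepts them.

```python
from typing import Any, Optional


def _legacy_save(iteration: int, state: dict) -> dict:
    """Hypothetical stand-in for the pre-existing checkpoint path."""
    return {"iteration": iteration, **state}


def save_checkpoint_sketch(
    iteration: int,
    state: dict,
    *,
    dp_group: Optional[Any] = None,
    expt_dp_group: Optional[Any] = None,
    rng_state_key_prefix: Optional[str] = None,
) -> dict:
    """Accept the newer keyword arguments without breaking older callers."""
    # Assumption for illustration: the prefix is applied to state keys.
    prefix = rng_state_key_prefix or ""
    keyed = {f"{prefix}{key}": value for key, value in state.items()}
    # The process groups are accepted for signature compatibility only;
    # this toy does not use them.
    del dp_group, expt_dp_group
    return _legacy_save(iteration, keyed)
```

Keyword-only parameters keep the positional part of the signature stable, so a caller written against either branch resolves to the same function.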

Files deleted in dev and intentionally NOT restored

Verified each was deleted intentionally on dev (or removed by main) and nothing in the
merged tree imports them:

  • .github/workflows/multi-approval-bot.yml — dev removed the multi-approval action (ci: Remove multi-approval action from dev branch #3576).
  • tests/test_utils/recipes/h100/bert.yaml, t5.yaml — dev refactored functional tests.
  • tests/unit_tests/inference/engines/test_cg_admission_gating.py — dev deleted it;
    the merged engine does not expose the _cg_admission_gating_active / _cg_admission_check
    methods the test requires.
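
For the Python file in this list, a "nothing imports it" check can be approximated by walking each module's AST and collecting import targets. This is a sketch under assumptions: `imported_modules` and `references_deleted` are hypothetical helpers, relative imports are not resolved, and the YAML and workflow files would need a plain text search for their paths instead.

```python
import ast


def imported_modules(source: str) -> set:
    """Collect dotted names that `source` imports (absolute imports only)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
            # `from pkg import name` may refer to the submodule pkg.name.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def references_deleted(source: str, deleted: list) -> list:
    """Return imported names that are, or live under, a deleted module."""
    return sorted(
        name
        for name in imported_modules(source)
        if any(name == d or name.startswith(d + ".") for d in deleted)
    )
```

Running `references_deleted` over every `.py` file in the merged tree and expecting empty results would be one way to reproduce the claim above.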

Remerge diff

git show --remerge-diff HEAD on the merge commit is large; a summary of the conflict
resolutions is captured above. The full conflict set spanned 61 files across training,
core transformer, MoE, DSA/experimental attention, RL, FSDP, inference, and their tests.
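
To get a quick overview of where conflicts landed without reading the full remerge diff, the `# Conflicts:` block that Git appends to a merge commit message can be parsed and grouped. The helper names below are hypothetical, and the grouping by top-level directory is just one convenient choice.

```python
from collections import Counter


def conflict_paths(message: str) -> list:
    """Extract paths listed under '# Conflicts:' in a merge commit message."""
    paths = []
    in_block = False
    for line in message.splitlines():
        if line.startswith("# Conflicts:"):
            in_block = True
            continue
        if in_block:
            entry = line[1:].strip() if line.startswith("#") else ""
            if not entry:
                # A blank or non-comment line ends the block.
                in_block = False
                continue
            paths.append(entry)
    return paths


def by_area(paths: list) -> Counter:
    """Count conflicts per top-level directory; root files go to '(root)'."""
    return Counter(p.split("/")[0] if "/" in p else "(root)" for p in paths)
```

The message text can be obtained with `git log -1 --format=%B <merge-commit>` and passed to `conflict_paths`.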

yashaswikarnati and others added 30 commits June 22, 2026 19:15
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
Signed-off-by: wdykas <wdykas@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Laura Dang <laurad@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: shanmugamr1992 <shanmugamr1992@gmail.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: shanmugamr1992 <shanmugamr1992@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
…and remove legacy modelbuilder functions (#5169)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: yanghao.666 <yanghao.666@jd.com>
Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
@svcnvidia-nemo-ci

Author

/ok to test cd117b3

@svcnvidia-nemo-ci

Author

/ok to test d086376

@svcnvidia-nemo-ci

Author

/ok to test 5871bc8

Signed-off-by: svcnvidia-nemo-ci <svcnvidia-nemo-ci@users.noreply.github.com>

# Conflicts:
#	.github/copy-pr-bot.yaml
#	.github/scripts/oncall_manager.py
#	.github/scripts/sync_team_usergroups.py
#	docker/Dockerfile.ci.dev
#	docs/user-guide/features/fine_grained_activation_offloading.md
#	examples/bert/pretrain_bert.py
#	examples/inference/advanced/gpt_dynamic_inference.py
#	examples/post_training/modelopt/finetune.py
#	gpt_builders.py
#	megatron/core/datasets/gpt_dataset.py
#	megatron/core/distributed/fsdp/src/megatron_fsdp/megatron_fsdp.py
#	megatron/core/distributed/fsdp/src/megatron_fsdp/param_and_grad_buffer.py
#	megatron/core/inference/engines/dynamic_engine.py
#	megatron/core/models/gpt/experimental_attention_variant_module_specs.py
#	megatron/core/models/gpt/gpt_model.py
#	megatron/core/models/hybrid/hybrid_block.py
#	megatron/core/models/hybrid/hybrid_model.py
#	megatron/core/pipeline_parallel/fine_grained_activation_offload.py
#	megatron/core/pipeline_parallel/schedules.py
#	megatron/core/ssm/gated_delta_net.py
#	megatron/core/transformer/attention.py
#	megatron/core/transformer/cuda_graphs.py
#	megatron/core/transformer/experimental_attention_variant/dsa.py
#	megatron/core/transformer/experimental_attention_variant/dsa_kernels.py
#	megatron/core/transformer/module.py
#	megatron/core/transformer/moe/experts.py
#	megatron/core/transformer/multi_latent_attention.py
#	megatron/core/transformer/transformer_block.py
#	megatron/core/transformer/transformer_config.py
#	megatron/core/transformer/transformer_layer.py
#	megatron/core/utils.py
#	megatron/elastification/pretrain_hybrid_flex.py
#	megatron/inference/utils.py
#	megatron/rl/agent/api.py
#	megatron/rl/agent/reward_only_agent.py
#	megatron/rl/inference/megatron.py
#	megatron/rl/rl_utils.py
#	megatron/training/argument_utils.py
#	megatron/training/arguments.py
#	megatron/training/checkpointing.py
#	megatron/training/config/__init__.py
#	megatron/training/config/container.py
#	megatron/training/distillation/utils_logits.py
#	megatron/training/models/gpt.py
#	megatron/training/models/hybrid.py
#	megatron/training/training.py
#	megatron/training/utils/common_utils.py
#	megatron/training/yaml_arguments.py
#	pretrain_gpt.py
#	pretrain_hybrid.py
#	pyproject.toml
#	tests/unit_tests/inference/test_mtp_cuda_graph_inference.py
#	tests/unit_tests/pipeline_parallel/test_fine_grained_activation_offloading.py
#	tests/unit_tests/ssm/test_hybrid_block.py
#	tests/unit_tests/transformer/experimental_attention_variant/test_absorbed_mla.py
#	tests/unit_tests/transformer/experimental_attention_variant/test_attention_variant_dsa.py
#	tools/bert_embedding/embed.py
#	tools/run_inference_performance_test.py
#	tools/trigger_internal_ci.py
#	train_rl.py
#	uv.lock
@svcnvidia-nemo-ci

Author

/ok to test e2dce1e

Labels

  • Run functional tests
  • Run MBridge tests (Attach this for testing this PR against MBridge main)
