chore: nightly sync main into dev (02_07_2026) by svcnvidia-nemo-ci · Pull Request #5627 · NVIDIA/Megatron-LM

svcnvidia-nemo-ci · 2026-07-02T18:50:16Z

Summary

Nightly sync of main into dev for 02_07_2026.

- 73 commits from main merged into dev.
- Python lines: +10208 / -1247 across 92 files (excludes golden-value JSON, uv.lock, etc.).
- 61 files had merge conflicts, resolved surgically (see below).

Merge strategy

Started from origin/dev and ran git merge origin/main --no-edit, then resolved
each conflict preserving recent dev-only additions while incorporating main's
incoming changes. The dev-feature-preservation pre-push guard passes (no dev-only
code silently dropped).

Per the nightly-sync policy, the sync is dev-preferring for shared code that both
branches evolved: where main refactored/reorganized code dev also touched, dev's
version is kept to protect in-flight dev work; main's genuinely-new files, symbols and
features are brought in. This keeps the dev-feature-preservation guard green.
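As a rough illustration of the first step, the sketch below runs the merge and lists the paths left unmerged. The helper names are hypothetical and this is not the CI job's actual tooling; it assumes the v1 `git status --porcelain` format, where unmerged paths carry one of the two-letter codes listed in the set.

```python
import subprocess

# Two-letter XY status codes that `git status --porcelain` uses for unmerged paths.
UNMERGED_CODES = {"DD", "AU", "UD", "UA", "DU", "AA", "UU"}


def conflicted_paths(porcelain: str) -> list[str]:
    """Extract unmerged paths from `git status --porcelain` (v1) output."""
    # Each line is "XY <path>": two status characters, a space, then the path.
    return [line[3:] for line in porcelain.splitlines() if line[:2] in UNMERGED_CODES]


def merge_and_list_conflicts(source: str = "origin/main") -> list[str]:
    """Run the merge, then report which paths still need manual resolution."""
    # A non-zero exit code is expected when the merge stops on conflicts.
    subprocess.run(["git", "merge", source, "--no-edit"], check=False)
    status = subprocess.run(
        ["git", "status", "--porcelain"], check=True, capture_output=True, text=True
    )
    return conflicted_paths(status.stdout)
```

Only `conflicted_paths` is pure; `merge_and_list_conflicts` modifies the working tree and is meant to be run from a checkout of dev.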

Files taken from `main` (override list)

These files reference dev-only args/APIs that were reconciled by taking main's version:

- megatron/training/training.py: took main's version; renamed args.hybrid_context_parallel → args.dynamic_context_parallel to match the merged args namespace (dev's arg name; hybrid_context_parallel is a deprecated config alias).
- megatron/training/initialize.py: took main's version; it already uses dynamic_context_parallel / min_dynamic_context_parallel_size, consistent with the merged parallel_state.initialize_model_parallel signature.
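The rename depends on the old name surviving only as a deprecated alias. A minimal sketch of how such an alias could be folded onto an args namespace is shown here; `DEPRECATED_ALIASES` and `apply_deprecated_aliases` are hypothetical names for illustration, not Megatron-LM's actual argument handling.

```python
import warnings
from argparse import Namespace

# Hypothetical alias table for illustration: deprecated name -> current name.
DEPRECATED_ALIASES = {"hybrid_context_parallel": "dynamic_context_parallel"}


def apply_deprecated_aliases(args: Namespace) -> Namespace:
    """Copy values set under deprecated names onto their current names."""
    for old, new in DEPRECATED_ALIASES.items():
        if hasattr(args, old):
            warnings.warn(f"{old} is deprecated; use {new}", DeprecationWarning, stacklevel=2)
            # Only fill in the new name if it was not set explicitly.
            if getattr(args, new, None) is None:
                setattr(args, new, getattr(args, old))
            # Drop the old attribute so later code can rely on one name only.
            delattr(args, old)
    return args
```

With a shim like this in place, call sites only ever read `args.dynamic_context_parallel`, which is the property the rename in training.py is meant to guarantee.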

Dependency triple + CODEOWNERS

Kept dev's pyproject.toml, uv.lock, docker/Dockerfile.ci.dev, and .github/CODEOWNERS
verbatim (verified byte-identical to origin/dev). Git-source reconciliation: main adds a
mamba-ssm git source and bumps flash_mla/transformer-engine/emerging_optimizers;
dev pins mamba-ssm~=2.2 from PyPI and is ahead on emerging_optimizers (v0.3.0). No
merged main code was found to require a symbol only present at main's git revisions, so
dev's triple is kept unchanged.
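One way to reproduce the byte-identical check is to hash each kept file in the working tree against the same path at origin/dev. The sketch below is an assumption about how such a check could look, not the script used for this PR; `blob_at` and `differing_files` are hypothetical helpers, and `git show <ref>:<path>` is the only git call involved.

```python
import hashlib
import subprocess
from pathlib import Path

# Files the PR description says were kept verbatim from dev.
KEPT_FROM_DEV = ["pyproject.toml", "uv.lock", "docker/Dockerfile.ci.dev", ".github/CODEOWNERS"]


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def blob_at(ref: str, path: str) -> bytes:
    """Return a file's bytes at a git ref by running `git show <ref>:<path>`."""
    result = subprocess.run(["git", "show", f"{ref}:{path}"], check=True, capture_output=True)
    return result.stdout


def differing_files(paths, ref="origin/dev", read_ref=blob_at, root=Path(".")):
    """List the paths whose working-tree bytes differ from the given ref."""
    return [p for p in paths if sha256((root / p).read_bytes()) != sha256(read_ref(ref, p))]
```

Run from the repository root, `differing_files(KEPT_FROM_DEV)` returning an empty list would correspond to the "byte-identical to origin/dev" claim. The `read_ref` parameter exists so the comparison can be exercised without a git checkout.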

API-mismatch fixes

- megatron/training/checkpointing.py (kept dev's): added dp_group, expt_dp_group, and rng_state_key_prefix compatibility parameters to save_checkpoint / load_checkpoint so main's training.py process-group-threaded call sites work against dev's checkpoint path.
- .github/copy-pr-bot.yaml: merged the trustees list as a union (kept dev-only wplf plus main's newly-added trustees).
- Restored dev's test_fused_mla_training_hooks_use_fused_down_projection unit test (verified it passes against the merged FusedMLASelfAttention.backward_dw → _backward_output_proj → linear_proj.backward_dw call chain).
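The checkpointing fix follows a common compatibility pattern: new keyword-only parameters with defaults, so existing callers keep working while newer call sites can pass the extra values. The sketch below is a simplified stand-in and not the real Megatron-LM signature; only the three parameter names come from this PR, and what the function does with them (including how the prefix is applied) is an assumption made for illustration.

```python
from typing import Any, Optional


def save_checkpoint(
    iteration: int,
    state: dict,
    *,
    dp_group: Optional[Any] = None,
    expt_dp_group: Optional[Any] = None,
    rng_state_key_prefix: str = "",
) -> dict:
    """Simplified stand-in showing optional compatibility parameters."""
    # Assumption for illustration: the prefix is prepended to the RNG state key.
    rng_key = f"{rng_state_key_prefix}rng_state"
    return {
        "iteration": iteration,
        "state": state,
        "rng_key": rng_key,
        # True only when a caller threads process groups through explicitly.
        "uses_explicit_groups": dp_group is not None or expt_dp_group is not None,
    }
```

A call such as `save_checkpoint(10, state)` behaves as before, while `save_checkpoint(10, state, dp_group=group)` takes the process-group-threaded path.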

Files deleted in dev and intentionally NOT restored

Verified each was deleted intentionally on dev (or removed by main) and nothing in the
merged tree imports them:

- .github/workflows/multi-approval-bot.yml: dev removed the multi-approval action (ci: Remove multi-approval action from dev branch #3576).
- tests/test_utils/recipes/h100/bert.yaml, t5.yaml: dev refactored functional tests.
- tests/unit_tests/inference/engines/test_cg_admission_gating.py: dev deleted it; the merged engine does not expose the _cg_admission_gating_active / _cg_admission_check methods the test requires.
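For Python modules, the "nothing imports them" check can be approximated by parsing every file in the tree and collecting its import statements. This is a sketch under stated assumptions, not the verification actually run for this PR: the helper names are hypothetical, relative imports are not resolved to absolute module names, and YAML or workflow files would need a plain text search instead.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the dotted module names imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
            # `from pkg import mod` may name a submodule, so record that form too.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def files_importing(root: Path, module: str) -> list[Path]:
    """List the .py files under root that import the module or anything beneath it."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        mods = imported_modules(path.read_text(encoding="utf-8"))
        if any(m == module or m.startswith(module + ".") for m in mods):
            hits.append(path)
    return hits
```

An empty result from `files_importing` for a deleted module's dotted name is consistent with leaving that file deleted.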

Remerge diff

git show --remerge-diff HEAD on the merge commit is large; a summary of the conflict
resolutions is captured above. The full conflict set spanned 61 files across training,
core transformer, MoE, DSA/experimental attention, RL, FSDP, inference, and their tests.


svcnvidia-nemo-ci · 2026-07-02T21:12:14Z

/ok to test cd117b3

svcnvidia-nemo-ci · 2026-07-02T22:03:22Z

/ok to test d086376

svcnvidia-nemo-ci · 2026-07-02T22:56:59Z

/ok to test 5871bc8

Signed-off-by: svcnvidia-nemo-ci <svcnvidia-nemo-ci@users.noreply.github.com>

# Conflicts:
#   .github/copy-pr-bot.yaml
#   .github/scripts/oncall_manager.py
#   .github/scripts/sync_team_usergroups.py
#   docker/Dockerfile.ci.dev
#   docs/user-guide/features/fine_grained_activation_offloading.md
#   examples/bert/pretrain_bert.py
#   examples/inference/advanced/gpt_dynamic_inference.py
#   examples/post_training/modelopt/finetune.py
#   gpt_builders.py
#   megatron/core/datasets/gpt_dataset.py
#   megatron/core/distributed/fsdp/src/megatron_fsdp/megatron_fsdp.py
#   megatron/core/distributed/fsdp/src/megatron_fsdp/param_and_grad_buffer.py
#   megatron/core/inference/engines/dynamic_engine.py
#   megatron/core/models/gpt/experimental_attention_variant_module_specs.py
#   megatron/core/models/gpt/gpt_model.py
#   megatron/core/models/hybrid/hybrid_block.py
#   megatron/core/models/hybrid/hybrid_model.py
#   megatron/core/pipeline_parallel/fine_grained_activation_offload.py
#   megatron/core/pipeline_parallel/schedules.py
#   megatron/core/ssm/gated_delta_net.py
#   megatron/core/transformer/attention.py
#   megatron/core/transformer/cuda_graphs.py
#   megatron/core/transformer/experimental_attention_variant/dsa.py
#   megatron/core/transformer/experimental_attention_variant/dsa_kernels.py
#   megatron/core/transformer/module.py
#   megatron/core/transformer/moe/experts.py
#   megatron/core/transformer/multi_latent_attention.py
#   megatron/core/transformer/transformer_block.py
#   megatron/core/transformer/transformer_config.py
#   megatron/core/transformer/transformer_layer.py
#   megatron/core/utils.py
#   megatron/elastification/pretrain_hybrid_flex.py
#   megatron/inference/utils.py
#   megatron/rl/agent/api.py
#   megatron/rl/agent/reward_only_agent.py
#   megatron/rl/inference/megatron.py
#   megatron/rl/rl_utils.py
#   megatron/training/argument_utils.py
#   megatron/training/arguments.py
#   megatron/training/checkpointing.py
#   megatron/training/config/__init__.py
#   megatron/training/config/container.py
#   megatron/training/distillation/utils_logits.py
#   megatron/training/models/gpt.py
#   megatron/training/models/hybrid.py
#   megatron/training/training.py
#   megatron/training/utils/common_utils.py
#   megatron/training/yaml_arguments.py
#   pretrain_gpt.py
#   pretrain_hybrid.py
#   pyproject.toml
#   tests/unit_tests/inference/test_mtp_cuda_graph_inference.py
#   tests/unit_tests/pipeline_parallel/test_fine_grained_activation_offloading.py
#   tests/unit_tests/ssm/test_hybrid_block.py
#   tests/unit_tests/transformer/experimental_attention_variant/test_absorbed_mla.py
#   tests/unit_tests/transformer/experimental_attention_variant/test_attention_variant_dsa.py
#   tools/bert_embedding/embed.py
#   tools/run_inference_performance_test.py
#   tools/trigger_internal_ci.py
#   train_rl.py
#   uv.lock

svcnvidia-nemo-ci · 2026-07-02T23:42:05Z

/ok to test e2dce1e

yashaswikarnati and others added 30 commits June 22, 2026 19:15

Support the MIMO cross-grid path in training loop (#5373)

e1c4495

Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

Stabilize hybrid_2b GB200 perf test against run-to-run noise (#5364)

6bd392f

Consistent oncall schedule (#5404)

b6b44a7

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>

Disag MR3: Add heterogeneous KV/Mamba reshard planners (#5188)

93a7642

Signed-off-by: wdykas <wdykas@nvidia.com>

Add RADIO vision encoder wrapper for MIMO example (#5397)

2a46893

Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

Clean up MTP inference control flow (#5418)

76f6ccc

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>

Add MIMO dual gradient finalization (colocated + non-colocated) (#5286)

f8170b4

Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add RL rollout submission and consumption granularity controls (#5306)

a58373f

Signed-off-by: Laura Dang <laurad@nvidia.com>

Add --functional-test-name to trigger_internal_ci (#5449)

f66c28f

Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rename CP batch helpers to describe balancing granularity (#5403)

8fa1831

Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

build: point flash_mla at the nv_dev branch (#5448)

06ae6a9

Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add logprobs_mode (raw/processed) to inference config (#5419)

a2bb5e5

Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>

Add hetero grid args and MoE process groups for MIMO example (#5375)

b1884d1

Signed-off-by: ykarnati <ykarnati@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

ci: Set test_save_verify_integrity_manifest_directly as flaky (#5468)

b549290

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

Remove DBuffer mesh axis validation (#5441)

47cb413

Signed-off-by: Jingyue Wu <wujingyue@gmail.com>

feat(inference): default use_coordinator to True in high-level APIs (#5326)

a27b040

Support HybridModel feature specs in ModelOpt (#5354)

811bd29

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>

Add experimental Megatron-FSDP fully_shard implementation (#5387)

e7af860

Signed-off-by: Jingyue Wu <wujingyue@gmail.com>

chore: rotate oncall schedule

cc0c960

Add inference functions to support MCore-/MBridge- training refactor and remove legacy modelbuilder functions (#5169)

4d44e37

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ci: launch GB200 unit tests via launch_on_gb200 marker (#5477)

5a256f3

Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

build: install flash_mla from source in the CI image (#5481)

82de1b8

Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

[split 2/4] Scale DSA indexer loss in pipeline schedules (#5244)

0b0d985

Signed-off-by: Hollow Man <hollowman@opensuse.org>

ci: check megatron.training imports in installation test (#5458)

0938eb7

Signed-off-by: oliver könig <okoenig@nvidia.com>

Fix merges_file kwarg name in HuggingFaceTokenizer (#5406)

239959b

Signed-off-by: yanghao.666 <yanghao.666@jd.com>

Automated community request assignment (#5147)

3330d12

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>

Clean up training.py module header (dedupe + reorganize imports/globals) (#5469)

311416f

Signed-off-by: ilml <tolong@nvidia.com>

Thread process groups through training checkpoint paths (#5486)

9038381

Signed-off-by: ykarnati <ykarnati@nvidia.com>

Narrow oncall responsibilities (#5490)

5863721

Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>

copy-pr-bot Bot temporarily deployed to public July 2, 2026 20:56 Inactive

copy-pr-bot Bot temporarily deployed to public July 2, 2026 21:06 Inactive

svcnvidia-nemo-ci force-pushed the main2dev/02_07_2026 branch from d37b35e to cd117b3 Compare July 2, 2026 21:12

copy-pr-bot Bot temporarily deployed to public July 2, 2026 21:12 Inactive

copy-pr-bot Bot temporarily deployed to test July 2, 2026 21:13 Inactive

copy-pr-bot Bot temporarily deployed to public July 2, 2026 21:16 Inactive

copy-pr-bot Bot temporarily deployed to public July 2, 2026 21:25 Inactive

svcnvidia-nemo-ci force-pushed the main2dev/02_07_2026 branch from cd117b3 to d086376 Compare July 2, 2026 22:03

copy-pr-bot Bot temporarily deployed to public July 2, 2026 22:04 Inactive

copy-pr-bot Bot temporarily deployed to test July 2, 2026 22:04 Inactive

copy-pr-bot Bot temporarily deployed to public July 2, 2026 22:07 Inactive

copy-pr-bot Bot temporarily deployed to public July 2, 2026 22:08 Inactive

copy-pr-bot Bot temporarily deployed to public July 2, 2026 22:17 Inactive

svcnvidia-nemo-ci force-pushed the main2dev/02_07_2026 branch from d086376 to 9efd37c Compare July 2, 2026 22:49

svcnvidia-nemo-ci wants to merge 74 commits into dev from main2dev/02_07_2026


19 participants
