Draft
Commits
544 commits
ef336ca
[Dev] Fix EP Overlap missing record stream for shared expert (#3244)
Wohox Feb 5, 2026
ec94d63
Restore missing linear-cross-entropy option accidentally removed from…
shjwudp Feb 6, 2026
500e080
Fix reload_model_params failure when loading MoE models with explicit…
eternally-z Feb 9, 2026
433c169
ci: Disable moe20 tests (#3312)
ko3n1g Feb 9, 2026
fd4801e
ci: Pin down setuptools to lt 82 (#3316)
ko3n1g Feb 9, 2026
52eabf0
[None][Fix] Prevent resource leak warnings (#3216)
IanBoyanZhang Feb 10, 2026
c0030d6
[Dev] Fix backward dw dependency (#3338)
Wohox Feb 10, 2026
2c2e749
ci: Rely exclusively on GitHub CI (#3341)
ko3n1g Feb 10, 2026
98f6f81
[dev] ci: skip queue in merge-gate (#3344)
ko3n1g Feb 10, 2026
28b130f
Revert "[None][Fix] Prevent resource leak warnings (#3216)" (#3366)
ko3n1g Feb 11, 2026
e868e8f
ci: Fix dev branch merge queue (#3397)
chtruong814 Feb 13, 2026
c4b910f
[Dev] Add Qwen3-VL support with Megatron-FSDP (#2842)
xuwchen Feb 13, 2026
6059f36
Add absorbed-mla (#3193)
kunlunl Feb 13, 2026
9f2ca96
cp: Remove gpu sanity check (#3420) into dev (#3421)
chtruong814 Feb 13, 2026
1dcf0da
[dev] ci: Fix merge queue (#3385)
ko3n1g Feb 14, 2026
cd1c215
[dev] `cp: Cherrypick CI changes` (#3543)
ko3n1g Feb 23, 2026
aa86018
[Dev] Fix MoE aux loss tracker hang with MTP enabled (#3400)
Victarry Feb 25, 2026
2b4b9c4
ci: Remove multi-approval action from dev branch (#3576)
chtruong814 Feb 25, 2026
0ab47fa
Merge branch 'main' into dev
FDecaYed Feb 26, 2026
a1a73f8
[dev] pull main 260220 (#3574)
ko3n1g Feb 26, 2026
2e4a5d4
[dev] fix(moe): fix the bug where gate was not sliced when kv_head < …
LiuXTao Feb 27, 2026
d0e0cf0
Add unit test for THD (#3608)
kunlunl Feb 28, 2026
bc9298c
[Dev] feat(checkpoint): zero-copy storage sharing in CheckpointWithou…
Victarry Mar 2, 2026
5c613ab
[Dev] Add E2E support for THD format (#2924)
xiaoyao0115 Mar 3, 2026
5dadaf1
fix: skip FSDP DTensor boundary validation under fake process group (…
Victarry Mar 4, 2026
2176c4a
ci: Remove cudagraph codeowners entry in dev branch (#3712)
chtruong814 Mar 5, 2026
31f5294
[dev] refactor to support emerging optimizers beyond muon (#3618)
FDecaYed Mar 5, 2026
a268231
[Dev] Move some processing into a function so can be compiled (#3220)
BestJuly Mar 5, 2026
f983b21
[Dev] Refactor MoE loss logging (#2569)
yanring Mar 5, 2026
0b0074e
[dev] feat(mHC): Add basic pytorch implementation of manifold hyper c…
jingqiny-99 Mar 6, 2026
597f0d8
[Dev] Cherry-pick: M-FSDP: Cancel erroneous grad accumulation check (…
Victarry Mar 6, 2026
3d097e5
[dev] fix(moe): Fix DSA spec and rope. (#3402)
yuzhongw-nvidia Mar 6, 2026
1edfbd6
Fix split_state_dict function for MoE models (#3667)
eternally-z Mar 10, 2026
28a0aef
Exposing interleave argument for fused_apply_rotary_pos_emb_thd (#3759)
huvunvidia Mar 10, 2026
15fb557
build: Move fast-hadmard-transform (#3786)
ko3n1g Mar 11, 2026
dbf6c4c
fix ddp bug when --overlap-grad-reduce and --num-distributed-optimi f…
wplf Mar 11, 2026
cde56a4
[Dev] Fix for rope when enabling THD + Dynamic-CP; and use the naming…
xiaoyao0115 Mar 11, 2026
9374a4d
Continue emerging optimizer refactoring (#3737)
skyw Mar 12, 2026
f47ad91
Fix emerging optimizer init_group for ckpt loading (#3897)
FDecaYed Mar 17, 2026
74124ba
fix cg acess issue by using dict instead of list to iteratively acces…
ilml Mar 17, 2026
51299c5
Enhance rotary positional embedding version checks (#3887)
huvunvidia Mar 17, 2026
7c3eea6
[DEV] fix(megatron-fsdp): build expt_device_mesh only for MoE models …
xuwchen Mar 17, 2026
a9e5bf9
[Fix][Dev] Missing Assertion for moe layer recomptue in A2A Overlap (…
Wohox Mar 18, 2026
ebf1508
ci: Fix sso users check (#3937)
chtruong814 Mar 19, 2026
8ae70d4
Add more emerging optimizers (#3907)
skyw Mar 19, 2026
c72c459
Support GEMM + Swiglu fused MLP (#3890)
ksivaman Mar 20, 2026
0296101
[Dev] Support EP Overlap's Dynamic Computation Stream For Full-Iter C…
Wohox Mar 25, 2026
4108d68
[dev] mHC kernel fusion (#3828)
jingqiny-99 Mar 25, 2026
79aeecf
Merge remote-tracking branch 'upstream/main' into tolong/sync-main-to…
ilml Mar 25, 2026
0e53b30
fix: correct H2->H4 header skips in router_replay.md
ilml Mar 25, 2026
076d20f
fix: add missing tensor_parallel import in absorbed_mla.py
ilml Mar 25, 2026
0961196
fix: correct import ordering for tensor_parallel in absorbed_mla
ilml Mar 25, 2026
6823637
fix layerwise related merge error due to dev refactor
FDecaYed Mar 30, 2026
0c306dc
[Dev][feat] Support CUDA Graph capture offloading modules (#3219)
lhb8125 Mar 30, 2026
9c0b6ef
update golden value for gpt3_moe_mcore_te_tp4_ep2_etp2_pp2_resume_tor…
FDecaYed Mar 30, 2026
f36257c
Merge branch 'dev' into pull-request/4031
FDecaYed Mar 30, 2026
4ef64eb
Sync main into dev (#4031)
ko3n1g Mar 30, 2026
2bb0d38
[Dev] Fix golden values mismatch and dependency error due to last pul…
Victarry Apr 3, 2026
8d1fd3c
[Dev] Skip routed expert padding for graph-safe MoE (#4071)
zhongbozhu Apr 3, 2026
74751c9
[DEV] Minor update optimizer (#4082)
skyw Apr 7, 2026
ab6c0ff
TE fused grouped mlp with grouped bias and delayed wgrad (#4095)
ksivaman Apr 7, 2026
37a4cee
[Dev][feat] Support overlapping A2A Combine backprop with wgrad GEMM …
Wohox Apr 7, 2026
b6193a3
[dev] feat(moe): Support packed sequence for gated delta net (GDN) (#…
yuzhongw-nvidia Apr 7, 2026
07b9fb2
Revert unintended code owner change (#4172)
kunlunl Apr 7, 2026
2d6e946
[Dev] feat: Dynamic CP (part 2) (#2000)
xiaoyao0115 Apr 7, 2026
16a985f
Update golden values nightly (#4185)
ko3n1g Apr 8, 2026
7fec528
[Dev][MoE] Add a new score function to the router (#4193)
yaox12 Apr 8, 2026
0b121cf
[Dev] feat(fsdp): use TE general_gemm for mixed-precision wgrad in FS…
Victarry Apr 8, 2026
06ecac3
ci: update dev nightly golden values (#4201)
ko3n1g Apr 8, 2026
3beeaa6
ci: Update gb200 golden value test after merge from main to dev (#4216)
chtruong814 Apr 9, 2026
5c54484
Enabled fused grouped MLP for `quick_gelu` and add config for grouped…
ksivaman Apr 9, 2026
0410104
[Dev] Update golden values for inference (#4246)
yaox12 Apr 10, 2026
7226640
[dev] fix: Support mHC with cuda graph and activation offloading (#4190)
jingqiny-99 Apr 10, 2026
7960a31
Add training code to MCore wheel (#3573) (#4255)
maanug-nv Apr 10, 2026
0f6fcb0
[dev] fix(ssm): handle alignment padding in GDN packed seq + CP (#4230)
yxs Apr 13, 2026
c0c4fdc
[Dev] Paged Stashing (#2690)
nanz-nv Apr 13, 2026
1fc9200
[Dev] Fix TE version check for retain_pinned_cpu_buffers in cpu offlo…
BestJuly Apr 13, 2026
7ff046b
[Dev] Add diagnostic warnings to TEGroupedMLP fused impl checks (#4269)
Victarry Apr 14, 2026
66e55f9
Merge remote-tracking branch 'origin/main' into main2dev/14_04_2026
svcnvidia-nemo-ci Apr 14, 2026
11d78a1
chore: nightly sync main into dev (14_04_2026)
svcnvidia-nemo-ci Apr 14, 2026
1d7f3a9
fix: take main's pyproject.toml and uv.lock for lock-file consistency
svcnvidia-nemo-ci Apr 14, 2026
74ec0f9
fix: re-run black formatting on 4 files missed in initial commit
svcnvidia-nemo-ci Apr 14, 2026
913ca3d
fix: add missing docstrings to MegatronGradScaler abstract methods
svcnvidia-nemo-ci Apr 14, 2026
861c840
fix: revert to dev's pyproject.toml, uv.lock, and Dockerfile.ci.dev
svcnvidia-nemo-ci Apr 14, 2026
4012662
fix: use main's pyproject.toml, uv.lock, and Dockerfile.ci.dev
svcnvidia-nemo-ci Apr 14, 2026
e763da8
fix: restore RMSNorm import order in legacy model __init__
svcnvidia-nemo-ci Apr 14, 2026
9f69681
fix: remove stale sequence_packing parametrize and use dev's TE revision
svcnvidia-nemo-ci Apr 14, 2026
2be925c
fix: restore missing CudaGraphScope import, take dev's gated_delta_ne…
Phlip79 Apr 14, 2026
c203444
fix: add fast-hadamard-transform dependency from dev for DSA test
Phlip79 Apr 14, 2026
e7e7a3e
fix: remove fast-hadamard-transform from no-build-isolation-package t…
Phlip79 Apr 14, 2026
6b8f089
fix: splice fast-hadamard-transform into pyproject.toml and uv.lock
Phlip79 Apr 14, 2026
bd24b5c
fix: disambiguate packaging version in uv.lock for fast-hadamard-tran…
Phlip79 Apr 14, 2026
98b2076
fix: take dev's uv.lock — main's lockfile is missing dev-only depende…
Phlip79 Apr 14, 2026
0ba88b5
fix: take dev's pyproject.toml and uv.lock together — they must be co…
Phlip79 Apr 14, 2026
7c27108
revert: restore pyproject.toml, uv.lock, Dockerfile.ci.dev to last kn…
Phlip79 Apr 14, 2026
22e70ca
fix: add fast-hadamard-transform and regenerate uv.lock in CUDA conta…
Phlip79 Apr 14, 2026
09a76f8
fix: take dev's Dockerfile.ci.dev — it must match dev's pyproject.tom…
Phlip79 Apr 14, 2026
e9f1020
fix: take dev's pyproject.toml, uv.lock, and Dockerfile.ci.dev
Phlip79 Apr 14, 2026
9e33263
fix: add nvidia-resiliency-ext git source from main to fix ImportError
Phlip79 Apr 14, 2026
9a7c5dd
fix mfsdp unwrap stuck at MegatronFSDP [dev] (#4273)
wplf Apr 15, 2026
76371d4
Fix UT timeout (#4310)
kunlunl Apr 15, 2026
817b2c4
fix: restore dev's GroupedQuantizedTensor handling in distrib_optimizer
Phlip79 Apr 15, 2026
e56a6c0
fix: remove double-remove in fine_grained_activation_offload bulk_off…
Phlip79 Apr 15, 2026
2a68a9c
fix: resolve fine-grained offloading API mismatches from merge
Phlip79 Apr 15, 2026
01b70a1
[dev] fix(ssm): handle alignment padding in GDN packed seq + CP (#4230)
yxs Apr 13, 2026
a2e673f
chore: nightly sync main into dev (14_04_2026) (#4291)
ko3n1g Apr 16, 2026
6f0795c
ci(action): improve GitHub Actions output UX (#4336)
ko3n1g Apr 16, 2026
c2c7f0f
fix(dev): correct params->parameters typo in ChainedOptimizer.step() …
Phlip79 Apr 17, 2026
f145c98
[Dev] Revert code owner changes from pull main (#4354)
yaox12 Apr 17, 2026
ac6ca5b
build: bump TransformerEngine to release_v2.14 (dev) (#4332)
ko3n1g Apr 17, 2026
f2a40ef
[Dev] Add permute/unpermute fusion with dispatch/combine in Hybrid-EP…
Autumn1998 Apr 17, 2026
bd698d1
Fix fused grouped MLP wgrad hooks for DDP reduce-scatter (#4311)
gdengk Apr 17, 2026
73cc2ce
Fix activation_func check and MLP sharded_state_dict (#4325)
gdengk Apr 17, 2026
6efb083
Allow fine-grained offloading with MC impl of full-CG. (#4253)
rapatel Apr 17, 2026
be3b874
Add TEFusedDenseMLP for Dense+Grouped GEMM fusion on SM100+ (#4318)
sraman-rgb Apr 20, 2026
1b47bc0
[Dev] Fix docs build from main sync (#4356)
Victarry Apr 20, 2026
55df4e5
[Dev] overload factor logging (#4110)
nanz-nv Apr 20, 2026
13557a2
Revert "fix mfsdp unwrap stuck at MegatronFSDP [dev] (#4273)" (#4393)
wplf Apr 21, 2026
546a448
M4 leftover for TE cuda graph (dev) (#4369)
Phlip79 Apr 21, 2026
85bced0
[Dev] Add high-priority a2a comm stream option and hybridep preproces…
gdengk Apr 22, 2026
57005c8
Merge remote-tracking branch 'origin/main' into main2dev/22_04_2026
github-actions[bot] Apr 22, 2026
8add4e4
chore: post-merge fixes for nightly sync main into dev (22_04_2026)
github-actions[bot] Apr 22, 2026
baa3df4
fix: revert nvidia-resiliency-ext revision to match uv.lock
github-actions[bot] Apr 22, 2026
bbb06e2
fix: reformat 4 files with correct black==24.4.2 and isort==5.13.2
github-actions[bot] Apr 22, 2026
89798f3
fix: restore missing ArgumentGroupFactory import in arguments.py
svcnvidia-nemo-ci Apr 22, 2026
db5ade5
Reorder mtp_post_process after attn backward in 1F1B schedule plan (#…
gdengk Apr 23, 2026
8cf5458
chore: keep CODEOWNERS unchanged in main→dev sync
Phlip79 Apr 23, 2026
fba3a80
chore: update gpt3_mcore_te_tp2_pp2_mhc golden values for main→dev sync
Phlip79 Apr 23, 2026
78858b2
[Dev] Fix mis-set decoupled gradient for Megatron-FSDP. (#4426)
cspades Apr 25, 2026
64d2e0a
chore: nightly sync main into dev (22_04_2026) (#4436)
ko3n1g Apr 28, 2026
8821e6f
fix(ci): re-enable scoped_cudagraph MoE test with TE 2.14 golden valu…
buptzyb Apr 29, 2026
66a2ff8
[Dev] remove dead manual_release_grads code path in 1F1B overlap sche…
Wohox Apr 29, 2026
fe729e9
[dev] [DeepSeek-v4] Part 2: Hash MoE and SwiGLU clamp (#4481)
hxbai Apr 30, 2026
bf4e1db
[dev] [DeepSeek-v4] Part 1: Hybrid Attention with CSA and HCA (#4458)
hxbai Apr 30, 2026
a2d7153
feat(attention): Add rotary_base_per_layer for Step-3.5-Flash (#4473)
shifangx May 3, 2026
7b570e6
test(transformer): add DSv4 hybrid hash MoE integration coverage (#4596)
Glitchfix May 6, 2026
01434b0
Fix checkpoint loading with `load_main_params_from_ckpt=True` for gro…
ksivaman May 7, 2026
994f5c9
Add a knob to throttle the max allowed inflight offload in fine grain…
nanz-nv May 7, 2026
4401ac8
[MXFP8/FP4-param-gather] Post processing after forced param AG in eva…
WanZzzzzz May 7, 2026
829a7b7
Allow optimizer CG to share the same pool as full-iter CG (#4521)
nanz-nv May 8, 2026
bfad45c
Merge remote-tracking branch 'origin/main' into main2dev/10_05_2026
github-actions[bot] May 10, 2026
9453c6e
fix: post-CI corrections
github-actions[bot] May 10, 2026
bfce39d
fix: restore dev-only changes lost by main-priority merge
Phlip79 May 11, 2026
fb297f0
fix
FDecaYed May 11, 2026
c3dbea7
fix: rename args.hybrid_context_parallel to args.dynamic_context_para…
Phlip79 May 11, 2026
e04e1c2
fix: port wrap_data_iterator pattern from PR #4659 to fix DCP test
Phlip79 May 11, 2026
d338cc5
chore: nightly sync main into dev (10_05_2026) (#4716)
balasaajay May 11, 2026
ab5ab5d
[Dev] fix(mtp): use padded cu_seqlens in MTP roll for THD with CP (#4…
BestJuly May 12, 2026
c55150a
cp: build: widen flashinfer-python pin to <0.7.0 (#4700) into `dev` (…
chtruong814 May 12, 2026
5b376fd
[Dev] fix: restore PR #3219 fine-grained offload semantics after PR #…
lhb8125 May 13, 2026
0989486
[Dev] Fix MTP recompute crash with packed sequences (#4592)
BestJuly May 13, 2026
341e9f7
[Dev][Cherry-Pick Main] Remove legacy grouped gemm and grouped mlp (#…
yaox12 May 14, 2026
4c2022d
[Dev] Refactor CUDA graph API: decompose cuda_graph_scope into full_i…
buptzyb May 14, 2026
cf081d5
[fix] Release MTP assertion when EP overlap with PP=1 (#4797)
Wohox May 14, 2026
2e55168
[dev] [DeepSeek-v4] Part 3: MTP support with mHC and new mHC contract…
hxbai May 14, 2026
df12802
[dev] Fix GDN DTensor splitting for FSDP checkpointing (#4799)
conver334 May 15, 2026
cfbd9df
[dev] [4/5] Qwen3.5 support: Interleaved MRoPE layout (#4750)
wplf May 15, 2026
2672ff5
[DEV] fix(megatron-fsdp): preserve non-meta tensors during meta devic…
xuwchen May 15, 2026
8195337
[dev] [3/5] Qwen3.5 support: SharedExpertMLP meta init (#4749)
wplf May 15, 2026
77c0f8c
[Dev][feat] Support A2A Overlap for Megatron-FSDP (#3796)
Wohox May 15, 2026
6139c51
[Dev][opt] Optimize e_proj and h_proj TP communication for MTP with m…
Baibaifan May 18, 2026
bac68ec
[Dev][fix] FSDP EP-overlap CUDA-graph guard uses post-refactor API (#…
buptzyb May 18, 2026
fad6c1e
[Dev] Fix full CUDA graph capture reverted by pull main (#4792)
Victarry May 19, 2026
92ab682
[dev] [2/5] Qwen3.5 support: FSDP DTensor Bridge checkpoint compatibi…
wplf May 19, 2026
b8ffa5e
[MXFP8] Mirror fixes in Mbridge for mxfp8 param gather (#4818)
zhongbozhu May 19, 2026
e3ef5d1
Add opt-in nonuniform tensor parallelism (#4585)
daiyaanarfeen May 19, 2026
bb525f1
Paged stashing updae (#4778)
nanz-nv May 19, 2026
ee3f1ff
Fix mxfp8 param gather numerical issue when DP overlap is off (#4769)
WanZzzzzz May 19, 2026
bd5c98f
[dev] feat(attention): add use_head_wise_attn_gate for Step-3.5-Flash…
shifangx May 19, 2026
0afbb98
[Dev] add support for deepep/hybridep dispatcher under thd format tra…
HaochenYuan May 21, 2026
98b595c
[dev] Fix FSDP TP metadata for LinearCrossEntropyModule (#4888)
conver334 May 21, 2026
6b6fb95
[Dev] Add MoE example recipes (#4890)
Victarry May 21, 2026
56481b0
[Dev] Remove duplicated `bias_act_func` from main to dev sync (#4927)
yaox12 May 22, 2026
f553f2f
[dev] [DeepSeek-v4] Part 4: Fusion Kernels for DSv4 Hybrid Attention …
hxbai May 27, 2026
2ee3bfb
[dev] fix no_shard training convergency and add unittest for no_shard…
wplf May 27, 2026
2d10b8a
Add mHC support for HybridModel on dev (#4949)
Connor-XY May 27, 2026
7b9593b
[Dev] Add Qwen3 30B MoE recipes (#5012)
Victarry May 28, 2026
473145c
[DEV] fix(megatron-fsdp): reduce padding for grouped expert weights (…
xuwchen May 28, 2026
3e8ce1f
[Dev] fix(combined-1f1b): release loss-node input storage after combi…
Wohox May 28, 2026
35f36c7
[dev] [fix] [DeepSeek-v4] fix dense loss and rope type in DSv4 Hybrid…
hxbai May 28, 2026
58f3e67
[dev] [5/5] Qwen3.5 support: Qwen3.5-VL training example (#4751)
wplf May 29, 2026
1fe7825
chore: Update Docker image version to 26.04-py3 on dev (#5051)
ko3n1g May 29, 2026
630956b
[Dev] Skip identity alltoall chunk sort (#5102)
lhb8125 Jun 2, 2026
60ef5e2
fix: correct dsv4_hybrid Q-up FLOPs by using args.v_head_dim (#5142)
dingqingy-nv Jun 4, 2026
05a93e0
[dev] [follow-up] Qwen3.5 support: MoE aux loss padding_mask (#4776)
wplf Jun 4, 2026
d23ca85
[Dev] Support isolated MTP loss (#5080)
Victarry Jun 4, 2026
cefc252
[dev] moe(fix): Avoid TE cuda graph dummy attention masks (#5131)
yuzhongw-nvidia Jun 5, 2026
ae1fc65
chore: nightly sync main into dev (06_06_2026)
github-actions[bot] Jun 6, 2026
06f659b
fix: post-CI corrections (docs dup fields, nvrx version gate, utils r…
github-actions[bot] Jun 6, 2026
ca1253c
[dev] [DeepSeek-v4] Add ClampedSwiGLU to MoE mlp_op_fuser and add for…
hxbai Jun 7, 2026
12fc125
Merge current dev into nightly sync
Phlip79 Jun 8, 2026
959a542
Minor improvements for Dynamic-cp (#4226)
xiaoyao0115 Jun 8, 2026
fb1e6d7
fix: post-CI corrections
Phlip79 Jun 8, 2026
0af5708
Merge remote-tracking branch 'origin/dev' into main2dev/06_06_2026
Phlip79 Jun 8, 2026
39f8d55
varlendataset for thd e2e and benchmark (#4832)
xiaoyao0115 Jun 8, 2026
179df43
fix: wrap mtp logging comment
Phlip79 Jun 8, 2026
5f37f45
[Dev] Add separate toggle for varlen input padding for HybridEP in TH…
zhongbozhu Jun 9, 2026
5788a20
Merge remote-tracking branch 'origin/dev' into main2dev/06_06_2026
Phlip79 Jun 8, 2026
2f10049
chore: nightly sync main into dev (06_06_2026) (#5199)
balasaajay Jun 9, 2026
89ad096
[Dev] DeepSeek-V4-Flash recipe 20260610 (#5266)
hxbai Jun 11, 2026
694cb0d
[Dev] Generalized fix for mxfp8 param gather (#4994)
zhongbozhu Jun 11, 2026
7eedac5
[Dev] Cherry-pick MTP detach heads (#5223)
Victarry Jun 11, 2026
0e0814a
[TE] Restore original CP group after dynamic CP forward in TEDotProdu…
rui23 Jun 11, 2026
192cbc3
[examples] Add dynamic context parallel benchmark example (#5123)
ilml Jun 11, 2026
08518ad
[dev] Add experimental Megatron Lite as agentic exploration (#4885)
ISEEKYAN Jun 11, 2026
ae955ca
Merge remote-tracking branch 'origin/main' into main2dev/12_06_2026
svcnvidia-nemo-ci Jun 12, 2026
c700379
ci: Allow DCO check in merge queue and add DCO requirement (#5305)
balasaajay Jun 12, 2026
b0d49c9
fix: restore dev-only training.py features dropped by the main override
svcnvidia-nemo-ci Jun 12, 2026
9d46c92
[dev]: faster implementation of mHC fused kernels (#4624)
jingqiny-99 Jun 12, 2026
e5ad098
[Dev] Add MoE recipe performance summary (#5289)
Victarry Jun 15, 2026
c049020
[Dev] Add DeepEP v2 flex dispatcher backend (#4793)
Autumn1998 Jun 15, 2026
15a4ffc
fix tflops calculation when sequence_packing_scheduler is not none (#…
xiaoyao0115 Jun 15, 2026
3d8b95c
[dev] bump emerging optimizers to v0.3.0 (#5320)
FDecaYed Jun 15, 2026
0302fac
Merge branch 'dev' into main2dev/12_06_2026
Phlip79 Jun 15, 2026
95654c9
fix: preserve seqlen stats in train_step
Phlip79 Jun 15, 2026
377b2e0
Enable Deepseek-v4 hybrid_model in dev branch Part (1/N) (#5042)
guihong-nv Jun 16, 2026
b6bea7e
chore: nightly sync main into dev (12_06_2026) (#5314)
balasaajay Jun 16, 2026
8dc6e66
feat(ssm): whole-module 'gdn' selective recompute for GatedDeltaNet (…
wplf Jun 16, 2026
232c478
[Fix] Fix optimizer parameter override bugs. (#5213)
Baibaifan Jun 16, 2026
2047dec
[Dev] Add Megatron-FSDP weight prefetch for full recompute (#5175)
lhb8125 Jun 16, 2026
7f91752
[Dev] add cuda graph support for thd format training. (#4359)
HaochenYuan Jun 16, 2026
9af7c79
[Dev] restore DSv4 tflops calc in training and fix the packed seq cas…
hxbai Jun 16, 2026
3346fa8
[dev] moe(perf): Restore fused GDN THD all-to-all on dev (#5389)
yuzhongw-nvidia Jun 17, 2026
c2a9a60
Merge remote-tracking branch 'origin/main' into main2dev/22_06_2026
github-actions[bot] Jun 22, 2026
3fae48f
[Fix] Fix MoE router z-loss compatibility with TE CUDA Graph capture.…
Baibaifan Jun 23, 2026
94f70b6
test: mark ep_a2a_overlap activation-offloading test flaky_in_dev (#5…
ko3n1g Jun 23, 2026
8c0aa6c
test: mark TestParallelTransformerBlockCudagraphs::test_gpu_cudagraph…
ko3n1g Jun 24, 2026
6c6ce4e
test: mark gated_delta_net selective-recompute test flaky_in_dev (#5476)
ko3n1g Jun 24, 2026
cc083e1
fix: post-CI corrections for sync test/impl reconciliation
github-actions[bot] Jun 22, 2026
15486f7
fix sequence packing wrapper for eval (#5483)
xiaoyao0115 Jun 25, 2026
4463f78
[dev] Megatron Lite (4/4) shared attention (#5427)
ISEEKYAN Jun 25, 2026
f6c77e7
fix: restore fused group MLP offload in main2dev sync (#5493)
lhb8125 Jun 25, 2026
72cef25
chore: nightly sync main into dev (22_06_2026) (#5430)
chtruong814 Jun 26, 2026
056d9c0
[dev] [DeepSeek-v4] Packed Sequence (THD) support for DSv4 Hybrid Att…
hxbai Jun 26, 2026
d2e7ec5
Improve default dynamic CP packing scheduler (#5154)
xiaoyao0115 Jun 26, 2026
d963266
[dev] moe(perf): Pre-GDR kernel fusion (#5361)
yuzhongw-nvidia Jun 27, 2026
04bbfe6
Preserve DSA output across fused inverse RoPE (#5526)
kunlunl Jun 29, 2026
3e89f3c
Enable Deepseek-v4 hybrid_model in dev branch Part (2/N) (#5485)
guihong-nv Jun 29, 2026
3b9c248
[dev] Add experimental decoupled compact LayerWise DDP layout for Muo…
Wohox Jun 30, 2026
5e846ab
[dev] Sync Megatron Lite with the latest implementation (#5577)
ISEEKYAN Jul 1, 2026
66dc96f
[Dev] fix padding mask docstring (#5598)
HaochenYuan Jul 2, 2026
6031238
Add GDN context parallel communication modes
yuzhongw-nvidia Jun 9, 2026
d749b6c
Make GDN conv padding configurable
yuzhongw-nvidia Jun 16, 2026
d8b0afa
Refactor and rename GDN CP layout handling
yuzhongw-nvidia Jun 29, 2026
df90299
Address GDN CP review guards
yuzhongw-nvidia Jul 1, 2026
098d7d9
Fix THD contiguous CP rank partition
yuzhongw-nvidia Jul 1, 2026
3d39e26
Require flash-linear-attention CP API
yuzhongw-nvidia Jul 1, 2026
a9fad1c
Document THD CP routing future work
yuzhongw-nvidia Jul 1, 2026
50a0a89
Update Mamba MoE config golden
yuzhongw-nvidia Jul 2, 2026
0beec47
fix(thd): use padded boundaries for TE attention
yuzhongw-nvidia Jul 2, 2026
77 changes: 5 additions & 72 deletions .github/CODEOWNERS
@@ -1,80 +1,13 @@
megatron/core/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/models/common/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/gpt/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/multimodal/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/multi-modal

megatron/core/models/mamba/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-model
megatron/core/ssm/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-model

megatron/core/models/hybrid/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-model

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/tokenizers/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/tokenizers

megatron/core/distributed/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/data-parallelism
megatron/core/distributed/fsdp/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/transformer/fsdp_dtensor_checkpoint.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/dist_checkpointing/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-checkpointing

megatron/core/optimizer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-optimizer

megatron/core/optimizer/distrib_optimizer.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer
megatron/core/optimizer/layer_wise_optimizer.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer
megatron/core/optimizer/param_layout.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer

megatron/core/optimizer/emerging_optimizers.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-emerging-optimizers
megatron/core/optimizer/muon.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-emerging-optimizers
megatron/core/optimizer/qk_clip.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mcore-emerging-optimizers @NVIDIA/transformer

megatron/core/inference/modelopt_support @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/pipeline_parallel/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/pipeline-parallelism

megatron/core/transformer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/transformer

megatron/core/transformer/moe/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mixture-of-experts-adlr @NVIDIA/mixture-of-experts-devtech

megatron/core/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference

megatron/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference-interface

megatron/core/parallel_state.py @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/post_training/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/post_training/ @NVIDIA/post-training

megatron/core/transformer/cuda_graphs.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/cuda-graphs

megatron/training/ @NVIDIA/training-adlr @NVIDIA/training-nemo
megatron/training/arguments.py
* @NVIDIA/core-nemo @NVIDIA/core-devtech

.gitlab/ @NVIDIA/ci
.github/ @NVIDIA/ci
.github/oncall_schedule.json @NVIDIA/mcore-oncall-rotation
.gitlab-ci.yml @NVIDIA/ci
docker/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci
tests/test_utils/python_scripts/
tests/functional_tests/python_test_utils/ @NVIDIA/ci
tests/functional_tests/shell_test_utils/ @NVIDIA/ci
tests/test_utils/recipes/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci

# API Backwards Compatibility Check
scripts/check_api_backwards_compatibility.py @NVIDIA/ci
scripts/README_API_COMPAT.md @NVIDIA/ci
.github/workflows/check_api_backwards_compatibility_workflow.yml @NVIDIA/ci
docs/api-backwards-compatibility-check.md @NVIDIA/ci
tests/unit_tests/test_api_backwards_compat_setup.py @NVIDIA/ci

megatron/rl/ @NVIDIA/reinforcement-learning
examples/rl/ @NVIDIA/reinforcement-learning
test/unit_tests/test_rl_utils.py @NVIDIA/reinforcement-learning
train_rl.py @NVIDIA/reinforcement-learning
pyproject.toml @NVIDIA/ci
uv.lock @NVIDIA/ci
2 changes: 1 addition & 1 deletion .github/copy-pr-bot.yaml
@@ -1,4 +1,4 @@
enabled: true
auto_sync_draft: false
auto_sync_ready: true
trustees_override: ["AAnoosheh", "ArEsKay3", "Autumn1998", "BestJuly", "BoxiangW", "CarlosGomes98", "ChenhanYu", "Connor-XY", "FDecaYed", "HaochenYuan", "HollowMan6", "ISEEKYAN", "JRD971000", "Mellonta", "Phlip79", "QiZhangNV", "RPrenger", "ShriyaRishab", "Victarry", "WanZzzzzz", "Wohox", "YangFei1990", "ZhiyuLi-Nvidia", "adistomar", "ahmadki", "aklife97", "alokpathy", "ananthsub", "anlthms", "aroshanghias-nvd", "ashehper", "asolergi-nv", "athitten", "balasaajay", "buptzyb", "chtruong814", "cjld", "cspades", "cuichenx", "deepakn94", "dimapihtar", "dingqingy-nv", "duncanriach", "erhoo82", "ericharper", "fanshiqing", "faradawn", "fitsumreda", "frsun-nvda", "gautham-kollu", "gdengk", "guihong-nv", "guyueh1", "hexinw-nvidia", "huvunvidia", "hxbai", "ilml", "jalbericiola", "janEbert", "jaredcasper", "jenchen13", "jiemingz", "jingqiny-99", "jkamalu", "jon-barker", "jstjohn", "kajalj22", "kamran-nvidia", "kevalmorabia97", "ko3n1g", "ksivaman", "kunlunl", "kvareddy", "kwyss-nvidia", "lauradang", "layalir", "lhb8125", "liding-nv", "lmcafee-nvidia", "maanug-nv", "macandro96", "mathemakitten", "matthieule", "mchrzanowski", "mehraakash", "minitu", "mkhona-nvidia", "nanz-nv", "ntajbakhsh", "parthmannan", "philipcmonk", "prajwal1210", "pthombre", "rapatel", "rhewett-nv", "rogerwaleffe", "sajadn", "sanandaraj5597", "sancha", "santhnm2", "sbak5", "shanmugamr1992", "sharathts", "sheliang-nv", "shengf-nv", "shifangx", "shjwudp", "sidsingh-nvidia", "skyw", "sraman-rgb", "sudhakarsingh27", "tdene", "theothermike", "thomasdhc", "tomlifu", "trintamaki", "tylerpoon", "wdykas", "wplf", "wujingyue", "xiaoyao0115", "xuantengh", "xuwchen", "yaox12", "yaoyu-33", "yashaswikarnati", "yeyu-nvidia", "yobibyte", "youngeunkwon0405", "yueshen2016", "yuzhongw-nvidia", "zhongbozhu"]
trustees_override: ["AAnoosheh", "ArEsKay3", "Autumn1998", "BestJuly", "BoxiangW", "CarlosGomes98", "ChenhanYu", "Connor-XY", "FDecaYed", "HaochenYuan", "ISEEKYAN", "JRD971000", "Mellonta", "Phlip79", "QiZhangNV", "RPrenger", "ShriyaRishab", "Victarry", "WanZzzzzz", "Wohox", "YangFei1990", "ZhiyuLi-Nvidia", "adistomar", "ahmadki", "aklife97", "alokpathy", "ananthsub", "anlthms", "aroshanghias-nvd", "ashehper", "asolergi-nv", "athitten", "balasaajay", "buptzyb", "chtruong814", "cjld", "cspades", "cuichenx", "deepakn94", "dimapihtar", "dingqingy-nv", "duncanriach", "erhoo82", "ericharper", "fanshiqing", "faradawn", "fitsumreda", "frsun-nvda", "gautham-kollu", "gdengk", "guihong-nv", "guyueh1", "hexinw-nvidia", "huvunvidia", "hxbai", "ilml", "jalbericiola", "janEbert", "jaredcasper", "jenchen13", "jiemingz", "jingqiny-99", "jkamalu", "jon-barker", "jstjohn", "kajalj22", "kamran-nvidia", "kevalmorabia97", "ko3n1g", "ksivaman", "kunlunl", "kvareddy", "kwyss-nvidia", "layalir", "lhb8125", "liding-nv", "lmcafee-nvidia", "maanug-nv", "macandro96", "mathemakitten", "matthieule", "mchrzanowski", "mehraakash", "minitu", "mkhona-nvidia", "nanz-nv", "ntajbakhsh", "parthmannan", "philipcmonk", "prajwal1210", "pthombre", "rapatel", "rhewett-nv", "rogerwaleffe", "sajadn", "sanandaraj5597", "sancha", "santhnm2", "sbak5", "shanmugamr1992", "sharathts", "sheliang-nv", "shengf-nv", "shifangx", "shjwudp", "sidsingh-nvidia", "skyw", "sraman-rgb", "sudhakarsingh27", "tdene", "theothermike", "thomasdhc", "tomlifu", "trintamaki", "tylerpoon", "wdykas", "wplf", "wujingyue", "xiaoyao0115", "xuantengh", "xuwchen", "yaox12", "yaoyu-33", "yashaswikarnati", "yeyu-nvidia", "yobibyte", "youngeunkwon0405", "yueshen2016", "yuzhongw-nvidia", "zhongbozhu"]