
[main] moe(feat): support chunkwise context parallelism for GDN #5637

Draft
yuzhongw-nvidia wants to merge 544 commits into NVIDIA:main from yuzhongw-nvidia:yuzhongw/fla_cp

@yuzhongw-nvidia

Copy link
Copy Markdown
Contributor
  • I, the PR author, have personally reviewed every line of this PR.

What does this PR do?

PR for dev: #3282

Depends on: #5392

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact @NVIDIA/mcore-oncall.

Issue tracking

For PRs from open-source community contributors:

  • New features: a linked issue is required. Please open a feature request and reference it here before submitting the PR.
  • Small updates (bug fixes, minor improvements): a linked issue is recommended and will accelerate the PR review process.

Linked issue:

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or comment @NVIDIA/mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge conflicts are resolved and CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

Wohox and others added 30 commits February 5, 2026 14:40
… arguments.py (NVIDIA#3266)

Co-authored-by: Xin Yao <xiny@nvidia.com>
… state_dict (NVIDIA#3243)

Co-authored-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: Robin Zhang <robinz@nvidia.com>
Signed-off-by: jinliangl <jinliangl@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: xiaoxi-wangfj <690912414@qq.com>
Signed-off-by: skydoorkai <htsantaclara@163.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: meg miranda <mmiranda@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: sajadn <snorouzi@nvidia.com>
Signed-off-by: lit <lit@nvidia.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
Signed-off-by: adithyare <adithyare@nvidia.com>
Signed-off-by: Soumye Singhal <soumyes@cw-dfw-cs-001-dc-01.cm.cluster>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: mikail <mkhona@nvidia.com>
Co-authored-by: HaochenYuan <106647990+HaochenYuan@users.noreply.github.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Duncan Riach <33532941+duncanriach@users.noreply.github.com>
Co-authored-by: yobi byte <yobibyte@users.noreply.github.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: wdykas <73254672+wdykas@users.noreply.github.com>
Co-authored-by: root <root@gpu-h100-0348.cm.cluster>
Co-authored-by: root <root@gpu-h100-0193.cm.cluster>
Co-authored-by: root <root@gpu-h100-0082.cm.cluster>
Co-authored-by: root <root@gpu-h100-0495.cm.cluster>
Co-authored-by: William Dykas <wdykas@cw-pdx-cs-001-vscode-02.cm.cluster>
Co-authored-by: root <root@gpu-h100-0213.cm.cluster>
Co-authored-by: root <root@gpu-h100-0435.cm.cluster>
Co-authored-by: root <root@gpu-h100-0188.cm.cluster>
Co-authored-by: root <root@gpu-h100-0032.cm.cluster>
Co-authored-by: root <root@gpu-h100-0023.cm.cluster>
Co-authored-by: root <root@gpu-h100-0368.cm.cluster>
Co-authored-by: root <root@gpu-h100-0203.cm.cluster>
Co-authored-by: root <root@gpu-h100-0229.cm.cluster>
Co-authored-by: root <root@gpu-h100-0123.cm.cluster>
Co-authored-by: root <root@gpu-h100-0217.cm.cluster>
Co-authored-by: root <root@gpu-h100-0496.cm.cluster>
Co-authored-by: root <root@gpu-h100-0261.cm.cluster>
Co-authored-by: GitHub Actions <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Yuzhong Wang <yuzhongw@nvidia.com>
Co-authored-by: Hongbin Liu <lhb8125@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Co-authored-by: tgkyrie <74066353+tgkyrie@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: rkarimimahab <rkarimimahab@nvidia.com>
Co-authored-by: Rabeeh Mahabadi <rkarimimahab@nb-hel-cs-001-vscode-02.cm.cluster>
Co-authored-by: Sanjeev Satheesh <sasatheesh@nvidia.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
Co-authored-by: Santosh Bhavani <santosh.bhavani@live.com>
Co-authored-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Co-authored-by: Li Tao <lit@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: mvirts <mvirts@gmail.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <hollowman@opensuse.org>
Co-authored-by: Robin Zhang <robinz@nvidia.com>
Co-authored-by: Sheng Fu <shengf@nvidia.com>
Co-authored-by: Venmugil Elango <498703+venmugil@users.noreply.github.com>
Co-authored-by: mathemakitten <helenn@nvidia.com>
Co-authored-by: Jared Casper <155158+jaredcasper@users.noreply.github.com>
Co-authored-by: Parth Mannan <38387286+parthmannan@users.noreply.github.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
Co-authored-by: Tong Liu <tongliu@nvidia.com>
Co-authored-by: Li Jinliang <jinliangl@nvidia.com>
Co-authored-by: Jinliang Li <jinliangl@pool0-01676.cm.cluster>
Co-authored-by: Jinliang Li <jinliangl@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: Nick Schank <nick@reflection.ai>
Co-authored-by: Jeffrey Chen <jeffrey@reflection.ai>
Co-authored-by: janEbert <janpabloe@nvidia.com>
Co-authored-by: rj42 <lbkzman@gmail.com>
Co-authored-by: Juntao Wang <juntaow@nvidia.com>
Co-authored-by: Pingtian Li <158665726+Wohox@users.noreply.github.com>
Co-authored-by: Chris Grimm <chris@reflection.ai>
Co-authored-by: Chenhan D. Yu <5185878+ChenhanYu@users.noreply.github.com>
Co-authored-by: Eric Harper <eharper@nvidia.com>
Co-authored-by: xiaoxi-wangfj <690912414@qq.com>
Co-authored-by: Jianbin Chang <shjwudp@gmail.com>
Co-authored-by: c1lovez1 <141424951+c1lovez1@users.noreply.github.com>
Co-authored-by: Zhang Haitao <htsantaclara@163.com>
Co-authored-by: yeyu-nvidia <yeyu@nvidia.com>
Co-authored-by: kwyss-nvidia <kwyss@nvidia.com>
Co-authored-by: Jon Barker <jbarker@nvidia.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com>
Co-authored-by: megnvidia <mmiranda@nvidia.com>
Co-authored-by: thecaptain789 <257642323+thecaptain789@users.noreply.github.com>
Co-authored-by: thecaptain789 <thecaptain789@users.noreply.github.com>
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Yan Bai <baiyan1996@icloud.com>
Co-authored-by: xuwchen <xuwenc@nvidia.com>
Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
Co-authored-by: Lawrence McAfee <85179052+lmcafee-nvidia@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Robert Kirby <ArEsKay3@users.noreply.github.com>
Co-authored-by: Siddharth Singh <sidsingh@nvidia.com>
Co-authored-by: Robert Kirby <rkirby@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Co-authored-by: Dennis(Zhenhuan) Liu <denliu@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: vasunvidia <108759426+vasunvidia@users.noreply.github.com>
Co-authored-by: Philip Petrakian <pgpetrak@gmail.com>
Co-authored-by: Sajad Norouzi <sajad.n@gmail.com>
Co-authored-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Co-authored-by: xielaixin <xielx@shanghaitech.edu.cn>
Co-authored-by: Robert Kirby <rkirby@nvidia.com>
Co-authored-by: Ming <93323717+dndnda@users.noreply.github.com>
Co-authored-by: liming127 <liming127@meituan.com>
Co-authored-by: Jon Barker <jbarker@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: helen ngo <helen.ngo14@gmail.com>
Co-authored-by: Jenny Chen <jennifchen@nvidia.com>
Co-authored-by: yueshen2016 <39203804+yueshen2016@users.noreply.github.com>
Co-authored-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: Soumye Singhal <soumyes@cw-dfw-cs-001-dc-01.cm.cluster>
Co-authored-by: Seonjin Na <sna@nvidia.com>
Co-authored-by: Seonmyeong Bak <sbak@nvidia.com>
Co-authored-by: Mikail Khona (NVIDIA) <mkhona@nvidia.com>
…tp_size. (NVIDIA#3529)

Co-authored-by: xiaotaoliu <xiaotaoliu@tencent.com>
Co-authored-by: Yuzhong Wang <yuzhongw@nvidia.com>
Co-authored-by: Zijie Yan <zijiey@nvidia.com>
…tOutput (NVIDIA#3641)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: xiaoyao0115 <1804647152@qq.com>
Signed-off-by: tailaim <tailaim@nvidia.com>
Co-authored-by: kunlunl <kunlunl@nvidia.com>
…VIDIA#3668)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Co-authored-by: Hao Wu <skyw@nvidia.com>
Co-authored-by: Robin Zhang <robinz@nvidia.com>
…onnection(mHC). (NVIDIA#2943)

Co-authored-by: Jingqin Yang <jingqiny@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0478.eos.clusters.nvidia.com>
Co-authored-by: Dennis Liu <denliu@nvidia.com>
hxbai and others added 29 commits June 16, 2026 23:51
Signed-off-by: Yuzhong Wang <yuzhongw@nvidia.com>
Nightly sync of main into dev (22_06_2026). Resolves 16 conflicts,
preserving dev features (pre-push guard: 0 dropped dev lines); brings
in main's inference shard-spec API additively. Supersedes NVIDIA#5429.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
…NVIDIA#5401)

Signed-off-by: yangfan.bai <yangfan.bai@shopee.com>
Co-authored-by: yangfan.bai <yangfan.bai@shopee.com>
…IDIA#5450)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… flaky_in_dev (NVIDIA#5475)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…DIA#5476)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1) test_optimizer.py: reverted to dev's version. The sync kept dev's
   multi_latent_attention.py (separate q/kv down-projections, no
   _synthesize_fused_qkv_down_weight) but auto-merged main's test,
   which asserts the fused linear_qkv_down_proj.weight key.

2) training.py: guard the dev-only sequence_packing_scheduler config
   access with getattr (in train_step and train()). main's new
   MIMO schedule-plumbing test (NVIDIA#5333) passes an empty SimpleNamespace
   config; the reconciled training.py keeps dev's packing path, so the
   access must tolerate a config lacking the attribute. Real configs are
   unaffected (getattr returns the attribute's value when it is present).
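The getattr guard described in item 2 can be sketched as follows. This is a minimal illustration, not the actual training.py change: `resolve_packing_scheduler` is a hypothetical helper, and falling back to `None` when the attribute is missing is an assumption about the default used.

```python
from types import SimpleNamespace


def resolve_packing_scheduler(config):
    """Hypothetical helper: return config.sequence_packing_scheduler, or None if absent.

    Assumption: None is the fallback; the real training.py may use a different default.
    """
    # getattr with a default tolerates configs (such as a test's empty
    # SimpleNamespace) that do not define the dev-only attribute, while
    # returning the attribute's value unchanged when it is present.
    return getattr(config, "sequence_packing_scheduler", None)


# A config that defines the attribute (the value here is a made-up placeholder).
real_config = SimpleNamespace(sequence_packing_scheduler="hypothetical_scheduler")
# An empty config, like the one passed by the MIMO schedule-plumbing test.
empty_config = SimpleNamespace()

print(resolve_packing_scheduler(real_config))
print(resolve_packing_scheduler(empty_config))
```

Plain attribute access (`config.sequence_packing_scheduler`) would raise `AttributeError` on the empty namespace; the default argument to `getattr` is what makes the access safe.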

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Signed-off-by: xiaoyao0115 <1804647152@qq.com>
Signed-off-by: Yan Bai <bayan@nvidia.com>
Signed-off-by: hongbinl <hongbinl@nvidia.com>
Signed-off-by: svcnvidia-nemo-ci <svc-nvidia-nemo-ci@nvidia.com>
Signed-off-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: janEbert <janpabloe@nvidia.com>
Signed-off-by: Philip Petrakian <ppetrakian@nvidia.com>
Signed-off-by: Helen Ngo <helenn@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Ajay Balasa <abalasa@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Signed-off-by: ilml <tolong@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: sraman <sraman@nvidia.com>
Signed-off-by: Jingyue Wu <wujingyue@gmail.com>
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: hongbinl <hongbinl@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Lawrence McAfee <lmcafee@nvidia.com>
Signed-off-by: wdykas <wdykas@nvidia.com>
Signed-off-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Signed-off-by: svcnvidia-nemo-ci <svc-nvidia-nemo-ci@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
Co-authored-by: Pranav Thombre <pthombre@nvidia.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: janEbert <janpabloe@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: mathemakitten <helenn@nvidia.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Ajay <abalasa@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
Co-authored-by: Tom Long <tolong@nvidia.com>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <teodord.ene@gmail.com>
Co-authored-by: Siddhartha Raman Sundara Raman <sraman@nvidia.com>
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
Co-authored-by: ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 <hollowman@opensuse.org>
Co-authored-by: Hongbin Liu <lhb8125@users.noreply.github.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Lawrence McAfee <85179052+lmcafee-nvidia@users.noreply.github.com>
Co-authored-by: wdykas <73254672+wdykas@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
…ention (NVIDIA#5011)

Signed-off-by: Hongxiao Bai <hongxiaob@nvidia.com>
Signed-off-by: tailaim <tailaim@nvidia.com>
Signed-off-by: Yuzhong Wang <yuzhongw@nvidia.com>
Signed-off-by: kunlunl <kunlunl@nvidia.com>
Co-authored-by: Kaixiang Lei <5780122+shyoshyo@users.noreply.github.com>
Signed-off-by: guihong-nv <guihongl@nvidia.com>
NVIDIA#5388)

Signed-off-by: pingtianl <pingtianl@nvidia.com>
Signed-off-by: Pingtian Li <pingtianl@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: HaochenYuan <haocheny@nvidia.com>
Signed-off-by: Yuzhong Wang <yuzhongw@nvidia.com>
copy-pr-bot Bot commented Jul 3, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
