Releases · NVIDIA-NeMo/Export-Deploy

Release list

NVIDIA NeMo-Export-Deploy 0.6.0 Latest

nemo-automation-bot released this 23 Jun 01:40

v0.6.0

400d5d8

Changelog Details

Version bump to 0.6.0rc0.dev0 by @github-actions[bot] :: PR: #642
chore: bump _code_freeze workflow to v0.86.0 by @ko3n1g :: PR: #643
build: Bump vLLM to address CVE by @ko3n1g :: PR: #644
chore(beep boop 🤖): bump FW-CI-templates workflow pins to v0.88.0 by @svcnvidia-nemo-ci :: PR: #646
Fix MLA model issues by @oyilmaz-nvidia :: PR: #647
build: drop rc0 pre-release tag and add dynamic git versioning by @ko3n1g :: PR: #648
build: Set trt-llm and vllm for 26.04 by @chtruong814 :: PR: #650
Fix VLM no image inference issue and add tests by @meatybobby :: PR: #634
docs: bump versions1.json to 0.5.0 (latest) by @ko3n1g :: PR: #655
docs: add SECURITY.md by @chtruong814 :: PR: #659
ci: add base_sha to codecov/codecov-action upload step by @ko3n1g :: PR: #660
ci: build container once and share across downstream tests by @chtruong814 :: PR: #661
Remove trt-llm by @oyilmaz-nvidia :: PR: #662
Bump to vllm 0.20.1 and latest MBridge commit by @chtruong814 :: PR: #678
fix: Pin flashinfer-python to 0.6.8.post1 by @chtruong814 :: PR: #679
ci: Major refactor of release-workflows by @ko3n1g :: PR: #663
ci: remove build-docs workflow by @ko3n1g :: PR: #680
ci: validate release branch-rules by @ko3n1g :: PR: #683
ci: Bump CI image to 26.04 pytorch and 0.20.1 vllm by @chtruong814 :: PR: #696
fix: use eager attention for bidirectional ONNX export by @oliverholworthy :: PR: #698
Fix tokenizer issue with chat template by @oyilmaz-nvidia :: PR: #697
Be able to run individual tests by @oyilmaz-nvidia :: PR: #694
beep boop 🤖: Bumping NeMo-Export-Deploy to v0.6.1 by @nemo-automation-bot[bot] :: PR: #707
Set PATCH version to 0 in package_info.py by @balasaajay :: PR: #708

Contributors

oliverholworthy, meatybobby, and 5 other contributors

NVIDIA NeMo-Export-Deploy 0.5.0

svcnvidia-nemo-ci released this 16 Apr 20:49

v0.5.0

04ca37f

Changelog Details

Version bump to 0.5.0rc0.dev0 by @github-actions[bot] :: PR: #580
ci: Add secrets detector by @chtruong814 :: PR: #578
Add apply_chat_template to HF vllm Ray deployment by @athitten :: PR: #581
Onur/remove nemo2 trtllm support by @oyilmaz-nvidia :: PR: #576
Remove MM trt-llm files for nemo2 by @oyilmaz-nvidia :: PR: #583
ci: Adding to codeowners by @chtruong814 :: PR: #585
Remove more nemo2 and unused code. by @oyilmaz-nvidia :: PR: #584
docs: Remove uv sync with uv_args by @thomasdhc :: PR: #586
Update to use latest MBridge by @chtruong814 :: PR: #589
Add inference_max_seq_len to ray mbridge deployment path by @athitten :: PR: #588
Remove nemo imports by @oyilmaz-nvidia :: PR: #594
ci: Fix wheel build test and publish by @chtruong814 :: PR: #595
ci: Re-enable onnx test by @chtruong814 :: PR: #597
ci: Update release-docs workflow to use FW-CI-templates v0.72.0 by @chtruong814 :: PR: #599
feat: Pass ETP and Sequence Parallel to inframework Ray deployment by @ko3n1g :: PR: #600
ci: Update release workflows to include changelog and docs by @chtruong814 :: PR: #604
build: Remove torchao by @chtruong814 :: PR: #606
build: Upgrade vllm to 0.14.1 by @chtruong814 :: PR: #609
Add support for stop_words in Ray MBridge deployment by @athitten :: PR: #605
Add vllm docs for mbridge ckpt by @oyilmaz-nvidia :: PR: #573
Docs update: remove nemo2 and fix import by @oyilmaz-nvidia :: PR: #608
Update CI docker image and set vllm eager enforce_eager to False by @chtruong814 :: PR: #614
Fix building doc and remove all nemo 2.0 docs by @oyilmaz-nvidia :: PR: #615
Fix multimodal deployment sampling params by @meatybobby :: PR: #602
docs: Enable nightly docs build on main branch by @chtruong814 :: PR: #619
Set materialize_only_last_token_logits=False when log_probs = True by @athitten :: PR: #613
ci: Add-credentials-for-docs by @ko3n1g :: PR: #623
Fix release workflow reference by @chtruong814 :: PR: #625
Fix mbridge inference for latest mbridge by @oyilmaz-nvidia :: PR: #627
feat: Add support for batching of Ray Serve requests by @pthombre :: PR: #629
Remove all nemo2 imports from old repo by @oyilmaz-nvidia :: PR: #628
build: Bump export-deploy dependencies for 26.04 by @chtruong814 :: PR: #633
Docs: remove vLLM install step from mbridge vllm quickstart by @oyilmaz-nvidia :: PR: #618
Announce Python 3.12 migration by @ko3n1g :: PR: #630
ci: Enable claude review by @thomasdhc :: PR: #635
ci: Fix sso user check by @chtruong814 :: PR: #637
chore: test FW-CI-templates ko3n1g/fix/linkcheck-retry-backoff by @ko3n1g :: PR: #638
ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g :: PR: #639
Add legacy_model_format param by @oyilmaz-nvidia :: PR: #641
chore: Move to Py3.12 by @ko3n1g :: PR: #631
cp: build: Bump vLLM to address CVE (644) into r0.5.0 by @svcnvidia-nemo-ci :: PR: #645
cp: Fix MLA model issues (647) into r0.5.0 by @svcnvidia-nemo-ci :: PR: #649
cp: build: Set trt-llm and vllm for 26.04 (650) into r0.5.0 by @svcnvidia-nemo-ci :: PR: #651
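
The changelog entries above share a regular shape, `<title> by @<author> :: PR: #<number>`. As a minimal sketch, assuming that shape holds for every entry, the lines could be turned into structured records as follows. The pattern is inferred from this page only, not from any documented format, and `ChangelogEntry` and `parse_entry` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class ChangelogEntry(NamedTuple):
    """One changelog line split into its parts (hypothetical helper type)."""
    title: str
    author: str
    pr: int


# Pattern inferred from the entries on this page; it is an assumption,
# not an official format. The greedy title group means the split happens
# at the last " by @" in the line, so titles containing " by " still parse.
ENTRY_RE = re.compile(r"^(?P<title>.+) by @(?P<author>\S+) :: PR: #(?P<pr>\d+)$")


def parse_entry(line: str) -> Optional[ChangelogEntry]:
    """Return a ChangelogEntry, or None if the line does not match the pattern."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return ChangelogEntry(match["title"], match["author"], int(match["pr"]))


if __name__ == "__main__":
    sample = "build: Upgrade vllm to 0.14.1 by @chtruong814 :: PR: #609"
    print(parse_entry(sample))
```

Lines that do not follow the pattern, such as section headings or the free-form notes used by the older releases, return `None` rather than raising, so a caller can filter them out.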

Contributors

thomasdhc, meatybobby, and 6 other contributors

NVIDIA NeMo-Export-Deploy 0.4.0

svcnvidia-nemo-ci released this 26 Feb 00:19

v0.4.0

2ba74b0

Highlights

vLLM support for Megatron-Bridge LLM checkpoints.
Removal of NeMo 2.0 support.
Deployment of Megatron-Bridge VLM checkpoints.

Changelog Details

Eval logprob benchmarks support for HF via vLLM with Ray by @athitten :: PR: #479
feat: add labeler by @pablo-garay :: PR: #483
Support apply_chat_template in NeMo MM in-framework deployment by @meatybobby :: PR: #440
NeMo-Export-Deploy 0.2.1 changelog by @pablo-garay :: PR: #489
Add torch_dtype and default values by @oyilmaz-nvidia :: PR: #466
Fix max token input by @oyilmaz-nvidia :: PR: #478
Remove scheduled cron job from release workflow by @pablo-garay :: PR: #494
feat: Add anchor by @pablo-garay :: PR: #495
[Eval] Fixes for compatibility between Pytriton, Ray deployments with nemo-run by @athitten :: PR: #501
New script path by @oyilmaz-nvidia :: PR: #487
Update trt-llm doc for nemo 2 by @oyilmaz-nvidia :: PR: #506
Change type for --runtime_env in ray in-fw deployment script by @athitten :: PR: #505
fix : New peft release adjust fix by @pablo-garay :: PR: #514
fix: ensure vLLM receives valid params regardless of env changes by @pablo-garay :: PR: #516
Fix minor doc issue by @oyilmaz-nvidia :: PR: #521
Update changelog for release 0.3.0 by @oyilmaz-nvidia :: PR: #522
Update nvidia-sphinx-theme by @chtruong814 :: PR: #528
Update changelog for version 0.3.1 by @pablo-garay :: PR: #537
Minor fixes for MBridge nemotron deployment by @athitten :: PR: #518
docs: Update docs version to latest by @chtruong814 :: PR: #553
docs: Fixing version1.json by @aschilling-nv :: PR: #554
Properly Handle DynamicInferenceRequestRecord with latest Mcore by @chtruong814 :: PR: #559
Add vllm support for mbridge by @oyilmaz-nvidia :: PR: #555
Temp fix for k8s issue by @ko3n1g :: PR: #565
ci: Enable AWS runners by @chtruong814 :: PR: #557
docs: Release docs by @ko3n1g :: PR: #566
Remove nemo from in-framework deployment by @oyilmaz-nvidia :: PR: #568
Fix chat endpoint support for Ray in-framework MBridge deployment by @athitten :: PR: #572
build: Update dependencies for 26.02 by @chtruong814 :: PR: #567
Remove nemo2 vllm support by @oyilmaz-nvidia :: PR: #571
Update multimodal in-framework FastAPI from NeMo to Megatron Bridge by @meatybobby :: PR: #511
Fix chat endpoint support for HF deployment with Ray by @athitten :: PR: #575
Add Ray Serve Deployment Support for Multimodal Models by @meatybobby :: PR: #574
cp: Add apply_chat_template to HF vllm Ray deployment (581) into r0.4.0 by @ko3n1g :: PR: #582
cp: Remove more nemo2 and unused code. (584) into r0.4.0 by @ko3n1g :: PR: #587
cp: docs: Remove uv sync with uv_args (586) into r0.4.0 by @ko3n1g :: PR: #591
cp: Add inference_max_seq_len to ray mbridge deployment path (588) into r0.4.0 by @ko3n1g :: PR: #593
cp: Fix wheel build test and publish (#595) in r0.4.0 by @chtruong814 :: PR: #596
cp: Re-enable onnx test (#597) in r0.4.0 by @chtruong814 :: PR: #598
cp: ci: Update release-docs workflow to use FW-CI-templates v0.72.0 (599) into r0.4.0 by @ko3n1g :: PR: #601
cp: ci: Update release workflows to include changelog and docs (604) into r0.4.0 by @ko3n1g :: PR: #607
cp: build: Remove torchao (606) into r0.4.0 by @ko3n1g :: PR: #610
cp: build: Upgrade vllm to 0.14.1 (#609) into r0.4.0 by @chtruong814 :: PR: #611
docs: Update docs for 0.4.0 by @chtruong814 :: PR: #612
cp: Update CI docker image and set vllm eager enforce_eager to False (614) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #617
docs: Update docs version for 0.4.0 release by @chtruong814 :: PR: #620

Contributors

pablo-garay, meatybobby, and 6 other contributors

NVIDIA NeMo-Export-Deploy 0.3.1

chtruong814 released this 15 Dec 23:36

v0.3.1

44a30f0

Fix vLLM top_p parameter handling in HuggingFace Ray deployment (#524)
Pin peft dependency to <0.14.0 for compatibility (#524)

NVIDIA NeMo-Export-Deploy 0.3.0

chtruong814 released this 04 Dec 00:55

v0.3.0

2cdaf51

Update TensorRT-LLM export to use the NeMo->HF->TensorRT-LLM export path.
Add chat template support for VLM deployment.
Bug fixes and folder renames, such as nlp to llm.

NVIDIA NeMo-Export-Deploy 0.2.1

chtruong814 released this 22 Oct 23:36

v0.2.1

950000c

Bug fixes for HuggingFace model deployment (#459)
- Fixed HuggingFace deployable implementations for both Triton and Ray Serve backends
- Improved tokenizer handling in HuggingFace deployment scripts
Minor fixes for Ray deployment (#464)
- Additional bug fixes in Ray deployment utilities

NVIDIA NeMo-Export-Deploy 0.2.0

chtruong814 released this 09 Oct 20:01

v0.2.0

726695b

Megatron-LM and Megatron-Bridge model deployment support with Triton Inference Server and Ray Serve.
Multi-node, multi-instance Ray Serve-based deployment for NeMo 2, Megatron-Bridge, and Megatron-LM models.
Update vLLM export to use the NeMo->HF->vLLM export path.
Multimodal deployment for NeMo 2 models with Triton Inference Server.
NeMo Retriever Text Reranking ONNX and TensorRT export support.

NVIDIA NeMo-Export-Deploy 0.2.0rc2 Pre-release

chtruong814 released this 18 Aug 06:32

v0.2.0rc2

7867110

Prerelease: NVIDIA NeMo-Export-Deploy 0.2.0rc2 (2025-08-18)

NVIDIA NeMo-Export-Deploy 0.1.1

chtruong814 released this 15 Aug 08:24

v0.1.1

ca72da9

ci: Mock DCO check

Signed-off-by: oliver könig <okoenig@nvidia.com>

NVIDIA NeMo-Export-Deploy 0.2.0rc1 Pre-release

chtruong814 released this 14 Aug 15:54

v0.2.0rc1

62485cc

Prerelease: NVIDIA NeMo-Export-Deploy 0.2.0rc1 (2025-08-14)
