
Conversation

@OutisLi (Collaborator) commented on Jan 16, 2026

Summary by CodeRabbit

  • New Features

    • Centralized, configurable "optimizer" section for training configs with per-model overrides for multi-task runs (an illustrative configuration sketch follows this summary).
    • Added support for Adam/AdamW across backends; LKF, AdaMuon, HybridMuon available for PyTorch.
    • Exposed optimizer hyperparameters (beta1, beta2, weight_decay) and integrated LR schedulers with warm‑up.
  • Documentation

    • Docs updated with optimizer guide, examples, and framework-specific availability notes.
  • Tests

    • Updated example config to use the new optimizer block.
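
As a rough illustration of the centralized block summarized above (a sketch only: key names such as beta1/beta2 and the numeric values are placeholders, and the authoritative schema is defined in deepmd/utils/argcheck.py), a single-task input might carry an optimizer section of this shape:

```json
{
  "model": { "_comment": "model definition unchanged" },
  "learning_rate": { "_comment": "learning-rate schedule unchanged" },
  "optimizer": {
    "type": "AdamW",
    "beta1": 0.9,
    "beta2": 0.999,
    "weight_decay": 0.01
  },
  "training": { "_comment": "optimizer keys such as opt_type no longer live here" }
}
```

Per-model overrides for multi-task runs are exposed through the same section; the exact nesting under the multi-task schema is not reproduced in this sketch.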


Copilot AI review requested due to automatic review settings on January 16, 2026, 15:48
@dosubot dosubot bot added the enhancement label Jan 16, 2026

Copilot AI left a comment


Pull request overview

This pull request refactors the optimizer configuration schema to promote it from being nested within the training section to a dedicated top-level optimizer section in the input JSON configuration. This change improves consistency and clarity across the DeePMD-kit codebase.

Changes:

  • Introduced a new top-level optimizer section in the configuration schema with support for multiple optimizer types (Adam, AdamW, LKF, AdaMuon, HybridMuon)
  • Updated all backend implementations (TensorFlow, PyTorch, Paddle) to extract optimizer parameters from the new location
  • Updated documentation to reference the new optimizer section across multiple training guides

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Summary per file:

  • doc/train/training-advanced.md: Added comprehensive documentation for the new optimizer section with examples and backend-specific support information.
  • doc/model/train-fitting-tensor.md: Updated to reference the new optimizer section in the configuration structure.
  • doc/model/train-fitting-property.md: Updated to reference the new optimizer section in the configuration structure.
  • doc/model/train-fitting-dos.md: Updated to reference the new optimizer section in the configuration structure.
  • deepmd/utils/argcheck.py: Defined the optimizer schema with the ArgsPlugin pattern, moved optimizer definitions from training_args to a standalone optimizer_args, and added support for all optimizer types with proper defaults.
  • deepmd/tf/train/trainer.py: Updated to extract optimizer parameters from the new optimizer section, added early validation for TensorFlow-unsupported features (weight_decay, non-Adam optimizers), and implemented the beta1/beta2 parameters.
  • deepmd/pt/train/training.py: Updated to extract optimizer parameters from the new optimizer section, refactored get_opt_param to include optimizer-specific default values, and expanded optimizer initialization with explicit beta parameters (a hedged sketch of this extraction pattern follows this list).
  • deepmd/pd/train/training.py: Updated to extract optimizer parameters from the new optimizer section and added beta1/beta2/weight_decay parameters to the Adam optimizer initialization.
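
To illustrate the extraction pattern these backend changes describe, here is a minimal sketch assuming a plain dict config. The function name and the default values are hypothetical; the real logic lives in get_opt_param in deepmd/pt/train/training.py and its TensorFlow/Paddle counterparts, and the authoritative key names and defaults are defined in deepmd/utils/argcheck.py.

```python
# Hypothetical sketch, not the PR's code: read the top-level "optimizer" block
# and merge per-type defaults before constructing the backend optimizer.
SUPPORTED = ("Adam", "AdamW", "LKF", "AdaMuon", "HybridMuon")
DEFAULTS = {
    "Adam": {"beta1": 0.9, "beta2": 0.999, "weight_decay": 0.0},    # assumed values
    "AdamW": {"beta1": 0.9, "beta2": 0.999, "weight_decay": 0.01},  # assumed values
}


def read_optimizer_block(jdata: dict) -> dict:
    """Return the optimizer settings with per-type defaults filled in."""
    opt = dict(jdata.get("optimizer", {}))
    opt_type = opt.get("type", "Adam")
    if opt_type not in SUPPORTED:
        raise ValueError(f"Unsupported optimizer type: {opt_type}")
    merged = {**DEFAULTS.get(opt_type, {}), **opt}
    merged["type"] = opt_type
    return merged
```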



coderabbitai bot commented Jan 16, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Centralizes optimizer configuration via a top-level "optimizer" config and propagates optimizer/type-specific parameters (adam_beta1, adam_beta2, weight_decay) across PyTorch, Paddle, and TensorFlow backends, updates argument descriptors, and adds documentation and test config changes to surface optimizer choices.

Changes

  • PyTorch Training (deepmd/pt/train/training.py): Add global optimizer config usage; replace opt_type with type; validate supported optimizers (Adam, AdamW, LKF, AdaMuon, HybridMuon); apply per-type defaults; load optimizer state on restart; attach a LambdaLR warm-up scheduler.
  • Paddle Training (deepmd/pd/train/training.py): Centralize optimizer params; read the type key; extend Adam params (beta1, beta2, weight_decay); retain multi-task per-model optim_dict handling; pass betas/weight_decay into Adam initialization.
  • TensorFlow Training (deepmd/tf/train/trainer.py): Parse jdata["optimizer"]; expose optimizer_type, optimizer_beta1/2, and optimizer_weight_decay; enforce Adam-only and weight_decay == 0; construct AdamOptimizer with explicit beta1/beta2 across all branches.
  • Argument Validation / CLI (deepmd/utils/argcheck.py): Add an optimizer argument plugin and registrars for Adam, AdamW, LKF, AdaMuon, and HybridMuon; introduce optimizer_variant_type_args() and optimizer_args(); integrate optimizer into gen_args() for single- and multi-task.
  • Documentation (doc/train/training-advanced.md, doc/model/train-fitting-*.md): Add an "Optimizer" subsection, example JSON, and supported-optimizer lists per framework; add {optimizer} to the training JSON docs and update guide references.
  • Tests / Examples (source/tests/pt/model/water/lkf.json): Move optimizer fields into a top-level optimizer object (e.g., "type": "LKF", "kf_blocksize": 1024) and remove the old training.opt_type usage (a before/after sketch follows this list).
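
To make the example-config change concrete, the move in lkf.json has roughly the following before/after shape (abridged; surrounding sections are omitted, and whether kf_blocksize previously sat under training next to opt_type is an assumption, since only its new location is stated in the summary above):

Before:

```json
{
  "training": {
    "opt_type": "LKF",
    "kf_blocksize": 1024
  }
}
```

After:

```json
{
  "optimizer": {
    "type": "LKF",
    "kf_blocksize": 1024
  }
}
```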

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • wanghan-iapcm
  • njzjz
  • iProzd
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately summarizes the main changes: refactoring optimizer configuration and backend handling across multiple frameworks (TensorFlow, PyTorch, Paddle) with schema standardization.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@doc/model/train-fitting-dos.md`:
- Line 19: The doc page uses non-descriptive link text "here"; change it to a
descriptive phrase and update the inline reference so readers know what they’re
clicking. Replace "here" with something like "training parameters reference" or
"training keyword definitions" while keeping the same target file
(train-se-e2-a.md), and ensure the sentence still mentions the relevant keywords
{ref}`model[standard]/fitting_net <model[standard]/fitting_net>` and {ref}`loss
<loss>` so readers can find the training parameter explanations for fitting the
dos.

In `@doc/model/train-fitting-property.md`:
- Line 17: Replace the ambiguous link text "here" with a descriptive phrase so
the sentence reads clearly and improves accessibility; specifically update the
sentence referencing train-se-atten.md to use descriptive link text such as "SE
attention training documentation" (or similar) instead of "here", keeping the
surrounding refs ({ref}`model`, {ref}`learning_rate`, {ref}`optimizer`,
{ref}`loss`, {ref}`training`) and the callouts to
{ref}`model[standard]/fitting_net` and {ref}`loss` intact.

In `@doc/model/train-fitting-tensor.md`:
- Line 33: Replace the non-descriptive link text "here" in the sentence
describing training JSON (the line referencing `input.json`, `ener` mode and the
sections {ref}`model <model>`, {ref}`learning_rate <learning_rate>`,
{ref}`optimizer <optimizer>`, {ref}`loss <loss>` and {ref}`training <training>`)
with a descriptive link label that reflects the target content (e.g., "training
keywords and meanings" or "training keyword reference") and point it to
train-se-e2-a.md so the link text clearly conveys what the reader will find.
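
For illustration, the three fixes share the same shape: link text like `[here](train-se-e2-a.md)` becomes a descriptive label such as `[training parameters reference](train-se-e2-a.md)` (or, for train-fitting-property.md, a label like `[SE attention training documentation](train-se-atten.md)`). The exact original sentences are not reproduced here, only the pattern of replacing "here" with a descriptive label that points at the same target.
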
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b0c3b0 and c1d615b.

📒 Files selected for processing (8)
  • deepmd/pd/train/training.py
  • deepmd/pt/train/training.py
  • deepmd/tf/train/trainer.py
  • deepmd/utils/argcheck.py
  • doc/model/train-fitting-dos.md
  • doc/model/train-fitting-property.md
  • doc/model/train-fitting-tensor.md
  • doc/train/training-advanced.md
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test PyTorch training with `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

  • doc/model/train-fitting-tensor.md
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test TensorFlow training with `cd examples/water/se_e2_a && dp train input.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

  • doc/model/train-fitting-tensor.md
📚 Learning: 2026-01-10T04:28:18.703Z
Learnt from: OutisLi
Repo: deepmodeling/deepmd-kit PR: 5130
File: deepmd/pt/optimizer/adamuon.py:40-109
Timestamp: 2026-01-10T04:28:18.703Z
Learning: The coefficients (3.4445, -4.7750, 2.0315) in the `zeropower_via_newtonschulz5` function in `deepmd/pt/optimizer/adamuon.py` are standard across AdaMuon implementations and should not be modified for consistency with the reference implementations.

Applied to files:

  • deepmd/pt/train/training.py
🧬 Code graph analysis (2)
deepmd/utils/argcheck.py (1)
deepmd/utils/plugin.py (2)
  • register (41-59)
  • register (122-141)
deepmd/pd/train/training.py (1)
deepmd/pt/train/training.py (1)
  • get_opt_param (160-197)
🪛 markdownlint-cli2 (0.18.1)
doc/model/train-fitting-property.md

17-17: Link text should be descriptive

(MD059, descriptive-link-text)

🪛 Ruff (0.14.11)
deepmd/pt/train/training.py

178-178: Avoid specifying long messages outside the exception class

(TRY003)

deepmd/tf/train/trainer.py

130-132: Avoid specifying long messages outside the exception class

(TRY003)


134-137: Avoid specifying long messages outside the exception class

(TRY003)


325-327: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (41)
  • GitHub Check: Agent
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test C++ (true, false, false, true)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test C++ (true, true, true, false)
  • GitHub Check: Test Python (8, 3.13)
  • GitHub Check: Test C++ (false, false, false, true)
  • GitHub Check: Test C++ (false, true, true, false)
  • GitHub Check: Test Python (11, 3.10)
  • GitHub Check: Test Python (11, 3.13)
  • GitHub Check: Test Python (10, 3.10)
  • GitHub Check: Test Python (9, 3.13)
  • GitHub Check: Test Python (10, 3.13)
  • GitHub Check: Test Python (8, 3.10)
  • GitHub Check: Test Python (3, 3.13)
  • GitHub Check: Test Python (9, 3.10)
  • GitHub Check: Test Python (4, 3.10)
  • GitHub Check: Test Python (7, 3.13)
  • GitHub Check: Test Python (6, 3.13)
  • GitHub Check: Test Python (12, 3.10)
  • GitHub Check: Test Python (12, 3.13)
  • GitHub Check: Test Python (1, 3.13)
  • GitHub Check: Test Python (3, 3.10)
  • GitHub Check: Test Python (6, 3.10)
  • GitHub Check: Test Python (5, 3.13)
  • GitHub Check: Test Python (7, 3.10)
  • GitHub Check: Test Python (4, 3.13)
  • GitHub Check: Test Python (2, 3.13)
  • GitHub Check: Test Python (2, 3.10)
  • GitHub Check: Test Python (5, 3.10)
  • GitHub Check: Test Python (1, 3.10)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (12)
deepmd/pt/train/training.py (4)

126-129: Verify legacy optimizer keys are handled.

The trainer now reads only the top-level optimizer section when optim_dict is absent. If existing configs still set training.opt_type / weight_decay / adam_beta2, those values will be ignored. Please confirm this is an intentional breaking change or add a fallback/deprecation error to make it explicit.

Also applies to: 319-320
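
A minimal sketch of the fallback/deprecation error this comment asks about (names are hypothetical and this is not the PR's code; adam_beta1 is included by analogy and is an assumption, since the comment explicitly lists opt_type, weight_decay, and adam_beta2):

```python
# Hypothetical sketch of an explicit break for legacy optimizer keys under "training".
LEGACY_OPTIMIZER_KEYS = ("opt_type", "weight_decay", "adam_beta1", "adam_beta2")


def reject_legacy_optimizer_keys(training_params: dict) -> None:
    """Fail loudly if a config still carries pre-refactor optimizer keys."""
    stale = [k for k in LEGACY_OPTIMIZER_KEYS if k in training_params]
    if stale:
        raise ValueError(
            f"Keys {stale} have moved from the 'training' section to the "
            "top-level 'optimizer' section; please update the input file."
        )
```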


160-197: LGTM: per-optimizer defaults centralized.

Nice consolidation of default hyperparameters and validation in get_opt_param.


775-793: Nice: AdaMuon now restores optimizer state + scheduler.

Keeps resume behavior consistent with Adam/AdamW.


733-764: No changes needed — fused parameter is guaranteed in all supported PyTorch versions.

The project requires PyTorch ≥2.7 (GPU) or ≥2.8 (CPU) per pyproject.toml. Both torch.optim.Adam (fused since 1.13) and torch.optim.AdamW (fused since 2.0) fully support the fused parameter across all pinned versions. The current implementation is correct and doesn't require guards.

Likely an incorrect or invalid review comment.
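
For reference, the fused code path under discussion is the standard torch.optim API; the values below are placeholders rather than the trainer's defaults, and fused=True assumes a PyTorch build where the fused kernel supports the parameter device.

```python
import torch

params = [torch.nn.Parameter(torch.zeros(8))]
optimizer = torch.optim.AdamW(
    params,
    lr=1e-3,             # placeholder; the real value comes from the learning_rate section
    betas=(0.9, 0.999),  # beta1/beta2 as surfaced by the new optimizer block
    weight_decay=0.01,   # placeholder weight decay
    fused=True,          # supported by the PyTorch versions the project pins (>=2.7/2.8)
)
```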

deepmd/utils/argcheck.py (3)

2568-2852: LGTM: optimizer schema centralized under optimizer.

Clear variant-based definitions and docstrings.


3683-3684: No review comment needed.


3754-3768: LGTM: optimizer args wired into gen_args for single/multi-task.

deepmd/pd/train/training.py (2)

116-163: LGTM: optimizer params now sourced from the top-level config.

Parsing the optimizer block once keeps the trainer aligned with the new schema.

Also applies to: 250-266


595-606: LGTM: optimizer hyperparams are now plumbed through.

deepmd/tf/train/trainer.py (2)

123-137: Optimizer parsing/validation looks solid.


323-348: Optimizer construction is consistent across branches.

doc/train/training-advanced.md (1)

48-63: Optimizer section is clear and well-scoped.



codecov bot commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 75.29412% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.92%. Comparing base (8b0c3b0) to head (977db22).

Files with missing lines:
  • deepmd/pt/train/training.py: 51.51% patch coverage, 16 lines missing ⚠️
  • deepmd/tf/train/trainer.py: 69.23% patch coverage, 4 lines missing ⚠️
  • deepmd/pd/train/training.py: 80.00% patch coverage, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5157      +/-   ##
==========================================
- Coverage   81.92%   81.92%   -0.01%     
==========================================
  Files         714      714              
  Lines       73301    73371      +70     
  Branches     3616     3616              
==========================================
+ Hits        60055    60106      +51     
- Misses      12083    12103      +20     
+ Partials     1163     1162       -1     


@OutisLi OutisLi added the breaking change Breaking changes that should notify users. label Jan 17, 2026

Labels

  • breaking change (Breaking changes that should notify users.)
  • Docs
  • enhancement
  • Python
