
[Enhance] enhance muon config#1610

Open
nil0x9 wants to merge 1 commit into InternLM:main from nil0x9:linty/enhance-muon-config

Conversation

@nil0x9
Collaborator

@nil0x9 nil0x9 commented Mar 20, 2026

  1. Add an adjust_lr arg to MuonConfig and refactor the Muon build code;
  2. avoid assigning 1D params to Muon (e.g. those of shape [1, D]);
  3. default Muon's flatten to True, since batched params are currently rarely used;
  4. [Breaking Change] adjust_lr is no longer Optional; the original behavior for adjust_lr=None is now expressed as "none".
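Taken together, the changes above might look roughly like the following config sketch. Field names and defaults here are illustrative assumptions, not the exact InternLM definitions:

```python
from dataclasses import dataclass

@dataclass
class MuonConfig:
    # Hypothetical sketch of the enhanced config; names/defaults are assumed.
    lr: float = 0.02
    weight_decay: float = 0.01
    # Breaking change: no longer Optional[str]; "none" replaces None.
    adjust_lr: str = "rms_norm"  # "none" | "rms_norm" | "spectral_norm"
    # Batched (stacked) params are rarely used, so flatten by default.
    flatten: bool = True

    def __post_init__(self):
        if self.adjust_lr not in ("none", "rms_norm", "spectral_norm"):
            raise ValueError(f"unknown adjust_lr: {self.adjust_lr!r}")
```

With validation in `__post_init__`, a caller still passing `adjust_lr=None` fails fast with a `ValueError` rather than being silently accepted.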

@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch from b5d7b48 to bc35c4e Compare March 20, 2026 20:00
@nil0x9
Collaborator Author

nil0x9 commented Mar 20, 2026

@claude review

@@ -309,7 +309,7 @@ def __init__(
weight_decay: float = 0.01,

Claude: Warning: This is a breaking change — both the default value ("spectral_norm" → "rms_norm") and the type (Optional[str] → str, None → the "none" string) changed. Any existing callers passing adjust_lr=None will now get a ValueError. Consider keeping backward compatibility:

if adjust_lr is None:
    adjust_lr = "none"

or at minimum document this as a breaking change in the PR description.

@claude

claude bot commented Mar 20, 2026

Claude: ## Summary

Enhances MuonConfig with an adjust_lr field, filters out 1D-like params from Muon, sets flatten=True by default, and refactors the param collection loop. Also changes the adjust_lr API in Muon from Optional[str] (with None) to str (with "none").

Issues

Critical

  • optim.py:67: adjust_lr default is ("rms_norm",) — a tuple, not a string. Trailing comma bug.
  • optim.py:123: adjust_lr is still hardcoded to "rms_norm" in build() instead of using self.adjust_lr. The new config field has no effect.

Warning

  • optim.py:64: Literal type includes "constant" which is not a valid value in the Muon optimizer — would raise ValueError at runtime.
  • muon.py:309: Breaking change — adjust_lr type changed from Optional[str] to str, and None → "none". Existing callers passing None will get a ValueError.

Nit

  • optim.py:105-106: Missing space after comma in log message ("...M,total" → "...M, total").
  • optim.py:123: Typo in comment — "hace" → "have".

Verdict

REQUEST_CHANGES — the tuple default and unused config field are bugs that need fixing before merge.

@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch 2 times, most recently from 6e73792 to edd7d90 Compare March 20, 2026 21:00
@nil0x9
Collaborator Author

nil0x9 commented Mar 20, 2026

@claude review again

for name, p in model.named_parameters():
n = p.numel()
num_total += n
if name in trainable_names:

Claude: Nit: The p.numel() not in p.shape check is clever but non-obvious. A brief comment explaining the intent would help future readers — e.g., "exclude effectively-1D tensors where one dimension accounts for all elements (e.g. shape [1, D])."
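The intent of that check can be illustrated on plain shapes. The helper below is a hypothetical stand-in mirroring `p.numel() in p.shape`, not the actual optim.py code:

```python
import math

def is_effectively_1d(shape: tuple) -> bool:
    # Mirrors the tensor check: the total element count equals one
    # dimension size exactly when every other dim is 1, i.e. the tensor
    # carries no more information than a 1D vector.
    return math.prod(shape) in shape

assert is_effectively_1d((1, 8))      # [1, D]: excluded from Muon
assert is_effectively_1d((16,))       # plain 1D vector: excluded
assert not is_effectively_1d((4, 8))  # genuine 2D matrix: kept
```

In the PR's loop, params where the check is true are routed away from Muon (e.g. to AdamW), since Muon's orthogonalization only makes sense for genuine matrices.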

@@ -309,7 +309,7 @@ def __init__(
weight_decay: float = 0.01,

Claude: Warning — Breaking API change: Two things changed here simultaneously:

  1. The type changed from Optional[str] to str — any caller passing adjust_lr=None will now get a ValueError.
  2. The default changed from "spectral_norm" to "rms_norm" — existing callers relying on the default will silently get different behavior.

Both changes may be intentional, but they could break downstream users of the Muon class directly. Consider either:

  • Accepting None as a deprecated alias for "none" (with a deprecation warning), or
  • Documenting this as a known breaking change in the PR description.
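The first option (a deprecated None alias) could be sketched as follows; the helper name and the accepted values are assumptions, not the actual Muon code:

```python
import warnings

_VALID = ("none", "rms_norm", "spectral_norm")

def normalize_adjust_lr(adjust_lr):
    """Accept the old None sentinel as a deprecated alias for "none"."""
    if adjust_lr is None:
        warnings.warn(
            'adjust_lr=None is deprecated; pass adjust_lr="none" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return "none"
    if adjust_lr not in _VALID:
        raise ValueError(f"adjust_lr must be one of {_VALID}, got {adjust_lr!r}")
    return adjust_lr
```

Old callers keep working for a release cycle while seeing a DeprecationWarning, and invalid strings (such as the "constant" value flagged earlier) still fail fast.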

Collaborator Author


This change is intentional. If we want the "rms_norm" adjusting method to be the default behavior, then having an Optional[str] arg where None corresponds to "not adjusting" is paradoxical and confusing.

@claude

claude bot commented Mar 20, 2026

Claude: ## Re-review Summary

The critical issues from the previous review (tuple default, unused config field, invalid "constant" literal) are all fixed. Nice work.

Remaining Issues

Warning

  • muon.py:309: Breaking API change — adjust_lr type changed from Optional[str] to str and default changed from "spectral_norm" to "rms_norm". Callers passing None will break silently.

Nit

  • optim.py:85: p.numel() not in p.shape is clever but non-obvious — a brief comment would help readability.
  • optim.py:64: Trailing space in the help string.

Verdict

COMMENT — the code is correct and the previous critical issues are resolved. The breaking API change in Muon is the main thing to be aware of; if it's intentional, consider noting it in the PR description.

@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch from edd7d90 to 22e6089 Compare March 20, 2026 21:15
@nil0x9
Collaborator Author

nil0x9 commented Mar 20, 2026

All comments are addressed except the one where the claude bot noted the breaking change to adjust_lr. This is intentional -- if we want the "rms_norm" adjusting method to be the default behavior (as in optim.py), then having an Optional[str] arg where None corresponds to "not adjusting" is paradoxical, inconsistent, and confusing (especially alongside the previous default of "spectral_norm").

1. add adjust_lr arg to MuonConfig and refactor muon build code;
2. avoid assigning 1D params to Muon (e.g. those of shape [1, D]);
3. default flatten of Muon param to True as currently batched params are
rarely used.
@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch from 22e6089 to 3acab28 Compare March 20, 2026 21:21