
[Enhance] enhance muon config#1610

Open
nil0x9 wants to merge 1 commit into InternLM:main from nil0x9:linty/enhance-muon-config

Conversation

@nil0x9
Collaborator

@nil0x9 nil0x9 commented Mar 20, 2026

  1. Add an adjust_lr arg to MuonConfig and refactor the Muon build code;
  2. avoid assigning 1D params to Muon (e.g. those of shape [1, D]);
  3. default Muon's flatten to True, since batched params are currently rarely used;
  4. [Breaking Change] adjust_lr is no longer Optional; the original behavior for adjust_lr=None is now expressed as "none".
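Taken together, the changes above might look roughly like the following config sketch. Field names and defaults here are illustrative assumptions, not the exact InternLM definitions:

```python
from dataclasses import dataclass

@dataclass
class MuonConfig:
    # Hypothetical sketch of the enhanced config; names/defaults are assumed.
    lr: float = 0.02
    weight_decay: float = 0.01
    # Breaking change: no longer Optional[str]; "none" replaces None.
    adjust_lr: str = "rms_norm"  # "none" | "rms_norm" | "spectral_norm"
    # Batched (stacked) params are rarely used, so flatten by default.
    flatten: bool = True

    def __post_init__(self):
        if self.adjust_lr not in ("none", "rms_norm", "spectral_norm"):
            raise ValueError(f"unknown adjust_lr: {self.adjust_lr!r}")
```

With validation in `__post_init__`, a caller still passing `adjust_lr=None` fails fast with a `ValueError` rather than being silently accepted.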

@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch from b5d7b48 to bc35c4e Compare March 20, 2026 20:00
@nil0x9
Collaborator Author

nil0x9 commented Mar 20, 2026

@claude review

@@ -309,7 +309,7 @@ def __init__(
weight_decay: float = 0.01,

Claude: Warning: This is a breaking change — both the default value ("spectral_norm" → "rms_norm") and the type (Optional[str] → str, None → the "none" string) changed. Any existing callers passing adjust_lr=None will now get a ValueError. Consider keeping backward compatibility:

if adjust_lr is None:
    adjust_lr = "none"

or at minimum document this as a breaking change in the PR description.

@claude

claude bot commented Mar 20, 2026

Claude: ## Summary

Enhances MuonConfig with an adjust_lr field, filters out 1D-like params from Muon, sets flatten=True by default, and refactors the param collection loop. Also changes the adjust_lr API in Muon from Optional[str] (with None) to str (with "none").

Issues

Critical

  • optim.py:67: adjust_lr default is ("rms_norm",) — a tuple, not a string. Trailing comma bug.
  • optim.py:123: adjust_lr is still hardcoded to "rms_norm" in build() instead of using self.adjust_lr. The new config field has no effect.

Warning

  • optim.py:64: Literal type includes "constant" which is not a valid value in the Muon optimizer — would raise ValueError at runtime.
  • muon.py:309: Breaking change — adjust_lr type changed from Optional[str] to str, and None → "none". Existing callers passing None will get a ValueError.

Nit

  • optim.py:105-106: Missing space after comma in log message ("...M,total" → "...M, total").
  • optim.py:123: Typo in comment — "hace" → "have".

Verdict

REQUEST_CHANGES — the tuple default and unused config field are bugs that need fixing before merge.

@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch 2 times, most recently from 6e73792 to edd7d90 Compare March 20, 2026 21:00
@nil0x9
Collaborator Author

nil0x9 commented Mar 20, 2026

@claude review again

for name, p in model.named_parameters():
n = p.numel()
num_total += n
if name in trainable_names:

Claude: Nit: The p.numel() not in p.shape check is clever but non-obvious. A brief comment explaining the intent would help future readers — e.g., "exclude effectively-1D tensors where one dimension accounts for all elements (e.g. shape [1, D])."
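The intent of that check can be illustrated on plain shapes. The helper below is a hypothetical stand-in mirroring `p.numel() in p.shape`, not the actual optim.py code:

```python
import math

def is_effectively_1d(shape: tuple) -> bool:
    # Mirrors the tensor check: the total element count equals one
    # dimension size exactly when every other dim is 1, i.e. the tensor
    # carries no more information than a 1D vector.
    return math.prod(shape) in shape

assert is_effectively_1d((1, 8))      # [1, D]: excluded from Muon
assert is_effectively_1d((16,))       # plain 1D vector: excluded
assert not is_effectively_1d((4, 8))  # genuine 2D matrix: kept
```

In the PR's loop, params where the check is true are routed away from Muon (e.g. to AdamW), since Muon's orthogonalization only makes sense for genuine matrices.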

@@ -309,7 +309,7 @@ def __init__(
weight_decay: float = 0.01,

Claude: Warning — Breaking API change: Two things changed here simultaneously:

  1. The type changed from Optional[str] to str — any caller passing adjust_lr=None will now get a ValueError.
  2. The default changed from "spectral_norm" to "rms_norm" — existing callers relying on the default will silently get different behavior.

Both changes may be intentional, but they could break downstream users of the Muon class directly. Consider either:

  • Accepting None as a deprecated alias for "none" (with a deprecation warning), or
  • Documenting this as a known breaking change in the PR description.
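The first option (a deprecated None alias) could be sketched as follows; the helper name and the accepted values are assumptions, not the actual Muon code:

```python
import warnings

_VALID = ("none", "rms_norm", "spectral_norm")

def normalize_adjust_lr(adjust_lr):
    """Accept the old None sentinel as a deprecated alias for "none"."""
    if adjust_lr is None:
        warnings.warn(
            'adjust_lr=None is deprecated; pass adjust_lr="none" instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return "none"
    if adjust_lr not in _VALID:
        raise ValueError(f"adjust_lr must be one of {_VALID}, got {adjust_lr!r}")
    return adjust_lr
```

Old callers keep working for a release cycle while seeing a DeprecationWarning, and invalid strings (such as the "constant" value flagged earlier) still fail fast.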

Collaborator Author


This change is intentional. If we want the "rms_norm" adjusting method to be the default behavior, then having an Optional[str] arg where None corresponds to "not adjusting" is paradoxical and confusing.

@claude

claude bot commented Mar 20, 2026

Claude: ## Re-review Summary

The critical issues from the previous review (tuple default, unused config field, invalid "constant" literal) are all fixed. Nice work.

Remaining Issues

Warning

  • muon.py:309: Breaking API change — adjust_lr type changed from Optional[str] to str and default changed from "spectral_norm" to "rms_norm". Callers passing None will break silently.

Nit

  • optim.py:85: p.numel() not in p.shape is clever but non-obvious — a brief comment would help readability.
  • optim.py:64: Trailing space in the help string.

Verdict

COMMENT — the code is correct and the previous critical issues are resolved. The breaking API change in Muon is the main thing to be aware of; if it's intentional, consider noting it in the PR description.

@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch from edd7d90 to 22e6089 Compare March 20, 2026 21:15
@nil0x9
Collaborator Author

nil0x9 commented Mar 20, 2026

All comments are addressed except the one where the claude bot noted the breaking change to adjust_lr. This is intentional -- if we want the "rms_norm" adjusting method to be the default behavior (as in optim.py), then having an Optional[str] arg where None corresponds to "not adjusting" is paradoxical, inconsistent, and confusing (especially alongside the previous default of "spectral_norm").

1. add adjust_lr arg to MuonConfig and refactor muon build code;
2. avoid assigning 1D params to Muon (e.g. those of shape [1, D]);
3. default flatten of Muon param to True as currently batched params are
rarely used.
@nil0x9 nil0x9 force-pushed the linty/enhance-muon-config branch from 22e6089 to 3acab28 Compare March 20, 2026 21:21