Fix/issue 633 offpolicy wait metrics(issue #633) by LeeLeno · Pull Request #636 · unilabsim/UniLab

LeeLeno · 2026-06-23T09:14:58Z

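Background

In off-policy training, the learner wait_time (shown as Wait in the terminal and logged as timing/learner_wait_ms) lumps "waiting for data" together with sync-point barriers, polling, and logger refreshes, so it does not reflect how long the learner is actually blocked waiting for the collector to produce data. This PR redefines and implements the metric as specified in #633.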

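Calculation changes (by panel)

Only the Learner panel's metric calculations change. The Collector panel's calculations are untouched; this PR only adds documentation for them.

Learner panel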

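| Metric | Change type | Calculation |
| --- | --- | --- |
| Wait → Collector Wait | Renamed + purified | Single-GPU: excludes the warmup `log_buffer_fill` refresh and the in-loop sync handshake put; multi-GPU: measured before the initial `dist.barrier()`; APPO/HORA: measured before the iter-1 `logger.start()` and `available()` |
| Replay Batch Wait | New | Time spent polling for replay pack / H2D batch-ready (double-buffer / multi-GPU); ≈ 0 on a prefetch hit |
| Rank Barrier | New (split out of Wait + Train) | Sum of the multi-GPU `dist.barrier()` times (initial + final) |
| Sync Coordination | New (split out of Wait) | `trainer_done` sync handshake time (inside the warmup loop + the release at the end) |
| Train | Redefined for multi-GPU (single-GPU unchanged) | Now pure SGD compute; no longer includes param sync or the final barrier |
| Param Sync | Attribution + label change (value unchanged) | Algorithm unchanged; no longer double-counted in Train; label `Param Sync (in Train)` → `Param Sync` |
| H2D Copy / Weight Sync / Iter Wall | Unchanged | Calculation unchanged (Iter Wall is still the raw `iteration_time`, not the sum of the components) |
| Derived `perf/learner_pipeline_ms` | Redefined | = H2D + Train + Param Sync + Weight Sync (previously excluded Param Sync) |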
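To make the redefined derived metric concrete, here is a minimal sketch of the new `perf/learner_pipeline_ms` sum. The `LearnerTimings` container and its field names are hypothetical and only illustrate the arithmetic; they are not the names used by the runners or the OffPolicyLogger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerTimings:
    """Per-iteration learner timings in milliseconds (hypothetical container)."""

    h2d_ms: float
    train_ms: float        # pure SGD compute under the new definition
    param_sync_ms: float   # reported on its own, no longer folded into Train
    weight_sync_ms: float
    rank_barrier_ms: float = 0.0  # tracked separately, not part of the pipeline sum

    @property
    def pipeline_ms(self) -> float:
        # New definition: H2D + Train + Param Sync + Weight Sync.
        return self.h2d_ms + self.train_ms + self.param_sync_ms + self.weight_sync_ms


timings = LearnerTimings(
    h2d_ms=2.0, train_ms=10.0, param_sync_ms=3.0, weight_sync_ms=1.0, rank_barrier_ms=4.0
)
pipeline_ms = timings.pipeline_ms
print(pipeline_ms)  # 16.0
```

Because Train no longer contains Param Sync, adding Param Sync explicitly keeps the pipeline total covering the same work without counting it twice.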

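Collector panel

| Metric | Change type |
| --- | --- |
| `weight_sync_ms` / `action_select_ms` / `env_step_ms` / `replay_ms` / `sync_coordination_ms` (SAC/TD3) | Calculation unchanged; documentation added only |
| `env_step_total_ms` / `mlp_infer_ms` (APPO) | Calculation unchanged; documentation added only (still a single-step EMA) |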
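For readers unfamiliar with the "single-step EMA" wording on the APPO row, the sketch below shows one common way to keep an exponential moving average that is updated once per step. The function name, the seeding rule, and the smoothing factor `alpha` are assumptions for illustration; the PR does not state the actual values used by the collector.

```python
from typing import Optional


def ema_update(previous: Optional[float], sample_ms: float, alpha: float = 0.1) -> float:
    """Return the new EMA after observing one per-step timing sample (ms)."""
    if previous is None:
        # Assumption: the first sample seeds the average directly.
        return sample_ms
    return (1.0 - alpha) * previous + alpha * sample_ms


ema = None
for sample in (10.0, 20.0):
    ema = ema_update(ema, sample, alpha=0.5)
print(ema)  # 15.0
```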

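Changes (mapped to the acceptance criteria)

① Document every off-policy timing field: docs/.../1-training/3-logging.md gains a new section, 「Off-Policy 计时字段」 (Off-Policy timing fields), covering 9 learner fields plus the collector fields (5 for SAC/TD3, 2 for APPO). Each entry lists the terminal field, the TensorBoard/W&B key, and its meaning.

② Collector wait does not absorb barrier / pack / H2D / logger refresh time: see the Collector Wait row in the table above. The 4 components are reported independently and are never merged.

③ Single-GPU and multi-GPU semantics are consistent, with multi-GPU additionally splitting out rank barrier / param sync: both produce the same set of 4 components; multi-GPU adds rank_barrier + param_sync, and train is pure compute in both cases. Timings on non-rank-0 processes are always 0.

④ Tests cover three scenarios: the collector not producing in time (single-GPU async), the batch already being ready (multi-GPU spawn, replay_batch_wait == 0), and the multi-GPU barrier (rank_barrier reported on its own and not included in collector_wait). A logger contract test that does not depend on torch is also added.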
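The separation described in ② and ③ can be sketched without torch. In the example below, a `queue.Queue` stands in for the collector's data channel and a one-party `threading.Barrier` stands in for `dist.barrier()`; the `timed` helper, `learner_iteration`, and the bucket names are hypothetical and do not mirror the runners' real code. The point is only that each blocking call is timed in its own bucket, so a slow collector shows up in collector wait and nowhere else.

```python
import queue
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


@contextmanager
def timed(timings, key):
    """Add the wall-clock time of the enclosed block to timings[key], in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[key] += (time.perf_counter() - start) * 1000.0


def learner_iteration(batches, barrier, timings):
    # Only the blocking get() counts as collector wait.
    with timed(timings, "collector_wait_ms"):
        batch = batches.get()
    # The barrier gets its own bucket and starts after collector wait ends.
    with timed(timings, "rank_barrier_ms"):
        barrier.wait()
    # Stand-in for the SGD step: pure compute only.
    with timed(timings, "train_ms"):
        result = sum(batch)
    return result


def slow_collector(batches, delay_s):
    time.sleep(delay_s)  # simulate a collector that is late producing data
    batches.put([1, 2, 3])


timings = defaultdict(float)
batches = queue.Queue()
barrier = threading.Barrier(1)  # one "rank", so wait() returns immediately
producer = threading.Thread(target=slow_collector, args=(batches, 0.05))
producer.start()
total = learner_iteration(batches, barrier, timings)
producer.join()
print(sorted(timings))
```

With a collector that is roughly 50 ms late, almost all of the elapsed time lands in `collector_wait_ms`, while `rank_barrier_ms` and `train_ms` stay near zero.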

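Breaking changes

timing/learner_wait_ms → timing/learner_collector_wait_ms; existing dashboards and queries must be updated.
Multi-GPU timing/learner_train_ms now measures pure compute (it no longer includes param sync / barrier), so it is not directly comparable across historical runs.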
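For dashboards or scripts that read exported metric records, the rename can be handled with a small key-mapping step. This is a minimal sketch: `migrate_metric_keys` and the record layout (a flat dict of key to value) are assumptions for illustration, not part of this PR. Note that renaming a key only keeps queries working; values logged under the old key still include the barrier, polling, and logger-refresh time and are not comparable with the new metric.

```python
# Old key -> new key, as stated in the breaking-changes list.
RENAMED_KEYS = {
    "timing/learner_wait_ms": "timing/learner_collector_wait_ms",
}


def migrate_metric_keys(record: dict) -> dict:
    """Return a copy of a flat metric record with renamed keys applied."""
    return {RENAMED_KEYS.get(key, key): value for key, value in record.items()}


old_record = {"timing/learner_wait_ms": 12.5, "timing/learner_train_ms": 40.0}
new_record = migrate_metric_keys(old_record)
print(new_record)
```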

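Impact

This is a timing/logging-only change: loss, optimization, sampling, and V-trace are untouched, so training results are unchanged.
All log_step callers (5 runners + the experiment-tracking tests) have been updated accordingly.
The collector_wait purification only affects the warmup phase (the steady state was already pure). collector_wait ≥ 0 is guaranteed, and no new synchronization is introduced.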

Verification

make test-all passes in full (ruff / ruff format / mypy / pyright / pytest).

Redefine the off-policy learner wait metric (issue #633) into four independent components so real blocking is no longer hidden by merging: collector_wait, replay_batch_wait, rank_barrier, sync_coordination. Multi-GPU measures the initial/final dist.barrier() outside collector_wait/train, and train_time becomes pure SGD compute. Renames timing/learner_wait_ms -> timing/learner_collector_wait_ms (breaking). Updates SAC/TD3 single + double-buffer + multi-GPU runners, APPO and HORA APPO callers, the OffPolicyLogger contract, tests, and the timing field docs. Refs #633. Pending: run make test-all on a torch device before opening a PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Exclude sync-coordination handshakes and warmup buffer-fill refreshes from collector_wait across the SAC/TD3 single, double-buffer and multi-GPU runners, and move the APPO/HORA collector_wait measurement ahead of the iteration-1 logger init so it no longer absorbs one-off display setup. Document the collector-side timing fields (timing/collector_*) in the off-policy timing reference. Refs #633. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

LeeLeno and others added 4 commits June 23, 2026 14:33

docs: tidy off-policy timing field reference

21681e7

fix:CI

64758af

LeeLeno requested review from TATP-233 and caozx1110 as code owners June 23, 2026 09:14

LeeLeno wants to merge 4 commits into main from fix/issue-633-offpolicy-wait-metrics
