Fix/issue 633 offpolicy wait metrics (issue #633) #636
Open
LeeLeno wants to merge 4 commits into
Redefine the off-policy learner wait metric (issue #633) into four independent components so real blocking is no longer hidden by merging: collector_wait, replay_batch_wait, rank_barrier, sync_coordination. Multi-GPU measures the initial/final dist.barrier() outside collector_wait/train, and train_time becomes pure SGD compute. Renames timing/learner_wait_ms -> timing/learner_collector_wait_ms (breaking). Updates SAC/TD3 single + double-buffer + multi-GPU runners, APPO and HORA APPO callers, the OffPolicyLogger contract, tests, and the timing field docs. Refs #633. Pending: run make test-all on a torch device before opening a PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Exclude sync-coordination handshakes and warmup buffer-fill refreshes from collector_wait across the SAC/TD3 single, double-buffer and multi-GPU runners, and move the APPO/HORA collector_wait measurement ahead of the iteration-1 logger init so it no longer absorbs one-off display setup. Document the collector-side timing fields (timing/collector_*) in the off-policy timing reference. Refs #633. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
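The four-component split described in the commit messages above can be illustrated with a minimal sketch. It is not the project's implementation: `LearnerTimer`, `section`, and `as_log_dict` are hypothetical names, and only the keys `timing/learner_collector_wait_ms` and `timing/learner_train_ms` appear in this PR. The other three keys are assumed to follow the same pattern.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Component names taken from the commit message.
COMPONENTS = (
    "collector_wait",
    "replay_batch_wait",
    "rank_barrier",
    "sync_coordination",
)


class LearnerTimer:
    """Hypothetical helper: each wait component gets its own bucket."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so tests can use a fake clock
        self.ms = defaultdict(float)

    @contextmanager
    def section(self, name):
        """Add the elapsed time of the wrapped block to bucket `name`."""
        start = self._clock()
        try:
            yield
        finally:
            self.ms[name] += (self._clock() - start) * 1000.0

    def as_log_dict(self):
        # Key pattern is an assumption except for collector_wait and train.
        out = {f"timing/learner_{name}_ms": self.ms[name] for name in COMPONENTS}
        out["timing/learner_train_ms"] = self.ms["train"]
        return out


if __name__ == "__main__":
    timer = LearnerTimer()
    with timer.section("collector_wait"):
        time.sleep(0.01)  # stand-in for blocking on the collector
    with timer.section("train"):
        time.sleep(0.02)  # stand-in for pure SGD compute
    print(timer.as_log_dict())
```

Because every section writes to its own bucket, a barrier or a sync handshake timed under its own name cannot inflate `collector_wait`, which is the property the redefinition is after.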
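Background
In off-policy training, the learner's wait_time (terminal field Wait, key timing/learner_wait_ms) lumps "waiting for data" together with barriers at sync points, polling, and logger refreshes, so it does not reflect how long the learner is actually blocked waiting for the collector to produce data. This PR redefines and implements the metric as specified in #633.
Calculation logic changes, side by side (panel view)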
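Learner panel
log_buffer_fill refresh + in-loop sync handshake put; multi-GPU: measured before the initial dist.barrier(); APPO/HORA: measured before the iter-1 logger.start() and available()
Sum of the dist.barrier() durations (initial + final)
Duration of the trainer_done sync handshake (inside the warmup loop + final release)
Param Sync (in Train) → Param Sync
… iteration_time, not the sum of the individual items)
perf/learner_pipeline_ms
Collector panel
weight_sync_ms / action_select_ms / env_step_ms / replay_ms / sync_coordination_ms (SAC/TD3)
env_step_total_ms / mlp_infer_ms (APPO)
Changes (against the acceptance criteria)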
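① Document every off-policy timing field: docs/.../1-training/3-logging.md gains a new "Off-Policy Timing Fields" section covering 9 learner fields plus the collector fields (5 for SAC/TD3, 2 for APPO), each listed with its terminal field, TensorBoard/W&B key, and meaning.
② Collector wait does not absorb barrier / pack / H2D / logger refresh: see the Collector Wait row of the comparison table above; the 4 components are each reported independently and are never merged.
③ Single-GPU and multi-GPU semantics are consistent, with multi-GPU additionally splitting out rank barrier / param sync: single- and multi-GPU runs produce the same set of 4 components; multi-GPU additionally reports rank_barrier + param_sync; train is pure compute in both; timings on non-rank-0 processes are always 0.
④ Tests cover three scenarios: the collector does not produce in time (single-GPU async), the batch is already ready (multi-GPU spawn, replay_batch_wait==0), and the multi-GPU barrier (rank_barrier is reported separately and collector_wait does not include it); a logger contract test that does not depend on torch is added as well.
Breaking changes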
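timing/learner_wait_ms → timing/learner_collector_wait_ms; existing dashboards and queries need updating.
timing/learner_train_ms now measures pure compute (it no longer includes param sync / barrier), so it is not directly comparable across historical runs.
Impact
All log_step callers (5 runners + the experiment-tracking tests) have been updated accordingly. collector_wait ≥ 0 is guaranteed, and no new synchronization was added.
Verification
make test-all passes fully (ruff / ruff format / mypy / pyright / pytest).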
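For dashboards or scripts that read exported metric rows, the key rename listed under the breaking changes could be handled with a small shim like the one below. This is a sketch under assumptions: `migrate_metric_keys` and `RENAMED_KEYS` are hypothetical names and are not part of this PR. Only the old and new key strings come from the PR.

```python
# Old key -> new key, as stated in the breaking changes of this PR.
RENAMED_KEYS = {
    "timing/learner_wait_ms": "timing/learner_collector_wait_ms",
}


def migrate_metric_keys(row):
    """Return a copy of `row` with renamed metric keys (hypothetical helper)."""
    migrated = {}
    for key, value in row.items():
        new_key = RENAMED_KEYS.get(key, key)
        # If the row already has a value under the new name, keep that one.
        if new_key != key and new_key in row:
            continue
        migrated[new_key] = value
    return migrated


if __name__ == "__main__":
    old_row = {"timing/learner_wait_ms": 12.5, "timing/learner_train_ms": 80.0}
    print(migrate_metric_keys(old_row))
```

The shim only fixes the lookup. Values logged by older runs still use the previous merged definition, so they should not be compared directly with the new collector wait.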