[BugFix] Wrap system_allocators_ with RetryAllocator for GPU/XPU #78193

Closed
zhengshengning wants to merge 1 commit into PaddlePaddle:develop from zhengshengning:fix/system-allocator-retry-wrap

@zhengshengning
Contributor

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

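In WrapCUDARetryAllocator, the existing logic wrapped only the GPU/XPU allocators in allocators_ with RetryAllocator and missed system_allocators_. As a result, device memory allocations that go through the system_allocators_ path do not trigger the retry mechanism on OOM and instead throw an allocation-failure exception immediately.

This fix also wraps the GPU/XPU allocators in system_allocators_ with RetryAllocator, so that they retry when device memory is insufficient, consistent with the behavior of allocators_.

Key changes

  • Add a loop over system_allocators_ in the WrapCUDARetryAllocator method
  • Wrap the allocators for GPU and XPU places in that map with RetryAllocator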
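
The change described above can be illustrated with a minimal, self-contained sketch. It is not Paddle's actual code: PlaceKind, Allocator, FlakyAllocator, AllocatorFacadeSketch, WrapMap, and the max_retries parameter are hypothetical stand-ins, and this RetryAllocator simply retries a fixed number of times, whereas the real one may wait for memory to be freed. Only the names WrapCUDARetryAllocator, RetryAllocator, allocators_, and system_allocators_ come from the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-ins for illustration only; not Paddle's real types.
enum class PlaceKind { kCPU, kGPU, kXPU };

struct Allocator {
  virtual ~Allocator() = default;
  virtual void* Allocate(std::size_t size) = 0;
};

// Simulates OOM: throws std::bad_alloc a fixed number of times, then succeeds.
class FlakyAllocator : public Allocator {
 public:
  explicit FlakyAllocator(int failures) : failures_left_(failures) {}
  void* Allocate(std::size_t size) override {
    if (failures_left_ > 0) {
      --failures_left_;
      throw std::bad_alloc();
    }
    return ::operator new(size);
  }

 private:
  int failures_left_;
};

// Simplified retry wrapper (assumption): retries up to max_retries times
// after the first failure, then rethrows the last exception.
class RetryAllocator : public Allocator {
 public:
  RetryAllocator(std::shared_ptr<Allocator> underlying, int max_retries)
      : underlying_(std::move(underlying)), max_retries_(max_retries) {
    assert(underlying_ != nullptr);
  }
  void* Allocate(std::size_t size) override {
    for (int attempt = 0;; ++attempt) {
      try {
        return underlying_->Allocate(size);
      } catch (const std::bad_alloc&) {
        if (attempt >= max_retries_) throw;
      }
    }
  }

 private:
  std::shared_ptr<Allocator> underlying_;
  int max_retries_;
};

struct AllocatorFacadeSketch {
  using AllocatorMap = std::map<PlaceKind, std::shared_ptr<Allocator>>;
  AllocatorMap allocators_;
  AllocatorMap system_allocators_;

  static bool IsDevicePlace(PlaceKind p) {
    return p == PlaceKind::kGPU || p == PlaceKind::kXPU;
  }

  // Wraps only GPU/XPU entries; CPU entries are left untouched.
  static void WrapMap(AllocatorMap& allocators, int max_retries) {
    for (auto& entry : allocators) {
      if (IsDevicePlace(entry.first)) {
        entry.second =
            std::make_shared<RetryAllocator>(entry.second, max_retries);
      }
    }
  }

  void WrapCUDARetryAllocator(int max_retries) {
    WrapMap(allocators_, max_retries);         // behavior before the fix
    WrapMap(system_allocators_, max_retries);  // the step this PR adds
  }
};
```

In this sketch, a GPU entry in system_allocators_ that fails once and then succeeds returns a valid pointer after wrapping, while a CPU entry in the same map still throws on its first failure because it is never wrapped.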

Does this cause a precision change?

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@paddle-bot

paddle-bot bot commented Mar 6, 2026

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the automated CI test results first. See the Paddle CI Manual for details.
