This repository has been archived by the owner on Mar 14, 2024. It is now read-only.
Fix race condition in writing config to checkpoint
Summary:
We used to have _all_ trainers write the config to the checkpoint at the same time. That was already problematic, but what made it worse is that only trainer 0 created the checkpoint directory. So if the directory didn't exist yet and a non-0 trainer was the first to reach that point, its write would fail. I'm fixing it the same way we fixed all the other similar issues: only the rank-0 trainer writes the config.

Reviewed By: adamlerer

Differential Revision: D17787303

fbshipit-source-id: c3464dd9929ff95d54865ed03f041388d85c6f0d
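
As a rough illustration of the pattern described in the summary (this is not the code from this commit), here is a minimal sketch in which only rank 0 creates the checkpoint directory and writes the config. The function name `save_config_if_rank_zero`, the `config.json` file name, and the write-then-rename step are all assumptions made for this example.

```python
import json
import os
from pathlib import Path


def save_config_if_rank_zero(checkpoint_dir: Path, rank: int, config: dict) -> bool:
    """Write the config from rank 0 only. Returns True if this call wrote it.

    Hypothetical helper: the name, signature, and file layout are
    illustrative and are not taken from the repository.
    """
    if rank != 0:
        # Non-zero ranks never touch the directory, so they cannot fail
        # by arriving before it has been created.
        return False

    # Rank 0 is the only creator of the directory and the only writer.
    checkpoint_dir.mkdir(parents=True, exist_ok=True)

    # Assumed detail: write to a temporary file, then rename, so a reader
    # never sees a half-written config.
    tmp_path = checkpoint_dir / "config.json.tmp"
    final_path = checkpoint_dir / "config.json"
    with tmp_path.open("w") as f:
        json.dump(config, f)
    os.replace(tmp_path, final_path)
    return True
```

Each trainer would call this with its own rank. Other ranks that need the config afterwards would still have to wait for rank 0 to finish, for example at a barrier, and that synchronization is outside the scope of this sketch.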