Skip to content

Add option to skip init sync #127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Conversation

dl541
Copy link
Contributor

@dl541 dl541 commented Mar 10, 2025

No description provided.

dl541 added 4 commits March 8, 2025 17:03
Summary:
We currently always heal on step 0 to avoid synchronization issues. We want an option to support skipping this sync for users who set the PyTorch seed so all ranks are initialized with the same values.

This diff added a init_sync boolean flag that can be passed from the manager client in python to the manager service in rust. If the manager service skips the sync depending on whether the init_sync is true.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 10, 2025
@d4l3k
Copy link
Member

d4l3k commented Mar 10, 2025

@dl541

As promised, next steps on this PR:

  • add in a unit test for this sync logic on the rust side
  • add in a manager_test.py mocked test for the init_sync flag propagation
  • add in a manager_integ_test.py w/ seed + init_sync = False. This might be a little tricky to add asserts on correct behavior due to the multiple threads so this can be a stretch goal
  • Update PR description with a summary of this PR + a Test plan: ... section

Code pointers

dl541 added 2 commits March 11, 2025 09:23
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants