FSDP2 tutorial: how FSDP2 works #3354
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3354
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV.
✅ No Failures: As of commit 75a4be6 with merge base 35c68ea.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -0,0 +1,55 @@
Getting Started with Fully Sharded Data Parallel(FSDP)
======================================================
Please add this to index.rst.
Also, we can't merge ghstack PRs in this repo. Can you please resubmit from a regular branch?
* Offers an extension point to customize the all-gather, e.g. an fp8 all-gather for fp8 linears.
* Supports mixing frozen and non-frozen parameters in the same communication group without using extra memory.

How to use FSDP2
is this still a work in progress?
Stack from ghstack (oldest at bottom):