Add Context Parallel tutorial #3319
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3319
Note: Links to docs will display an error until the docs builds have completed. ❌ 1 new failure as of commit 5872433 with merge base 7cb6915.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
out = F.scaled_dot_product_attention(*qkv, is_causal=True)

# make a clean copy of QKV for output comparison
cp_qkv = [t.detach().clone() for t in qkv]
Ahh, so this is not even needed for CP, just for the reference?
I wonder if it's better to delete the reference. That's more appropriate for a unit test; for an example, people usually want something minimal and copyable, and this line might distract them.
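For context, here is a minimal sketch of the pattern under discussion, assuming the context_parallel API linked in the PR description and the tutorial's mesh/rank/world_size setup; the shard-vs-reference comparison is an assumption for illustration, not the tutorial's verbatim code.

from torch.distributed.tensor.experimental import context_parallel

# Run SDPA under context parallelism on the cloned buffers; `mesh`,
# `world_size`, and `rank` are assumed from the tutorial's setup code.
with context_parallel(mesh, buffers=cp_qkv, buffer_seq_dims=[2, 2, 2]):
    cp_out = F.scaled_dot_product_attention(*cp_qkv, is_causal=True)

# Each rank computes the output for its own sequence shard (dim 2), so
# compare it against the matching slice of the single-device reference.
torch.testing.assert_close(cp_out, out.chunk(world_size, dim=2)[rank])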
Just some minor formatting fixes.
There is a wrong statement about pass-KV; we should change that.
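(For readers: "pass-KV" is the ring-attention communication strategy in which K/V shards are exchanged between ranks. A hedged sketch, assuming the private set_rotate_method helper in the torch.distributed.tensor.experimental._attention module this PR documents; as a prototype API it may change without notice.)

from torch.distributed.tensor.experimental._attention import set_rotate_method

# Choose how K/V shards rotate between ranks during ring attention;
# "allgather" and "alltoall" are the two strategies the tutorial covers.
set_rotate_method("alltoall")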
Two final comments. LGTM.
Can you please rebase onto main?
Summary: The compiled model run takes the same inputs as eager mode, so there is no need to explicitly pack the args into a tuple.
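A minimal sketch of what the summary describes, reusing the qkv tensors from the snippet above: a torch.compile-d function is invoked with exactly the same arguments as its eager counterpart.

compiled_sdpa = torch.compile(F.scaled_dot_product_attention)
# Same call signature as the eager call above; no tuple packing needed.
out = compiled_sdpa(*qkv, is_causal=True)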
This tutorial covers the Context Parallel API released in PyTorch 2.7 under torch.distributed.tensor.experimental._attention.
API: https://pytorch.org/docs/stable/distributed.tensor.html#torch.distributed.tensor.experimental.context_parallel
Tutorial preview: https://docs-preview.pytorch.org/pytorch/tutorials/3319/prototype/context_parallel.html
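For readers who don't open the preview, here is a self-contained sketch of the API being tutorialized, assuming the linked context_parallel signature (a device mesh plus buffers/buffer_seq_dims keyword arguments) and a two-GPU torchrun launch; shapes and variable names are illustrative, not the tutorial's verbatim code.

import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.experimental import context_parallel

# Launched with e.g.: torchrun --nproc-per-node=2 cp_example.py
dist.init_process_group(backend="nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank % torch.cuda.device_count())
mesh = init_device_mesh("cuda", (world_size,), mesh_dim_names=("cp",))

# Illustrative shapes: (batch, heads, seq_len, head_dim).
B, H, S, D = 2, 8, 4096, 64
qkv = [
    torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16)
    for _ in range(3)
]

# Inside the context, Q/K/V are sharded along the sequence dim (dim 2)
# and scaled_dot_product_attention dispatches to the context-parallel
# ring-attention implementation.
with context_parallel(mesh, buffers=qkv, buffer_seq_dims=[2, 2, 2]):
    out = F.scaled_dot_product_attention(*qkv, is_causal=True)

dist.destroy_process_group()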