Enable Intel GPU #753

Conversation
Thanks for your PR @dbyoung18, my preference here would be to land generic accelerator memory APIs in core and then use those. That way we wouldn't need to ask people who are trying to use Intel GPUs to change their code; it'd be a single device-agnostic call rather than per-backend branches. @guangyey is doing some work on this at Intel and can share more information on what the current plan of record is.
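A minimal sketch of the kind of shim this comment is gesturing at (the concrete snippet in the original comment did not survive the export, so the helper name and shape here are illustrative only, not an actual core API):

```python
import torch

def max_memory_reserved_for(device: str) -> int:
    # Hypothetical helper: resolve the backend module (torch.cuda, torch.xpu, ...)
    # from the device string instead of branching on substrings at every call site.
    dev_type = torch.device(device).type
    backend = getattr(torch, dev_type, None)
    if backend is None or not hasattr(backend, "max_memory_reserved"):
        return 0  # e.g. CPU, or backends without memory statistics
    return backend.max_memory_reserved()
```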
Hi @msaroufim and @dbyoung18, let me explain the plan.
Converting to draft first, pending #129919 being ready.
@dbyoung18, may I know why the change is in torchao/_models/llama/generate.py only?
Hi, @EikanWang. We have a plan to gradually support torch-ao on Intel GPU with different models (llama2, llama3, sam, etc.) and different features (BF16/INT8/INT4/FP8, etc.). As the first step, we chose Llama2 & Llama3 BF16 as the starting point. With this PR, Llama2-7b and Llama3-8b can run BF16 on Intel GPU under both eager mode & compile mode by passing `--device xpu`.
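For readers following along, a minimal sketch of the flow this enables — not the PR's actual code, and assuming a PyTorch build with XPU support (the `torch.xpu` module is only present in recent builds):

```python
import torch
import torch.nn as nn

# Sketch only: run a toy module on Intel GPU in BF16, in eager and compiled mode.
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU()).to(device=device, dtype=torch.bfloat16)
x = torch.randn(8, 1024, device=device, dtype=torch.bfloat16)

with torch.no_grad():
    eager_out = model(x)                  # eager mode
    compiled_model = torch.compile(model)
    compiled_out = compiled_model(x)      # compile mode
```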
I'll leave it up to the repo maintainers, but IMO one needs to think a bit more about a unified device approach rather than migrating long strings of `elif`s from repo to repo.
```diff
@@ -369,7 +381,8 @@ def callback(x):
     tokpersec = torch.mean(torch.tensor(aggregate_metrics['tokens_per_sec'])).item()
     bandwidth = model_size * tokpersec
-    mem = torch.cuda.max_memory_reserved() /1e9
+    max_memory_reserved = torch.cuda.max_memory_reserved() if "cuda" in device else torch.xpu.max_memory_reserved()
```
This feels wrong, as it will dispatch to `xpu` for HIP devices as well, wouldn't it?
```diff
     else:
-        torch.profiler._utils._init_for_cuda_graphs()
-        prof = torch.profiler.profile()
+        if "cuda" in device:
```
Please dismantle the pyramid of doom and use `elif "cuda" in device:` rather than `else:` followed by a nested `if "cuda" in device:`. And see my comment below again about "hip".
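For clarity, a sketch of the restructuring being requested here; the enclosing guard and the CPU fallback are placeholders for illustration, not the file's actual code:

```python
import contextlib
import torch

skip_profiling = False  # placeholder guard, stands in for the real condition
device = "cuda" if torch.cuda.is_available() else "cpu"

# Flattened with elif instead of nesting an if inside the else branch.
if skip_profiling:
    prof = contextlib.nullcontext()
elif "cuda" in device:
    torch.profiler._utils._init_for_cuda_graphs()
    prof = torch.profiler.profile()
else:
    prof = torch.profiler.profile()
```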
```diff
@@ -288,7 +290,10 @@ def main(
     for i in range(start, num_samples):
         if i==0:
-            torch.cuda.reset_peak_memory_stats()
+            if "cuda" in device:
```
Same as below, can you please check that `torch.cuda.reset_peak_memory_stats` does not need to be applied to hip?
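Context that may help with this thread: ROCm builds of PyTorch expose HIP devices through the `torch.cuda` namespace (the device type is still reported as `"cuda"`), so a `"cuda"` branch keeps covering HIP. A sketch of a per-backend guard under that assumption, also assuming the XPU backend exposes the matching memory-stats helper (as this PR relies on):

```python
import torch

def reset_peak_memory_stats_for(device: str) -> None:
    # ROCm/HIP devices report their type as "cuda", so this branch covers
    # NVIDIA and AMD GPUs; Intel GPUs are handled via torch.xpu.
    dev_type = torch.device(device).type
    if dev_type == "cuda":
        torch.cuda.reset_peak_memory_stats()
    elif dev_type == "xpu":
        torch.xpu.reset_peak_memory_stats()
```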
I'm already planning on writing an RFC on how we'll support more hardware architectures. Right now ao is very much NVIDIA-centric, but a lot of recent issues have been about supporting more hardware architectures on more operating systems. We need to think about generalizing devices, CI/testing, and performance carefully.
We are working on a device-agnostic runtime API for accelerators; it may help, @malfet. @msaroufim FYI - https://dev-discuss.pytorch.org/t/python-c-api-rules-for-device-generic-apis/2511
@EikanWang are there any GitHub runners for Intel GPUs to ensure our test suite works? We don't have to run the code per commit, but at least a nightly check to make sure we understand what works and what doesn't would be helpful.
@msaroufim, may I know if two runners for Intel GPUs are good enough for the ao nightly for now?
Yup that should be fine! We won't be running on Intel runners per commit for now. cc @atalman @seemethere as well
Sounds good! We will add two runners to the Intel GPU CI/CD resource pool and reserve them exclusively for the ao nightly.
Currently, we have 16 pytorch organization-level xpu runners with a dedicated label.
@chuanqi129, @riverliuintel, any update?
They'd need to be hooked up to the Nova workflows as well; see #999, which ran into some issues too.
@msaroufim, may I know what "Nova workflows" means? Is it an ao-specific workflow?
@EikanWang we leverage some reusable GitHub workflows (https://github.com/pytorch/ao/blob/main/.github/workflows/regression_test.yml#L68). We could potentially do a one-off run of our test suite to see what works in ao out of the box today, but it will be hard to track progress without the CI integration. As for how to integrate with the Nova workflows, your best bet is to reach out to @seemethere and @atalman on the Intel slack channel. Feel free to tag me there as well so we can move faster.
@dbyoung18 does this one support int4 woq?
Currently, it doesn't support int4 woq on Intel GPU. We are in the process of upstreaming INT4 support for the XPU backend into PyTorch (targeting v2.5). Once that upstream work is ready, we will continue adding support on the ao side.
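In the meantime, the INT8 weight-only path this PR exercises can be sketched roughly as follows; the `quantize_`/`int8_weight_only` names are taken from the current torchao quantization API (exact surface may vary by version), and the snippet assumes an XPU-enabled PyTorch build:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Rough sketch: INT8 weight-only quantization of a toy model, run on Intel GPU in BF16.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to("xpu", torch.bfloat16)
quantize_(model, int8_weight_only())

x = torch.randn(4, 1024, device="xpu", dtype=torch.bfloat16)
with torch.no_grad():
    out = torch.compile(model)(x)
```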
Closed as a duplicate of PR ao#1259. Thanks for the review comments above.
This PR is migrated from gpt-fast #79. We would like to add initial support for Intel GPU in torch-ao with the device option "xpu" (i.e., `--device xpu`). Currently, both BF16 & INT8 are functionally supported under eager mode and compile mode. INT4 support and further performance improvements are WIP.
Here are the steps to run Llama2-7b and Llama3-8b generation on Intel GPU with torch-ao. We will update the tutorial later with improved performance.

Launch:

```bash
# BF16
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --write_result benchmark_results.txt --device xpu --precision torch.bfloat16
# INT8 dynamic quantization
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --write_result benchmark_results.txt --device xpu --quantization int8dq
# INT8 weight-only quantization
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --write_result benchmark_results.txt --device xpu --quantization int8wo
```