Commit 41d9958

ZhaoqiongZ and svekars authored
Update Inductor windows tutorial with xpu support (#3309)
--------- Co-authored-by: Svetlana Karslioglu <[email protected]>
1 parent 459084a commit 41d9958

File tree

4 files changed

+109
-129
lines changed

Diff for: _static/img/install_msvc.png

131 KB

Diff for: prototype_source/inductor_windows.rst

+103
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,103 @@
1+
How to use ``torch.compile`` on Windows CPU/XPU
2+
===============================================
3+
4+
**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_
5+
6+
7+
Introduction
8+
------------
9+
10+
TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
11+
12+
This tutorial introduces the steps for using TorchInductor via ``torch.compile`` on Windows CPU/XPU.
13+
14+
15+
Software Installation
16+
---------------------
17+
18+
The following steps walk through how to use ``torch.compile`` on Windows CPU/XPU.
19+
20+
Install a Compiler
21+
^^^^^^^^^^^^^^^^^^
22+
23+
TorchInductor optimization requires a C++ compiler. This tutorial uses Microsoft Visual C++ (MSVC) as an example.
24+
25+
1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.
26+
27+
2. During installation, select **Workloads** and then **Desktop & Mobile**.
28+
3. Select the checkbox for **Desktop Development with C++** and install.
29+
30+
.. image:: ../_static/img/install_msvc.png
31+
32+
33+
.. note::
34+
35+
Inductor on Windows CPU also supports the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ as C++ compilers, which can give better performance.
36+
See `Alternative Compiler for better performance on CPU <#alternative-compiler-for-better-performance-on-cpu>`_.
37+
38+
Set Up Environment
39+
^^^^^^^^^^^^^^^^^^
40+
Next, let's configure our environment.
41+
42+
#. Open a command line environment via cmd.exe.
43+
#. Activate ``MSVC`` with the following command::
44+
45+
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
46+
#. Create and activate a virtual environment.
47+
#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later for CPU usage. For XPU usage, install PyTorch 2.7 or later by following `Getting Started on Intel GPU <https://pytorch.org/docs/main/notes/get_start_xpu.html>`_.
48+
#. Here is an example of how to use TorchInductor on Windows:
49+
.. code-block:: python
50+
51+
import torch
52+
device="cpu" # or "xpu" for XPU
53+
def foo(x, y):
54+
    a = torch.sin(x)
55+
    b = torch.cos(y)  # use the second argument; the original line ignored y
56+
    return a + b
57+
opt_foo1 = torch.compile(foo)
58+
print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device)))
59+
60+
#. The example prints output similar to the following (the values vary because the inputs are random)::
61+
62+
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01,
63+
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00],
64+
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01,
65+
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01],
66+
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01,
67+
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00],
68+
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00,
69+
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01],
70+
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01,
71+
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01],
72+
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00,
73+
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00],
74+
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01,
75+
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00],
76+
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00,
77+
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00],
78+
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01,
79+
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00],
80+
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01,
81+
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]])
82+
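Before running the example, it can help to confirm that the environment steps above took effect. The following is a minimal sketch, not part of PyTorch: ``msvc_env_report`` and ``msvc_ready`` are hypothetical helper names, and the sketch assumes that ``vcvars64.bat`` puts the MSVC driver ``cl`` on ``PATH`` and sets the ``VCINSTALLDIR`` variable.

```python
import os
import shutil
import sys


def msvc_env_report(env=None, which=shutil.which):
    """Collect hints that the MSVC command-line environment is active.

    Hypothetical helper for illustration only (not a PyTorch API).
    Assumption: vcvars64.bat adds cl.exe to PATH and sets VCINSTALLDIR.
    """
    env = os.environ if env is None else env
    return {
        "python": sys.version.split()[0],            # interpreter version in use
        "cl_path": which("cl"),                      # None if cl is not on PATH
        "vc_install_dir": env.get("VCINSTALLDIR"),   # None if the variable is unset
    }


def msvc_ready(report):
    """Treat the environment as usable when the cl driver can be found."""
    return report["cl_path"] is not None


if __name__ == "__main__":
    report = msvc_env_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    print("MSVC ready:", msvc_ready(report))
```

If ``cl_path`` is reported as ``None``, rerunning the ``vcvars64.bat`` command in the same ``cmd.exe`` session is the first thing to try.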
83+
Alternative Compiler for better performance on CPU
84+
--------------------------------------------------
85+
86+
To enhance performance for inductor on Windows CPU, you can use the Intel Compiler or LLVM Compiler. However, they rely on the runtime libraries from Microsoft Visual C++ (MSVC). Therefore, your first step should be to install MSVC.
87+
88+
Intel Compiler
89+
^^^^^^^^^^^^^^
90+
91+
#. Download and install the Windows version of the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_.
92+
#. Select it as the Inductor compiler on Windows through the ``CXX`` environment variable: ``set CXX=icx-cl``.
93+
94+
LLVM Compiler
95+
^^^^^^^^^^^^^
96+
97+
#. Download and install the win64 version of the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_.
98+
#. Select it as the Inductor compiler on Windows through the ``CXX`` environment variable: ``set CXX=clang-cl``.
99+
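Because the compiler is chosen through ``CXX``, a quick check that the chosen driver is actually reachable can save a confusing build failure later. The sketch below is illustrative only: ``resolve_cxx`` is a hypothetical helper, not a PyTorch API, and it assumes that an unset ``CXX`` means the MSVC driver ``cl`` should be looked up.

```python
import os
import shutil

# Compiler driver names mentioned in this tutorial; "cl" is the MSVC driver.
KNOWN_DRIVERS = ("cl", "icx-cl", "clang-cl")


def resolve_cxx(env=None, which=shutil.which):
    """Return (name, path) for the compiler that CXX selects.

    Hypothetical helper for illustration only (not a PyTorch API).
    Assumption: when CXX is unset or empty, the MSVC driver "cl" is used.
    """
    env = os.environ if env is None else env
    name = env.get("CXX", "").strip() or "cl"
    return name, which(name)  # path is None when the driver is not on PATH


if __name__ == "__main__":
    name, path = resolve_cxx()
    if name not in KNOWN_DRIVERS:
        print(f"Note: {name!r} is not one of {KNOWN_DRIVERS}")
    if path is None:
        print(f"{name!r} was not found on PATH; activate MSVC or adjust CXX")
    else:
        print(f"CXX resolves to {name!r} at {path}")
```

Passing ``env`` and ``which`` as parameters keeps the lookup easy to exercise without changing the real environment.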
100+
Conclusion
101+
----------
102+
103+
This tutorial showed how to use Inductor on Windows CPU with PyTorch 2.5 or later, and on Windows XPU with PyTorch 2.7 or later. The Intel Compiler or LLVM Compiler can also be used to get better performance on CPU.

Diff for: prototype_source/inductor_windows_cpu.rst

+4-127
Original file line numberDiff line numberDiff line change
@@ -1,130 +1,7 @@
1-
How to use TorchInductor on Windows CPU
2-
=======================================
1+
This tutorial has been moved to https://pytorch.org/tutorials/prototype/inductor_windows.html.
32

4-
**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_
3+
Redirecting in 3 seconds...
54

5+
.. raw:: html
66

7-
8-
TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels.
9-
This tutorial will guide you through the process of using TorchInductor on a Windows CPU.
10-
11-
.. grid:: 2
12-
13-
.. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
14-
:class-card: card-prerequisites
15-
16-
* How to compile and execute a Python function with PyTorch, optimized for Windows CPU
17-
* Basics of TorchInductor's optimization using C++/Triton kernels.
18-
19-
.. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
20-
:class-card: card-prerequisites
21-
22-
* PyTorch v2.5 or later
23-
* Microsoft Visual C++ (MSVC)
24-
* Miniforge for Windows
25-
26-
Install the Required Software
27-
-----------------------------
28-
29-
First, let's install the required software. C++ compiler is required for TorchInductor optimization.
30-
We will use Microsoft Visual C++ (MSVC) for this example.
31-
32-
1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.
33-
34-
2. During the installation, choose **Desktop Development with C++** in the **Desktop & Mobile** section in **Workloads** table. Then install the software
35-
36-
.. note::
37-
38-
We recommend C++ compiler `Clang <https://github.com/llvm/llvm-project/releases>`_ and `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`_.
39-
Please check `Alternative Compiler for better performance <#alternative-compiler-for-better-performance>`_.
40-
41-
3. Download and install `Miniforge3-Windows-x86_64.exe <https://github.com/conda-forge/miniforge/releases/latest/>`__.
42-
43-
Set Up the Environment
44-
----------------------
45-
46-
#. Open the command line environment via ``cmd.exe``.
47-
#. Activate ``MSVC`` with the following command:
48-
49-
.. code-block:: sh
50-
51-
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
52-
#. Activate ``conda`` with the following command:
53-
54-
.. code-block:: sh
55-
56-
"C:/ProgramData/miniforge3/Scripts/activate.bat"
57-
#. Create and activate a custom conda environment:
58-
59-
.. code-block:: sh
60-
61-
conda create -n inductor_cpu_windows python=3.10 -y
62-
conda activate inductor_cpu_windows
63-
64-
#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later.
65-
66-
Using TorchInductor on Windows CPU
67-
----------------------------------
68-
69-
Here’s a simple example to demonstrate how to use TorchInductor:
70-
71-
.. code-block:: python
72-
73-
74-
import torch
75-
def foo(x, y):
76-
a = torch.sin(x)
77-
b = torch.cos(y)
78-
return a + b
79-
opt_foo1 = torch.compile(foo)
80-
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
81-
82-
Here is the sample output that this code might return:
83-
84-
.. code-block:: sh
85-
86-
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01,
87-
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00],
88-
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01,
89-
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01],
90-
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01,
91-
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00],
92-
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00,
93-
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01],
94-
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01,
95-
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01],
96-
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00,
97-
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00],
98-
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01,
99-
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00],
100-
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00,
101-
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00],
102-
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01,
103-
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00],
104-
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01,
105-
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]])
106-
107-
Using an Alternative Compiler for Better Performance
108-
-------------------------------------------
109-
110-
To enhance performance on Windows inductor, you can use the Intel Compiler or LLVM Compiler. However, they rely on the runtime libraries from Microsoft Visual C++ (MSVC). Therefore, your first step should be to install MSVC.
111-
112-
Intel Compiler
113-
^^^^^^^^^^^^^^
114-
115-
#. Download and install `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ with Windows version.
116-
#. Set Windows Inductor Compiler with the CXX environment variable ``set CXX=icx-cl``.
117-
118-
Intel also provides a comprehensive step-by-step guide, complete with performance data. Please check `Intel® oneAPI DPC++/C++ Compiler Boosts PyTorch* Inductor Performance on Windows* for CPU Devices <https://www.intel.com/content/www/us/en/developer/articles/technical/boost-pytorch-inductor-performance-on-windows.html>`_.
119-
120-
LLVM Compiler
121-
^^^^^^^^^^^^^
122-
123-
#. Download and install `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and choose win64 version.
124-
#. Set Windows Inductor Compiler with the CXX environment variable ``set CXX=clang-cl``.
125-
126-
Conclusion
127-
----------
128-
129-
In this tutorial, we have learned how to use Inductor on Windows CPU with PyTorch. In addition, we discussed
130-
further performance improvements with Intel Compiler and LLVM Compiler.
7+
<meta http-equiv="Refresh" content="3; url='https://pytorch.org/tutorials/prototype/inductor_windows.html'" />

Diff for: prototype_source/prototype_index.rst

+2-2
Original file line numberDiff line numberDiff line change
@@ -228,7 +228,7 @@ Prototype features are not available as part of binary distributions like PyPI o
228228
:header: Inductor Windows CPU Tutorial
229229
:card_description: Speed up your models with Inductor On Windows CPU
230230
:image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
231-
:link: ../prototype/inductor_windows_cpu.html
231+
:link: ../prototype/inductor_windows.html
232232
:tags: Model-Optimization
233233

234234
.. customcarditem::
@@ -286,7 +286,7 @@ Prototype features are not available as part of binary distributions like PyPI o
286286
prototype/flight_recorder_tutorial.html
287287
prototype/graph_mode_dynamic_bert_tutorial.html
288288
prototype/inductor_cpp_wrapper_tutorial.html
289-
prototype/inductor_windows_cpu.html
289+
prototype/inductor_windows.html
290290
prototype/pt2e_quantizer.html
291291
prototype/pt2e_quant_ptq.html
292292
prototype/pt2e_quant_qat.html
