pytorch · svekars · Apr 18, 2025 · Oct 21, 2024 · Oct 21, 2024 · Oct 21, 2024
diff --git a/prototype_source/inductor_windows.rst b/prototype_source/inductor_windows.rst
@@ -0,0 +1,111 @@
+How to use ``torch.compile`` on Windows CPU/XPU
+===============================================
+
+**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_
+
+
+Introduction
+------------
+
+TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
+
+This tutorial introduces the steps for utilizing TorchInductor via ``torch.compile`` on Windows CPU/XPU.
+
+
+Software Installation
+---------------------
+
+The following steps set up the software required to use ``torch.compile`` on Windows CPU/XPU.
+
+Install a Compiler
+^^^^^^^^^^^^^^^^^^
+
+A C++ compiler is required for TorchInductor optimization. This tutorial uses Microsoft Visual C++ (MSVC) as an example.
+
+Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.
+
+During installation, open the ``Workloads`` tab, go to the ``Desktop & Mobile`` section, check ``Desktop Development with C++``, and then install.
+
+.. note::
+
+    Windows CPU Inductor also supports the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ as C++ compilers for better performance.
+    See `Alternative Compiler for better performance on CPU <#alternative-compiler-for-better-performance-on-cpu>`_.
+
+Conda Installation
+^^^^^^^^^^^^^^^^^^
+
+Set up a conda environment with Miniforge or Anaconda.
+For example, download and install `Miniforge <https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe>`_.
+
+Set Up Environment
+^^^^^^^^^^^^^^^^^^
+
+#. Open a command prompt with ``cmd.exe``.
+#. Activate ``MSVC`` with the following command::
+
+    "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
+#. Activate ``conda`` with the following command::
+
+    "C:/ProgramData/miniforge3/Scripts/activate.bat"
+#. Create a custom conda environment::
+
+    conda create -n inductor_windows python=3.10 -y
+#. Activate the custom conda environment::
+
+    conda activate inductor_windows
+#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later for CPU usage. For XPU usage, install PyTorch 2.7 or later by following `Getting Started on Intel GPU <https://pytorch.org/docs/main/notes/get_start_xpu.html>`_.
+#. Use TorchInductor on Windows::
+
+    import torch
+    device = "cpu"  # or "xpu" for XPU
+    def foo(x, y):  # y is unused; the result depends only on x
+        a = torch.sin(x)
+        b = torch.cos(x)
+        return a + b
+    opt_foo1 = torch.compile(foo)  # compile foo with the default Inductor backend
+    print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device)))
+
+#. Example output (the exact values differ between runs because the inputs are random)::
+
+    tensor([[-3.9074e-02,  1.3994e+00,  1.3894e+00,  3.2630e-01,  8.3060e-01,
+              1.1833e+00,  1.4016e+00,  7.1905e-01,  9.0637e-01, -1.3648e+00],
+            [ 1.3728e+00,  7.2863e-01,  8.6888e-01, -6.5442e-01,  5.6790e-01,
+              5.2025e-01, -1.2647e+00,  1.2684e+00, -1.2483e+00, -7.2845e-01],
+            [-6.7747e-01,  1.2028e+00,  1.1431e+00,  2.7196e-02,  5.5304e-01,
+              6.1945e-01,  4.6654e-01, -3.7376e-01,  9.3644e-01,  1.3600e+00],
+            [-1.0157e-01,  7.7200e-02,  1.0146e+00,  8.8175e-02, -1.4057e+00,
+              8.8119e-01,  6.2853e-01,  3.2773e-01,  8.5082e-01,  8.4615e-01],
+            [ 1.4140e+00,  1.2130e+00, -2.0762e-01,  3.3914e-01,  4.1122e-01,
+              8.6895e-01,  5.8852e-01,  9.3310e-01,  1.4101e+00,  9.8318e-01],
+            [ 1.2355e+00,  7.9290e-02,  1.3707e+00,  1.3754e+00,  1.3768e+00,
+              9.8970e-01,  1.1171e+00, -5.9944e-01,  1.2553e+00,  1.3394e+00],
+            [-1.3428e+00,  1.8400e-01,  1.1756e+00, -3.0654e-01,  9.7973e-01,
+              1.4019e+00,  1.1886e+00, -1.9194e-01,  1.3632e+00,  1.1811e+00],
+            [-7.1615e-01,  4.6622e-01,  1.2089e+00,  9.2011e-01,  1.0659e+00,
+              9.0892e-01,  1.1932e+00,  1.3888e+00,  1.3898e+00,  1.3218e+00],
+            [ 1.4139e+00, -1.4000e-01,  9.1192e-01,  3.0175e-01, -9.6432e-01,
+             -1.0498e+00,  1.4115e+00, -9.3212e-01, -9.0964e-01,  1.0127e+00],
+            [ 5.7244e-04,  1.2799e+00,  1.3595e+00,  1.0907e+00,  3.7191e-01,
+              1.4062e+00,  1.3672e+00,  6.8502e-02,  8.5216e-01,  8.6046e-01]])
+
+Alternative Compiler for better performance on CPU
+--------------------------------------------------
+
+To improve Inductor performance on Windows CPU, you can use the Intel Compiler or the LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first.
+
+Intel Compiler
+^^^^^^^^^^^^^^
+
+#. Download and install the Windows version of the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_.
+#. Select it as the Inductor compiler on Windows through the ``CXX`` environment variable: ``set CXX=icx-cl``.
+
+LLVM Compiler
+^^^^^^^^^^^^^
+
+#. Download and install the win64 version of the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_.
+#. Select it as the Inductor compiler on Windows through the ``CXX`` environment variable: ``set CXX=clang-cl``.
+
+Conclusion
+----------
+
+This tutorial showed how to use Inductor on Windows CPU/XPU with PyTorch. For better performance on CPU, you can use the Intel Compiler or the LLVM Compiler.
diff --git a/prototype_source/inductor_windows_cpu.rst b/prototype_source/inductor_windows_cpu.rst
diff --git a/prototype_source/prototype_index.rst b/prototype_source/prototype_index.rst
@@ -221,7 +221,7 @@ Prototype features are not available as part of binary distributions like PyPI o
    :header: Inductor Windows CPU Tutorial
    :card_description: Speed up your models with Inductor On Windows CPU
    :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
-   :link: ../prototype/inductor_windows_cpu.html
+   :link: ../prototype/inductor_windows.html
    :tags: Model-Optimization
 
 .. customcarditem::
@@ -271,7 +271,7 @@ Prototype features are not available as part of binary distributions like PyPI o
    prototype/flight_recorder_tutorial.html
    prototype/graph_mode_dynamic_bert_tutorial.html
    prototype/inductor_cpp_wrapper_tutorial.html
-   prototype/inductor_windows_cpu.html
+   prototype/inductor_windows.html
    prototype/pt2e_quantizer.html
    prototype/pt2e_quant_ptq.html
    prototype/pt2e_quant_qat.html