# Compiled Autograd is a ``torch.compile`` extension introduced in PyTorch 2.4
# that allows the capture of a larger backward graph.
#
# While ``torch.compile`` does capture the backward graph, it does so **partially**. The AOTAutograd component captures the backward graph ahead-of-time, with certain limitations:
# * Graph breaks in the forward lead to graph breaks in the backward
# * `Backward hooks <https://pytorch.org/docs/stable/notes/autograd.html#backward-hooks-execution>`_ are not captured
#
# Compiled Autograd addresses these limitations by directly integrating with the autograd engine, allowing
# it to capture the full backward graph at runtime. Models that exhibit these two characteristics should try
# Compiled Autograd, and may observe better performance.
#
# However, Compiled Autograd introduces its own limitations:
# * Added runtime overhead at the start of the backward for cache lookup
# * More prone to recompiles and graph breaks in Dynamo due to the larger capture
#
# .. note:: Compiled Autograd is under active development and is not yet compatible with all existing PyTorch features. For the latest status on a particular feature, refer to `Compiled Autograd Landing Page <https://docs.google.com/document/d/11VucFBEewzqgkABIjebZIzMvrXr3BtcY1aGKpX61pJY>`_.
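#
# The walkthrough below refers to a small example program. A minimal sketch of
# that setup might look like the following (the exact ``Model`` definition and
# layer sizes are assumptions made for illustration, not the tutorial's exact code):
#
# .. code-block:: python
#
#    import torch
#
#    # Enable Compiled Autograd so that loss.backward() is captured at runtime
#    torch._dynamo.config.compiled_autograd = True
#
#    class Model(torch.nn.Module):
#        def __init__(self):
#            super().__init__()
#            self.linear = torch.nn.Linear(10, 10)  # layer sizes are an assumption
#
#        def forward(self, x):
#            return self.linear(x)
#
#    model = Model()
#    x = torch.randn(10)
#
#    @torch.compile
#    def train(model, x):
#        loss = model(x).sum()
#        loss.backward()
#
#    train(model, x)
#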
# In the code above, we create an instance of the ``Model`` class and generate a random 10-dimensional tensor ``x`` using ``torch.randn(10)``.
# We define the training loop function ``train`` and decorate it with ``@torch.compile`` to optimize its execution.
#
# When ``train(model, x)`` is called:
# * The Python interpreter calls Dynamo, since this call was decorated with ``@torch.compile``
# * Dynamo intercepts the Python bytecode, simulates its execution, and records the operations into a graph
# * AOTDispatcher disables hooks and calls the autograd engine to compute gradients for ``model.linear.weight`` and ``model.linear.bias``, and records the operations into a graph. Using ``torch.autograd.Function``, AOTDispatcher rewrites the forward and backward implementation of ``train``.
# * Inductor generates a function corresponding to an optimized implementation of the AOTDispatcher forward and backward
# * Dynamo sets the optimized function to be evaluated next by the Python interpreter
# * The Python interpreter executes the optimized function, which essentially executes ``loss = model(x).sum()``
# * The Python interpreter executes ``loss.backward()``, calling into the autograd engine, which routes to the Compiled Autograd engine since we enabled the config ``torch._dynamo.config.compiled_autograd = True``
# * Compiled Autograd computes the gradients for ``model.linear.weight`` and ``model.linear.bias``, and records the operations into a graph, including any hooks it encounters (a hook example is sketched after this list). During this, it will record the backward previously rewritten by AOTDispatcher. Compiled Autograd then generates a new function corresponding to a fully traced implementation of ``loss.backward()``, and executes it with ``torch.compile`` in inference mode
# * The same steps recursively apply to the Compiled Autograd graph, but this time AOTDispatcher does not need to partition the graph into a forward and backward
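#
# To make the hook behavior above concrete, here is a minimal sketch: a backward hook
# registered in eager code is recorded into the Compiled Autograd graph when the backward
# runs (the ``scale_grad`` hook and the ``torch.nn.Linear(10, 10)`` model are hypothetical
# examples, not the tutorial's exact code):
#
# .. code-block:: python
#
#    import torch
#
#    torch._dynamo.config.compiled_autograd = True
#    model = torch.nn.Linear(10, 10)
#    x = torch.randn(10)
#
#    def scale_grad(grad):
#        # hypothetical hook: double the gradient flowing into the weight
#        return grad * 2
#
#    # Registered in eager code; Compiled Autograd records the hook call when it
#    # captures the backward graph at runtime.
#    model.weight.register_hook(scale_grad)
#
#    @torch.compile
#    def train(model, x):
#        loss = model(x).sum()
#        loss.backward()
#
#    train(model, x)
#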
# Inspecting the compiled autograd logs
# -------------------------------------
# Run the script with the ``TORCH_LOGS`` environment variable:
# - To only print the compiled autograd graph, use ``TORCH_LOGS="compiled_autograd" python example.py``
# - To print the graph with more tensor metadata and recompile reasons, at the cost of performance, use ``TORCH_LOGS="compiled_autograd_verbose" python example.py``
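#
# You can also enable these logs programmatically instead of through the environment.
# The following sketch assumes ``compiled_autograd`` is a registered log artifact accepted
# by ``torch._logging.set_logs`` in your PyTorch build:
#
# .. code-block:: python
#
#    import torch
#
#    # Programmatic equivalent of TORCH_LOGS="compiled_autograd"
#    torch._logging.set_logs(compiled_autograd=True)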
#
# Rerun the snippet above; the compiled autograd graph should now be logged to ``stderr``. Certain graph nodes will have names prefixed by ``aot0_``;
# these correspond to the nodes previously compiled ahead of time in AOTAutograd backward graph 0. For example, ``aot0_view_2`` corresponds to ``view_2`` of the AOT backward graph with ``id=0``.
# .. note:: This is the graph on which we will call ``torch.compile``, **NOT** the optimized graph. Compiled Autograd essentially generates some unoptimized Python code to represent the entire C++ autograd execution.
# You can use different compiler configs for the two compilations, for example, the backward may be a fullgraph even if there are graph breaks in the forward.
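#
# A rough sketch of this, assuming the ``torch._dynamo.compiled_autograd.enable`` context
# manager (which takes a compiler function) is available in your PyTorch build: compile the
# forward with default flags, and wrap only the backward call with a ``fullgraph=True`` compiler.
#
# .. code-block:: python
#
#    import torch
#    import torch._dynamo.compiled_autograd  # assumed module path
#
#    model = torch.nn.Linear(10, 10)
#    x = torch.randn(10)
#
#    def train(model, x):
#        # forward: default torch.compile settings, graph breaks allowed
#        loss = torch.compile(model)(x).sum()
#        # backward: captured by Compiled Autograd and compiled with fullgraph=True
#        with torch._dynamo.compiled_autograd.enable(torch.compile(fullgraph=True)):
#            loss.backward()
#
#    train(model, x)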
#
# In this tutorial, we went over the high-level ecosystem of ``torch.compile`` with compiled autograd, the basics of compiled autograd, and a few common recompilation reasons.
#
# For feedback on this tutorial, please file an issue on https://github.com/pytorch/tutorials.