
Commit ab04f3a

xmfan authored and pytorchmergebot committed
[ca] set autograd graph task state (pytorch#143108)
GraphTask holds the metadata needed for a single execution of backward(); it is 1:1 with backward() calls, at least for compiled autograd. It is used by certain torch._C global autograd state APIs.

In SAC, we use torch._C._current_graph_task_id() as a dict key to store information during unpack hook execution: https://github.com/pytorch/pytorch/blob/a5fb07af2718285a2d6406535e22fc4035ed7854/torch/utils/checkpoint.py#L1128

If we don't set an active graph task, the key is randomized, and the unpack logic behaves as if each unpacked tensor came from a different graph task: https://github.com/pytorch/pytorch/blob/a5fb07af2718285a2d6406535e22fc4035ed7854/torch/utils/checkpoint.py#L1112-L1115

The sketchy part of this PR is that in eager autograd, GraphTask is mutated during execution. Inspecting the struct, however, the mutation appears to be used only to communicate between autograd threads (created when multiple devices are involved) or for deprecated uses; we shouldn't hit the mutation case at all in compiled autograd. Also, only the graph task id is accessible from Python hooks.

FIXES pytorch#142862

Pull Request resolved: pytorch#143108
Approved by: https://github.com/jansel, https://github.com/albanD
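For context, a minimal sketch of the keying pattern described above. It paraphrases the referenced checkpoint.py logic rather than reproducing it: the dict and hook names are hypothetical, and only torch._C._current_graph_task_id() is the real API.

# Illustrative sketch only: "_state_per_graph_task" and "_on_unpack" are
# made-up names; torch._C._current_graph_task_id() is the real API used
# by torch.utils.checkpoint.
import uuid

import torch

_state_per_graph_task = {}  # per-backward bookkeeping, keyed by graph task id


def _on_unpack(handle):
    gid = torch._C._current_graph_task_id()
    if gid == -1:
        # No active GraphTask: before this PR, compiled autograd hit this
        # branch, so every unpack got a fresh random key and was treated as
        # belonging to a separate backward() call.
        gid = int(uuid.uuid4())
    # With GraphTaskGuard set in Engine::execute, compiled autograd reports a
    # stable gid for the whole backward, so state is shared across unpacks.
    return _state_per_graph_task.setdefault(gid, {})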
1 parent dbe4b69 commit ab04f3a

2 files changed: +52 -0 lines changed

test/inductor/test_compiled_autograd.py

Lines changed: 51 additions & 0 deletions
@@ -3322,6 +3322,57 @@ def make_post_acc_grad_hook(id):
 
         self.check_output_and_recompiles(fn)
 
+    def test_sac(self):
+        # circular import
+        from torch.utils.checkpoint import (
+            checkpoint,
+            CheckpointPolicy,
+            create_selective_checkpoint_contexts,
+        )
+
+        def fn():
+            class mlp(nn.Module):
+                def __init__(self):
+                    super().__init__()
+                    self.layer1 = nn.Linear(10, 10)
+                    self.layer2 = nn.Linear(10, 10)
+                    self.layer3 = nn.Linear(10, 10)
+                    self.layer4 = nn.Linear(10, 10)
+
+                def forward(self, x):
+                    x = self.layer1(x)
+                    x = self.layer2(x)
+                    x = self.layer3(x)
+                    x = self.layer4(x)
+                    return x
+
+            recompute_list = [torch.ops.aten.addmm.default]
+
+            def recompute_policy(ctx, op, *args, **kwargs):
+                if op in recompute_list:
+                    return CheckpointPolicy.MUST_RECOMPUTE
+                else:
+                    return CheckpointPolicy.PREFER_SAVE
+
+            def context_fn():
+                return create_selective_checkpoint_contexts(recompute_policy)
+
+            model = mlp()
+            input = torch.randn(1, 10)
+
+            out = checkpoint(model, input, use_reentrant=False, context_fn=context_fn)
+            out.sum().backward()
+            yield model.layer1.weight.grad
+            yield model.layer1.bias.grad
+            yield model.layer2.weight.grad
+            yield model.layer2.bias.grad
+            yield model.layer3.weight.grad
+            yield model.layer3.bias.grad
+            yield model.layer4.weight.grad
+            yield model.layer4.bias.grad
+
+        self.check_output_and_recompiles(fn)
+
 
 def load_test_module(name):
     testdir = Path(__file__).absolute().parent.parent

torch/csrc/autograd/engine.cpp

Lines changed: 1 addition & 0 deletions
@@ -1321,6 +1321,7 @@ auto Engine::execute(
     TORCH_CHECK(
         !AnomalyMode::is_enabled(),
         "compiled_autograd does not support AnomalyMode")
+    GraphTaskGuard guard(graph_task);
     return (*compiled_autograd)(
         graph_root, *graph_task, accumulate_grad, outputs);
   }
