
Commit 1c76746

nzmora authored and facebook-github-bot committed
SGD: remove unneeded multiply-add initialization operations (pytorch#18114)
Summary:
The momentum buffer is initialized to the value of d_p, but the current code takes the long way to do this:

1. Create a buffer of zeros
2. Multiply the buffer by the momentum coefficient
3. Add d_p to the buffer

All of these can be collapsed into a single step:

1. Create a clone of d_p

Pull Request resolved: pytorch#18114
Differential Revision: D14509122
Pulled By: ezyang
fbshipit-source-id: 4a79b896201d5ff20770b7ae790c244ba744edb8
1 parent a50ba7e commit 1c76746

File tree

1 file changed

+1
-2
lines changed


torch/optim/sgd.py

+1-2
@@ -94,8 +94,7 @@ def step(self, closure=None):
                 if momentum != 0:
                     param_state = self.state[p]
                     if 'momentum_buffer' not in param_state:
-                        buf = param_state['momentum_buffer'] = torch.zeros_like(p.data)
-                        buf.mul_(momentum).add_(d_p)
+                        buf = param_state['momentum_buffer'] = torch.clone(d_p).detach()
                     else:
                         buf = param_state['momentum_buffer']
                         buf.mul_(momentum).add_(1 - dampening, d_p)
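The change is safe because the old three-step initialization is arithmetically equivalent to a single clone: multiplying a zero buffer by the momentum coefficient is a no-op, so the buffer ends up holding exactly the values of d_p. A minimal sketch of that equivalence (the tensor values below are illustrative, not from the commit):

```python
import torch

momentum = 0.9
d_p = torch.tensor([0.5, -1.0, 2.0])  # stands in for a parameter gradient

# Old path: zeros, scale by momentum (a no-op on zeros), then add d_p.
buf_old = torch.zeros_like(d_p)
buf_old.mul_(momentum).add_(d_p)

# New path: one clone, detached so the buffer stays out of the autograd graph.
buf_new = torch.clone(d_p).detach()

print(torch.equal(buf_old, buf_new))  # True
```

The new path also skips two elementwise kernel launches per parameter on the first momentum step, which is the whole point of the commit.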
