@@ -10,23 +10,22 @@ permalink: /automatic_differentiation
## Automatic differentiation

- Automatic Differentiation (AD) is a general and powerful technique
- of computing partial derivatives (or the complete gradient) of a function inputted as a
- computer program.
+ Automatic Differentiation (AD) is a general and powerful technique for
+ computing partial derivatives (or the complete gradient) of a function
+ inputted as a computer program.

- Automatic Differentiation takes advantage of the fact that any computation can
- be represented as a composition of simple operations / functions - this is
- generally represented in a graphical format and referred to as the [compuation
- graph](https://colah.github.io/posts/2015-08-Backprop/). AD works by repeated
- application of chain rule over this graph.
+ It takes advantage of the fact that any computation can be represented as a
+ composition of simple operations / functions - this is generally represented
+ in a graphical format and referred to as the [computation
+ graph](https://colah.github.io/posts/2015-08-Backprop/). AD works by
+ repeatedly applying the chain rule over this graph.
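To make the idea above concrete, here is a minimal hand-worked sketch (the function and variable names are illustrative, not taken from the page): each intermediate variable is one node of the computation graph, and the chain rule combines the local derivatives of those nodes.

```cpp
#include <cmath>

// f(x1, x2) = x1 * x2 + sin(x1), written as a chain of primitive
// operations - each intermediate value is one node of the graph.
double f(double x1, double x2) {
  double w1 = x1 * x2;       // node: multiply
  double w2 = std::sin(x1);  // node: sine
  double w3 = w1 + w2;       // node: add (the output)
  return w3;
}

// Applying the chain rule node by node gives
// df/dx1 = dw3/dw1 * dw1/dx1 + dw3/dw2 * dw2/dx1 = x2 + cos(x1).
double df_dx1(double x1, double x2) {
  return x2 + std::cos(x1);
}
```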
### Understanding Differentiation in Computing
Efficient computation of gradients is a crucial requirement in the fields of
- scientific computing and machine learning, where approaches like
- [Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent)
- are used to iteratively converge over the optimum parameters of a mathematical
- model.
+ scientific computing and machine learning, where approaches like [Gradient
+ Descent](https://en.wikipedia.org/wiki/Gradient_descent) are used to
+ iteratively converge on the optimum parameters of a mathematical model.

Within the context of computing, there are various methods for
differentiation:
@@ -36,35 +35,42 @@ differentiation:
tedious and error-prone, especially for complex functions.

- **Numerical Differentiation**: This method approximates the derivatives
- using finite differences. It is relatively simple to implement, but can
+ using finite differences. It is relatively simple to implement but can
suffer from numerical instability and inaccuracy in its results. It doesn't
- scale well with the number of inputs of the function.
+ scale well with the number of inputs in the function.

- **Symbolic Differentiation**: This approach uses symbolic manipulation to
compute derivatives analytically. It provides accurate results but can lead to
- lengthy expressions for large computations. It requires the computer program to
- be representable in a closed form mathematical expression, and thus doesn't work
- well with control flow scenarios (if conditions and loops) in the program.
-
- - **Automatic Differentiation (AD)**: Automatic Differentiation is a general and
- efficient technique that works by repeated application of chain rule over the
- computation graph of the program. Given its composable nature, it can easily scale
- for computing gradients over a very large number of inputs.
+ lengthy expressions for large computations. It requires the computer program
+ to be representable in a closed-form mathematical expression, and thus doesn't
+ work well with control flow scenarios (if conditions and loops) in the
+ program.
+
+ - **Automatic Differentiation (AD)**: Automatic Differentiation is a general
+ and efficient technique that works by repeated application of the chain rule
+ over the computation graph of the program. Given its composable nature, it
+ can easily scale for computing gradients over a very large number of inputs.
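For the finite-difference idea in the Numerical Differentiation bullet above, a minimal central-difference sketch might look like the following (the helper name and step size are arbitrary choices, not from the page):

```cpp
#include <cmath>
#include <cstdio>

// Central-difference approximation of df/dx at x. The step size h trades
// truncation error against floating-point cancellation, which is the
// instability mentioned above.
template <typename F>
double central_difference(F f, double x, double h = 1e-6) {
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

int main() {
  auto f = [](double x) { return x * x + std::sin(x); };
  // Exact derivative is 2x + cos(x); the approximation is close but not exact.
  std::printf("%f vs %f\n", central_difference(f, 1.0), 2.0 * 1.0 + std::cos(1.0));
  return 0;
}
```

One such pair of evaluations is needed per input variable, which is why this approach scales poorly with the number of inputs.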
### Forward and Reverse mode AD
- Automatic Differentiation works by applying chain rule and merging the derivatives
- at each node of the computation graph. The direction of this graph traversal and
- derivative accumulation results in two modes of operation:
-
- - Forward Mode: starts at an input to the graph and moves towards all the output nodes.
- For every node, it sums all the paths feeding in. By adding them up, we get the total
- way in which the node is affected by the input. Hence, it calculates derivatives of output(s)
- with respect to a single input variable.
+ Automatic Differentiation works by applying the chain rule and merging the
+ derivatives at each node of the computation graph. The direction of this graph
+ traversal and derivative accumulation results in two modes of operation:
+
+ - Forward Mode: starts at an input to the graph and moves towards all the
+ output nodes. For every node, it sums the contributions from all the paths
+ feeding in, which gives the total effect of that input on the node. Hence, it
+ calculates derivatives of output(s) with respect to a single input variable.
+
+ ![Forward Mode](/images/ForwardAccumulationAutomaticDifferentiation.png)
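One common way to realize forward mode is operator overloading on dual numbers; the sketch below is illustrative only (the `Dual` type and its operators are invented here, not the API of any particular tool):

```cpp
#include <cmath>
#include <cstdio>

// Minimal dual number: a value plus its derivative with respect to one
// chosen input (the "seed"). Each operation applies the chain rule
// locally, so derivatives propagate forward through the graph.
struct Dual {
  double val;  // value of the node
  double dot;  // derivative of the node w.r.t. the seeded input
};

Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.dot * b.val + a.val * b.dot}; }
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

int main() {
  // f(x1, x2) = x1 * x2 + sin(x1), differentiated w.r.t. x1:
  Dual x1{2.0, 1.0};  // seed dx1/dx1 = 1
  Dual x2{3.0, 0.0};  // dx2/dx1 = 0
  Dual y = x1 * x2 + sin(x1);
  std::printf("f = %f, df/dx1 = %f\n", y.val, y.dot);  // df/dx1 = x2 + cos(x1)
  return 0;
}
```

Computing the derivative with respect to x2 as well would require a second forward pass with the seed moved to x2, matching the "single input variable" point above.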

- - Reverse Mode: starts at the output node of graph and moves backward towards all
- the input nodes. For every node, it merges all paths which originated at that node.
- It tracks how every node affects one output. Hence, it calculates derivative of a single
- output with respect to all inputs simultaneously - the gradient.
+ - Reverse Mode: starts at the output node of the graph and moves backward
+ towards all the input nodes. For every node, it merges all paths that
+ originated at that node. It tracks how every node affects one output. Hence,
+ it calculates the derivative of a single output with respect to all inputs
+ simultaneously - the gradient.
+
+ ![Reverse Mode](/images/ReverseAccumulationAutomaticDifferentiation.png)
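Reverse mode is often implemented with a tape recorded during the forward evaluation; the following is a rough, self-contained sketch (the `Tape` type and its methods are invented for illustration, not any library's API):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// A tiny tape: the forward pass records, for every node, which nodes feed
// into it and the local partial derivatives. The backward pass then walks
// the tape once, from the output back to the inputs, accumulating adjoints.
struct Tape {
  struct Node { int lhs, rhs; double dlhs, drhs; };
  std::vector<Node> nodes;

  int variable() { nodes.push_back({-1, -1, 0.0, 0.0}); return (int)nodes.size() - 1; }
  int unary(int a, double da) { nodes.push_back({a, -1, da, 0.0}); return (int)nodes.size() - 1; }
  int binary(int a, int b, double da, double db) { nodes.push_back({a, b, da, db}); return (int)nodes.size() - 1; }

  // Returns d(output)/d(node) for every node on the tape.
  std::vector<double> gradient(int output) const {
    std::vector<double> adj(nodes.size(), 0.0);
    adj[output] = 1.0;
    for (int i = output; i >= 0; --i) {
      const Node& n = nodes[i];
      if (n.lhs >= 0) adj[n.lhs] += n.dlhs * adj[i];
      if (n.rhs >= 0) adj[n.rhs] += n.drhs * adj[i];
    }
    return adj;
  }
};

int main() {
  // f(x1, x2) = x1 * x2 + sin(x1) at (x1, x2) = (2, 3).
  double x1 = 2.0, x2 = 3.0;
  Tape t;
  int i1 = t.variable();
  int i2 = t.variable();
  int m  = t.binary(i1, i2, /*d/dx1=*/x2, /*d/dx2=*/x1);  // x1 * x2
  int s  = t.unary(i1, /*d/dx1=*/std::cos(x1));           // sin(x1)
  int y  = t.binary(m, s, 1.0, 1.0);                      // m + s
  std::vector<double> g = t.gradient(y);
  // One backward sweep yields the whole gradient: (x2 + cos(x1), x1).
  std::printf("df/dx1 = %f, df/dx2 = %f\n", g[i1], g[i2]);
  return 0;
}
```

A single backward sweep produces derivatives with respect to all inputs at once, which is what makes reverse mode attractive when there are many inputs and few outputs.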
### Automatic Differentiation in C++
@@ -76,10 +82,11 @@ compile time.

[The source code transformation approach] enables optimization by retaining
all the complex knowledge of the original source code. The compute graph is
- constructed during compilation and then transformed to generate the derivative
- code. It typically uses a custom parser to build code representation and produce
- the transformed code. It is difficult to implement (especially in C++), but it is
- very efficient, since many computations and optimizations are done ahead of time.
+ constructed during compilation and then transformed to generate the derivative
+ code. It typically uses a custom parser to build a code representation and
+ produce the transformed code. It is difficult to implement (especially in
+ C++), but it is very efficient, since many computations and optimizations are
+ done ahead of time.
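For intuition only, and assuming a generic source-to-source tool (no specific tool's output is shown): the transformation conceptually takes a function and emits a companion derivative function at compile time, which the compiler can then optimize like any other code. The pair below is hand-written to illustrate that idea.

```cpp
// Original function given to the tool.
double square_plus(double x) { return x * x + 3.0 * x; }

// What a source-transformation AD tool conceptually emits: ordinary C++
// computing the derivative, available before the program ever runs.
double square_plus_dx(double x) { return 2.0 * x + 3.0; }
```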
### Advantages of using Automatic Differentiation