## Automatic differentiation

Automatic Differentiation (AD) is a general and powerful technique for
computing partial derivatives (or the complete gradient) of a function
expressed as a computer program.

It takes advantage of the fact that any computation can be represented as a
composition of simple operations / functions - this is generally represented
in a graphical format and referred to as the
[computation graph](https://colah.github.io/posts/2015-08-Backprop/).
AD works by repeatedly applying the chain rule over this graph.

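For instance (an illustrative example, not taken from the original text), the
function `f(x, y) = x * y + sin(x)` decomposes into primitive operations whose
local derivatives are known, and the chain rule combines them along the graph:

```cpp
// Decomposition of f(x, y) = x*y + sin(x) into computation-graph nodes.
// Each node is a simple operation with a known local derivative.
#include <cmath>

double f(double x, double y) {
    double a = x * y;        // node a: da/dx = y,      da/dy = x
    double b = std::sin(x);  // node b: db/dx = cos(x)
    double c = a + b;        // node c: dc/da = 1,      dc/db = 1
    return c;                // chain rule: df/dx = y + cos(x), df/dy = x
}
```
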
### Understanding Differentiation in Computing

Efficient computation of gradients is a crucial requirement in the fields of
scientific computing and machine learning, where approaches like
[Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent) are used to
iteratively converge on the optimum parameters of a mathematical model.

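As a rough illustration (not part of the original text), a gradient-descent
loop repeatedly moves a parameter against its gradient; here the derivative of
a toy one-parameter loss is written by hand, which is precisely the step AD
automates for realistic models:

```cpp
// Gradient descent on the toy loss L(w) = (w - 3)^2 (illustrative sketch).
#include <cstdio>

int main() {
    double w = 0.0;            // parameter to optimize
    const double lr = 0.1;     // learning rate
    for (int step = 0; step < 100; ++step) {
        double grad = 2.0 * (w - 3.0);  // dL/dw, coded by hand here
        w -= lr * grad;                 // move against the gradient
    }
    std::printf("w is approximately %f\n", w);  // converges towards 3.0
    return 0;
}
```
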
Within the context of computing, there are various methods for
differentiation:

- **Manual Differentiation**: The derivative expressions are worked out by
hand and coded directly into the program, which can be tedious and
error-prone, especially for complex functions.

- **Numerical Differentiation**: This method approximates the derivatives
using finite differences. It is relatively simple to implement but can
suffer from numerical instability and inaccuracy in its results. It doesn't
scale well with the number of inputs in the function (a finite-difference
sketch follows this list).

- **Symbolic Differentiation**: This approach uses symbolic manipulation to
compute derivatives analytically. It provides accurate results but can lead to
lengthy expressions for large computations. It requires the computer program
to be representable in a closed-form mathematical expression, and thus doesn't
work well with control flow scenarios (if conditions and loops) in the
program.

- **Automatic Differentiation (AD)**: Automatic Differentiation is a general
and efficient technique that works by repeated application of the chain rule
over the computation graph of the program. Given its composable nature, it
can easily scale for computing gradients over a very large number of inputs.

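To illustrate the numerical approach above (a sketch with an assumed example
function `f`, not from the original text), a central finite difference
approximates the derivative from two extra function evaluations per input; its
accuracy is sensitive to the step size, and the cost grows with the number of
inputs:

```cpp
// Central finite-difference approximation of df/dx (illustrative sketch).
// Too large a step truncates; too small a step amplifies round-off error.
#include <cmath>
#include <cstdio>

double f(double x) { return x * std::sin(x); }

double dfdx_numeric(double x, double h) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

int main() {
    double exact = std::sin(2.0) + 2.0 * std::cos(2.0);  // analytic derivative
    std::printf("h = 1e-5 : %.12f\n", dfdx_numeric(2.0, 1e-5));
    std::printf("h = 1e-12: %.12f\n", dfdx_numeric(2.0, 1e-12));  // degraded by round-off
    std::printf("exact    : %.12f\n", exact);
    return 0;
}
```
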
### Forward and Reverse mode AD
Automatic Differentiation works by applying the chain rule and merging the
derivatives at each node of the computation graph. The direction of this
graph traversal and derivative accumulation results in two modes of
operation, sketched in code after the figures below:

- Forward Mode: starts at an input to the graph and moves towards all the
output nodes. For every node, it sums up all the paths feeding in, giving
the total way in which the node is affected by the input. Hence, it
calculates derivatives of output(s) with respect to a single input variable.

![Forward Mode](/images/ForwardAccumulationAutomaticDifferentiation.png)

- Reverse Mode: starts at the output node of the graph and moves backward
towards all the input nodes. For every node, it merges all paths that
originated at that node. It tracks how every node affects one output. Hence,
it calculates the derivative of a single output with respect to all inputs
simultaneously - the gradient.

![Reverse Mode](/images/ReverseAccumulationAutomaticDifferentiation.png)

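To make the two modes concrete, below is a minimal, self-contained sketch of
forward-mode AD using dual numbers (an illustration written for this page, not
the API of any particular AD library; the `Dual` type and `f` are assumed
names):

```cpp
// Forward-mode AD sketch: each value carries the derivative of the
// computation with respect to one seeded input, propagated by the chain rule.
#include <cmath>
#include <cstdio>

struct Dual {
    double val;  // value of the sub-expression
    double dot;  // derivative of the sub-expression w.r.t. the seeded input
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.dot * b.val + a.val * b.dot}; }
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

// The function is written once; the overloads compute derivatives alongside values.
Dual f(Dual x, Dual y) { return x * y + sin(x); }

int main() {
    Dual x{2.0, 1.0};  // seed x with dot = 1 to obtain df/dx
    Dual y{3.0, 0.0};  // a second pass with y seeded would give df/dy
    Dual r = f(x, y);
    std::printf("f = %f, df/dx = %f\n", r.val, r.dot);  // df/dx = y + cos(x)
    return 0;
}
```

Reverse mode instead records the operations on a tape during the forward sweep
and then propagates adjoints backwards, producing the whole gradient in one
backward pass (again an illustrative sketch, not a library implementation):

```cpp
// Reverse-mode AD sketch: a forward sweep records each primitive operation
// and its local partial derivatives; a backward sweep accumulates adjoints.
#include <cmath>
#include <cstdio>
#include <vector>

struct Node { int lhs, rhs; double dlhs, drhs; };  // parent indices and local partials

std::vector<double> value;  // value of each node
std::vector<Node> tape;     // recorded operations

int input(double v) { value.push_back(v); tape.push_back({-1, -1, 0.0, 0.0}); return (int)value.size() - 1; }
int add(int a, int b) { value.push_back(value[a] + value[b]); tape.push_back({a, b, 1.0, 1.0}); return (int)value.size() - 1; }
int mul(int a, int b) { value.push_back(value[a] * value[b]); tape.push_back({a, b, value[b], value[a]}); return (int)value.size() - 1; }
int sine(int a) { value.push_back(std::sin(value[a])); tape.push_back({a, -1, std::cos(value[a]), 0.0}); return (int)value.size() - 1; }

int main() {
    int x = input(2.0), y = input(3.0);
    int out = add(mul(x, y), sine(x));  // f(x, y) = x*y + sin(x)

    std::vector<double> adjoint(value.size(), 0.0);
    adjoint[out] = 1.0;                 // seed df/df = 1 at the output
    for (int i = out; i >= 0; --i) {    // walk the tape backwards
        if (tape[i].lhs >= 0) adjoint[tape[i].lhs] += tape[i].dlhs * adjoint[i];
        if (tape[i].rhs >= 0) adjoint[tape[i].rhs] += tape[i].drhs * adjoint[i];
    }
    std::printf("df/dx = %f, df/dy = %f\n", adjoint[x], adjoint[y]);  // y + cos(x), x
    return 0;
}
```

This is why reverse mode is usually preferred when a function has many inputs
and few outputs (as with machine-learning loss functions), while forward mode
is cheaper when the inputs are few.
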
### Automatic Differentiation in C++

In C++, Automatic Differentiation is typically implemented through either
operator overloading or source code transformation.

[The source code transformation approach] enables optimization by retaining
all the complex knowledge of the original source code. The compute graph is
constructed during compilation and then transformed to generate the derivative
code. It typically uses a custom parser to build the code representation and
produce the transformed code. It is difficult to implement (especially in
C++), but it is very efficient, since many computations and optimizations are
done ahead of time.

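For a sense of what this means in practice, here is a hand-written
illustration of the kind of code a source-transformation tool could emit for a
simple function (the derivative below is written by hand for this page, not
generated by any particular tool):

```cpp
// Original function handed to a source-transformation AD tool.
double f(double x) { return x * x + 3.0 * x; }

// The kind of derivative code such a tool could generate ahead of time
// (written by hand here for illustration); it is plain C++ that the
// compiler can then optimize like any other function.
double f_dx(double x) { return 2.0 * x + 3.0; }
```
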
### Advantages of using Automatic Differentiation