
Commit 3794da9
committed: Added images and ran through grammar tool
1 parent 050fcdb commit 3794da9

3 files changed: +45 -38 lines changed

_pages/automatic_differentiation.md (+45 -38)

@@ -10,23 +10,22 @@ permalink: /automatic_differentiation
 
 ## Automatic differentiation
 
-Automatic Differentiation (AD) is a general and powerful technique
-of computing partial derivatives (or the complete gradient) of a function inputted as a
-computer program.
+Automatic Differentiation (AD) is a general and powerful technique for
+computing partial derivatives (or the complete gradient) of a function
+inputted as a computer program.
 
-Automatic Differentiation takes advantage of the fact that any computation can
-be represented as a composition of simple operations / functions - this is
-generally represented in a graphical format and referred to as the [compuation
-graph](https://colah.github.io/posts/2015-08-Backprop/). AD works by repeated
-application of chain rule over this graph.
+It takes advantage of the fact that any computation can be represented as a
+composition of simple operations / functions - this is generally represented
+in a graphical format and referred to as the [computation
+graph](https://colah.github.io/posts/2015-08-Backprop/). AD works by
+repeatedly applying the chain rule over this graph.
 
 ### Understanding Differentiation in Computing
 
 Efficient computation of gradients is a crucial requirement in the fields of
-scientific computing and machine learning, where approaches like
-[Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent)
-are used to iteratively converge over the optimum parameters of a mathematical
-model.
+scientific computing and machine learning, where approaches like [Gradient
+Descent](https://en.wikipedia.org/wiki/Gradient_descent) are used to
+iteratively converge over the optimum parameters of a mathematical model.
 
 Within the context of computing, there are various methods for
 differentiation:
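
Before the method-by-method comparison continues in the next hunk, the computation-graph idea from the paragraph above can be made concrete with a small C++ sketch. The function f(x, y) = x * y + sin(x) and all variable names are chosen here purely for illustration and do not come from the page: the program is evaluated as a chain of elementary operations, and the chain rule combines the local derivative of each one.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double x = 2.0, y = 3.0;

    // Computation graph for f(x, y) = x * y + sin(x),
    // one elementary operation per node:
    double a = x * y;        // a = x * y
    double b = std::sin(x);  // b = sin(x)
    double f = a + b;        // f = a + b

    // Chain rule applied node by node, here with respect to x:
    double dx = 1.0;               // seed: d(x)/dx = 1
    double da = y * dx;            // d(a)/dx
    double db = std::cos(x) * dx;  // d(b)/dx
    double df = da + db;           // d(f)/dx = y + cos(x)

    std::printf("f(2, 3) = %.6f\n", f);
    std::printf("df/dx   = %.6f\n", df);
    return 0;
}
```

Each node only needs to know its own local derivative; AD automates exactly this bookkeeping for arbitrary programs.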
@@ -36,35 +35,42 @@ differentiation:
 tedious and error-prone, especially for complex functions.
 
 - **Numerical Differentiation**: This method approximates the derivatives
-using finite differences. It is relatively simple to implement, but can
+using finite differences. It is relatively simple to implement but can
 suffer from numerical instability and inaccuracy in its results. It doesn't
-scale well with the number of inputs of the function.
+scale well with the number of inputs in the function.
 
 - **Symbolic Differentiation**: This approach uses symbolic manipulation to
 compute derivatives analytically. It provides accurate results but can lead to
-lengthy expressions for large computations. It requires the computer program to
-be representable in a closed form mathematical expression, and thus doesn't work
-well with control flow scenarios (if conditions and loops) in the program.
-
-- **Automatic Differentiation (AD)**: Automatic Differentiation is a general and
-efficient technique that works by repeated application of chain rule over the
-computation graph of the program. Given its composable nature, it can easily scale
-for computing gradients over a very large number of inputs.
+lengthy expressions for large computations. It requires the computer program
+to be representable in a closed-form mathematical expression, and thus doesn't
+work well with control flow scenarios (if conditions and loops) in the
+program.
+
+- **Automatic Differentiation (AD)**: Automatic Differentiation is a general
+and efficient technique that works by repeated application of the chain rule
+over the computation graph of the program. Given its composable nature, it
+can easily scale for computing gradients over a very large number of inputs.
 
 ### Forward and Reverse mode AD
-Automatic Differentiation works by applying chain rule and merging the derivatives
-at each node of the computation graph. The direction of this graph traversal and
-derivative accumulation results in two modes of operation:
-
-- Forward Mode: starts at an input to the graph and moves towards all the output nodes.
-For every node, it sums all the paths feeding in. By adding them up, we get the total
-way in which the node is affected by the input. Hence, it calculates derivatives of output(s)
-with respect to a single input variable.
+Automatic Differentiation works by applying the chain rule and merging the
+derivatives at each node of the computation graph. The direction of this graph
+traversal and derivative accumulation results in two modes of operation:
+
+- Forward Mode: starts at an input to the graph and moves towards all the
+output nodes. For every node, it adds up all the paths feeding in. By adding
+them up, we get the total way in which the node is affected by the input.
+Hence, it calculates derivatives of output(s) with respect to a single input
+variable.
+
+![Forward Mode](/images/ForwardAccumulationAutomaticDifferentiation.png)
 
-- Reverse Mode: starts at the output node of graph and moves backward towards all
-the input nodes. For every node, it merges all paths which originated at that node.
-It tracks how every node affects one output. Hence, it calculates derivative of a single
-output with respect to all inputs simultaneously - the gradient.
+- Reverse Mode: starts at the output node of the graph and moves backward
+towards all the input nodes. For every node, it merges all paths that
+originated at that node. It tracks how every node affects one output. Hence,
+it calculates the derivative of a single output with respect to all inputs
+simultaneously - the gradient.
+
+![Reverse Mode](/images/ReverseAccumulationAutomaticDifferentiation.png)
 
 ### Automatic Differentiation in C++
 
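The Forward Mode and Reverse Mode descriptions above can be illustrated with a self-contained C++ sketch. Nothing in it comes from the page: the Dual type, the hand-written adjoint sweep, and the test function f(x, y) = x * y + sin(x) are assumptions made only for illustration. Forward mode needs one evaluation per input, reverse mode recovers the whole gradient in a single backward pass, and a central-difference check (the numerical differentiation discussed earlier) is included for comparison.

```cpp
#include <cmath>
#include <cstdio>

// Forward mode: a dual number carries a value together with the derivative
// of that value with respect to one chosen ("seeded") input.
struct Dual {
    double val;
    double dot;
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.dot * b.val + a.val * b.dot}; }
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

// f(x, y) = x * y + sin(x), written once and evaluated both on doubles
// and on Dual numbers.
template <typename T>
T f(T x, T y) {
    using std::sin;  // std::sin for double, ::sin(Dual) found via ADL
    return x * y + sin(x);
}

int main() {
    const double x = 2.0, y = 3.0;

    // Forward mode: one evaluation per input variable (seed its dot to 1).
    Dual fx = f(Dual{x, 1.0}, Dual{y, 0.0});
    Dual fy = f(Dual{x, 0.0}, Dual{y, 1.0});

    // Reverse mode: record the forward sweep, then propagate adjoints
    // ("bar" variables) from the output back to every input in one pass.
    double a = x * y;                                 // forward sweep
    double b = std::sin(x);
    double out = a + b;

    double out_bar = 1.0;                             // seed: d(out)/d(out)
    double a_bar = out_bar;                           // out = a + b
    double b_bar = out_bar;
    double x_bar = a_bar * y + b_bar * std::cos(x);   // a = x*y, b = sin(x)
    double y_bar = a_bar * x;

    // Numerical differentiation (central differences), for comparison only.
    const double h = 1e-6;
    double fd_x = (f(x + h, y) - f(x - h, y)) / (2.0 * h);
    double fd_y = (f(x, y + h) - f(x, y - h)) / (2.0 * h);

    std::printf("f(x, y) = %.6f (reverse sweep: %.6f)\n", fx.val, out);
    std::printf("forward : df/dx = %.6f  df/dy = %.6f\n", fx.dot, fy.dot);
    std::printf("reverse : df/dx = %.6f  df/dy = %.6f\n", x_bar, y_bar);
    std::printf("numeric : df/dx = %.6f  df/dy = %.6f\n", fd_x, fd_y);
    return 0;
}
```

For a function with many inputs and a single output, forward mode would need one pass per input while the reverse sweep still needs only one, which is why reverse mode is the natural fit for gradient-descent style workloads.
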
@@ -76,10 +82,11 @@ compile time.
 
 [The source code transformation approach] enables optimization by retaining
 all the complex knowledge of the original source code. The compute graph is
-constructed during compilation and then transformed to generate the derivative
-code. It typically uses a custom parser to build code representation and produce
-the transformed code. It is difficult to implement (especially in C++), but it is
-very efficient, since many computations and optimizations are done ahead of time.
+constructed during compilation and then transformed to generate the derivative
+code. It typically uses a custom parser to build code representation and
+produce the transformed code. It is difficult to implement (especially in
+C++), but it is very efficient, since many computations and optimizations are
+done ahead of time.
 
 ### Advantages of using Automatic Differentiation
 
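To give a rough picture of the source code transformation approach described in the hunk above, here is a hand-written sketch of the kind of companion function such a tool could generate at compile time. The names square and square_dx are hypothetical, and this is not the output of any particular tool.

```cpp
#include <cstdio>

// Original user code.
double square(double x) { return x * x; }

// The kind of companion function a source-transformation tool could emit:
// the same structure as the original, with derivative statements woven in.
// Hand-written here; the name square_dx is purely illustrative.
double square_dx(double x) {
    double _d_x = 1.0;            // seed: d(x)/dx
    return _d_x * x + x * _d_x;   // product rule applied to x * x
}

int main() {
    std::printf("square(3.0)      = %.1f\n", square(3.0));     // 9.0
    std::printf("d/dx square(3.0) = %.1f\n", square_dx(3.0));  // 6.0
    return 0;
}
```

Since the generated derivative exists as ordinary C++ before compilation, the compiler can inline and optimize it like any other code, which is where the "computations and optimizations are done ahead of time" benefit comes from.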