---
title: "Compiler Research Research Areas"
layout: gridlay
excerpt: "Automatic Differentiation (AD) is a general and powerful technique for
computing partial derivatives (or the complete gradient) of a function expressed as a
computer program."
sitemap: true
permalink: /automatic_differentiation
---
## Automatic differentiation

Automatic Differentiation (AD) is a general and powerful technique for
computing partial derivatives (or the complete gradient) of a function expressed as a
computer program.

Automatic Differentiation takes advantage of the fact that any computation can
be represented as a composition of simple operations / functions - this is
generally represented in a graphical format and referred to as the [computation
graph](https://colah.github.io/posts/2015-08-Backprop/). AD works by repeated
application of the chain rule over this graph.
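
As a small worked illustration (the function and variable names here are only
an example, not something defined on this page), a computation can be split
into elementary steps whose local derivatives the chain rule then combines:

```cpp
#include <cmath>

// f(x, y) = x * y + sin(x), written as a chain of elementary operations.
// Each step has a trivial local derivative; AD composes them via the chain rule.
double f(double x, double y) {
  double a = x * y;        // da/dx = y,  da/dy = x
  double b = std::sin(x);  // db/dx = cos(x)
  double c = a + b;        // dc/da = 1,  dc/db = 1
  return c;                // so df/dx = y + cos(x),  df/dy = x
}
```
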
### Understanding Differentiation in Computing

Efficient computation of gradients is a crucial requirement in the fields of
scientific computing and machine learning, where approaches like
[Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent)
are used to iteratively converge to the optimal parameters of a mathematical
model.
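
For instance, a single parameter can be fitted with a few gradient-descent
steps; this is a minimal sketch, where the loss and its hand-coded derivative
are hypothetical stand-ins for what an AD tool would provide:

```cpp
#include <cstdio>

// Minimal gradient-descent sketch: minimize loss(w) = (w - 3)^2.
// loss_grad is d(loss)/dw; in practice an AD tool would generate it.
double loss_grad(double w) { return 2.0 * (w - 3.0); }

int main() {
  double w = 0.0;            // initial guess for the parameter
  const double lr = 0.1;     // learning rate (step size)
  for (int i = 0; i < 100; ++i)
    w -= lr * loss_grad(w);  // move against the gradient
  std::printf("w converged to %.4f\n", w);  // close to 3.0, the minimum
}
```
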
Within the context of computing, there are various methods for
differentiation:

- **Numerical Differentiation**: This method approximates the derivatives
using finite differences. It is relatively simple to implement, but can
suffer from numerical instability and inaccuracy in its results. It doesn't
scale well with the number of inputs of the function (see the
finite-difference sketch after this list).

- **Symbolic Differentiation**: This approach uses symbolic manipulation to
compute derivatives analytically. It provides accurate results but can lead to
lengthy expressions for large computations. It requires the computer program to
be representable in a closed-form mathematical expression, and thus doesn't work
well with control flow scenarios (if conditions and loops) in the program.

- **Automatic Differentiation (AD)**: Automatic Differentiation is a general and
efficient technique that works by repeated application of the chain rule over the
computation graph of the program. Given its composable nature, it can easily scale
for computing gradients over a very large number of inputs.
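
The following central-difference sketch (illustrative only; the helper name
`numerical_gradient` is not from any library) shows why numerical
differentiation needs two extra evaluations of the function per input, which is
what limits its scaling:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Central-difference approximation of the gradient of f at x.
// Requires 2 * n evaluations of f for n inputs and is sensitive to the
// choice of h (truncation error vs. floating-point round-off).
std::vector<double> numerical_gradient(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x, double h = 1e-6) {
  std::vector<double> grad(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double xi = x[i];
    x[i] = xi + h; const double fp = f(x);  // f(x + h * e_i)
    x[i] = xi - h; const double fm = f(x);  // f(x - h * e_i)
    x[i] = xi;                              // restore the coordinate
    grad[i] = (fp - fm) / (2.0 * h);
  }
  return grad;
}
```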

### Forward and Reverse Mode AD

Automatic Differentiation works by applying the chain rule and merging the
derivatives at each node of the computation graph. The direction of this graph
traversal and derivative accumulation results in two modes of operation:

- Forward Mode: starts at an input to the graph and moves towards all the output nodes.
For every node, it sums all the paths feeding in. By adding them up, we get the total
way in which the node is affected by the input. Hence, it calculates derivatives of output(s)
with respect to a single input variable.

- Reverse Mode: starts at the output node of the graph and moves backwards towards all
the input nodes. For every node, it merges all paths which originated at that node.
It tracks how every node affects one output. Hence, it calculates the derivative of a single
output with respect to all inputs simultaneously - the gradient.
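
Forward mode is often explained through dual numbers; the sketch below is only
an illustration of the idea (it is not tied to any particular AD library),
carrying each value together with its derivative with respect to one seeded
input:

```cpp
#include <cmath>
#include <cstdio>

// Forward-mode AD sketch with dual numbers: every quantity carries its value
// and its derivative with respect to one chosen ("seeded") input variable.
struct Dual {
  double val;  // value of the expression
  double dot;  // derivative of the expression w.r.t. the seeded input
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) {                       // product rule
  return {a.val * b.val, a.dot * b.val + a.val * b.dot};
}
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }  // chain rule

int main() {
  // Differentiate f(x, y) = x * y + sin(x) at (2, 3) with respect to x.
  Dual x{2.0, 1.0};  // seed dx/dx = 1
  Dual y{3.0, 0.0};  // dy/dx = 0
  Dual f = x * y + sin(x);
  std::printf("f = %f, df/dx = %f\n", f.val, f.dot);  // df/dx = y + cos(x)
  // Reverse mode would instead record these operations and sweep backwards,
  // yielding df/dx and df/dy together in a single pass.
}
```
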
### Automatic Differentiation in C++

[The source code transformation approach] enables optimization by retaining
all the complex knowledge of the original source code. The compute graph is
constructed during compilation and then transformed to generate the derivative
code. It typically uses a custom parser to build code representation and produce
the transformed code. It is difficult to implement (especially in C++), but it is
very efficient, since many computations and optimizations are done ahead of time.
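
To illustrate the idea only (both functions below are hypothetical examples,
hand-written rather than the output of any specific tool), a
source-transformation AD tool reads a function definition and emits a companion
function that computes its derivative:

```cpp
// Function handed to the source-transformation tool:
double square(double x) { return x * x; }

// Derivative function the tool could emit alongside it at compile time
// (hand-written here purely to illustrate the concept; real generated
// code is more mechanical and covers far more cases):
double square_dx(double x) { return 2.0 * x; }
```
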
### Advantages of using Automatic Differentiation
- It can take derivatives of algorithms involving conditionals, loops, and
recursion.

- It can be easily scaled for functions with a very large number of inputs.

### Automatic Differentiation Implementation with Clad - a Clang Plugin

[Clad] operates on Clang AST (Abstract Syntax Tree) and is capable of
performing C++ Source Code Transformation. When Clad is given the C++ source
code of a mathematical function, it can algorithmically generate C++ code for
computing the derivatives of that function. Clad has comprehensive coverage of
the latest C++ features and a well-rounded fallback and recovery system in
place.
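
The snippet below sketches how Clad is typically used, following its documented
interface; treat the exact calls and numbers as indicative, since details can
vary between Clad versions:

```cpp
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

double fn(double x, double y) { return x * x * y + y * y; }

int main() {
  // Forward mode: ask Clad to generate d(fn)/dx during compilation.
  auto dfn_dx = clad::differentiate(fn, "x");
  std::printf("d(fn)/dx at (3, 4) = %f\n", dfn_dx.execute(3.0, 4.0));  // 2*x*y = 24

  // Reverse mode: compute the whole gradient in one call.
  double dx = 0.0, dy = 0.0;
  auto grad = clad::gradient(fn);
  grad.execute(3.0, 4.0, &dx, &dy);
  std::printf("gradient = (%f, %f)\n", dx, dy);  // (24, 17)
}
```

Such a translation unit is compiled with Clang and the Clad plugin loaded (for
example via a `-fplugin=` option pointing at the Clad library), so the
derivative code is produced at compile time rather than at run time.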