The :math:`\frac{\partial L}{\partial\hat{y}}` term can be calculated automatically using ``pytorch``'s autograd capabilities.
However, because we've decoupled the single-pose model predictions from the overall multi-pose prediction, we must manually account for the relation between the :math:`\frac{\partial\hat{y}}{\partial\theta}` term and the individual gradients that we calculated during the forward pass (:math:`\frac{\partial\hat{y}_i}{\partial\theta}`).
In general, this will be some function (:math:`g`) that depends on the individual per-pose gradients (:math:`\frac{\partial\hat{y}_i}{\partial\theta}`).
In practice, this function :math:`g` will need to be analytically determined and manually implemented within the ``Combination`` block (see :ref:`the guide <new-combination-guide>` for more practical information).
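
To make this concrete, the sketch below shows one way such a :math:`g` could be wired into autograd. It is a minimal illustration and not the package's actual ``Combination`` API: the class name, argument layout, and the mean-based placeholder for :math:`g` are all assumptions made for the example.

.. code-block:: python

    import torch

    # Minimal sketch (assumed names, not the real ``Combination`` API) of wiring
    # an analytically determined g into autograd. The per-pose gradients
    # dy_i/dtheta are computed during the forward pass and stashed on the
    # context; ``backward`` then combines them with the dL/dy_hat term that
    # pytorch supplies.
    class ExampleCombinationFunc(torch.autograd.Function):
        @staticmethod
        def forward(ctx, pose_preds, pose_grads, *model_params):
            # pose_preds: 1D tensor of detached per-pose predictions y_i
            # pose_grads: list over poses of per-parameter gradient tensors
            # model_params: passed in only so autograd knows to call backward
            #               and route gradients to the parameters theta
            ctx.pose_grads = pose_grads
            return pose_preds.mean()  # placeholder combination of the y_i

        @staticmethod
        def backward(ctx, grad_output):
            # grad_output is the scalar dL/dy_hat computed by autograd.
            # Here g is the mean of the per-pose gradients, matching the
            # placeholder combination used in forward; a real implementation
            # substitutes the analytic form appropriate to its combination.
            n_poses = len(ctx.pose_grads)
            param_grads = [
                grad_output * sum(pg[j] for pg in ctx.pose_grads) / n_poses
                for j in range(len(ctx.pose_grads[0]))
            ]
            # No gradients are returned for pose_preds or pose_grads themselves.
            return (None, None, *param_grads)

In use, the forward pass would detach each single-pose prediction, compute the corresponding :math:`\frac{\partial\hat{y}_i}{\partial\theta}` values (for example with ``torch.autograd.grad``), and hand everything to ``ExampleCombinationFunc.apply``.
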

.. _implemented-combs:

Math for Implemented Combinations
----------------------------------

Below, we detail the math required for appropriately combining gradients.
This math is used in the ``backward`` pass in the various ``Combination`` classes.

.. _imp-comb-loss-fn:

Loss Functions
^^^^^^^^^^^^^^

We anticipate these ``Combination`` methods being used with a linear combination of two types of loss functions (a short code sketch follows the list):

* Loss based on the final combined prediction (ie :math:`L = f(\Delta\text{G} (\theta))`)
* Loss based on a linear combination of the per-pose predictions (ie :math:`L = f(\Delta\text{G}_1 (\theta), \Delta\text{G}_2 (\theta), ...)`)
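
As a concrete (if contrived) sketch of these two loss styles, the snippet below builds both from made-up tensors; the variable names and the mean used as a stand-in for the combined prediction are assumptions for illustration only.

.. code-block:: python

    import torch

    # Stand-ins for the per-pose DeltaG_i predictions and a combined DeltaG;
    # the mean is only a placeholder for whatever the Combination produces.
    per_pose_preds = torch.tensor([-5.2, -4.8, -6.1], requires_grad=True)
    combined_pred = per_pose_preds.mean()
    target = torch.tensor(-5.5)

    # 1) Loss on the final combined prediction: L = f(DeltaG(theta))
    loss_combined = (combined_pred - target) ** 2

    # 2) Loss built from the per-pose predictions:
    #    L = f(DeltaG_1(theta), DeltaG_2(theta), ...)
    loss_per_pose = ((per_pose_preds - target) ** 2).sum()

    # The two can be mixed linearly with a weighting term.
    loss = loss_combined + 0.1 * loss_per_pose
    loss.backward()
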

Ultimately, for backprop, we need to return the gradients of the loss with respect to each model parameter.
The gradients for each of these types of losses are given below.

Combined Prediction
"""""""""""""""""""

.. math::
    :label: comb-grad

    \frac{\partial L}{\partial\theta} =
    \frac{\partial L}{\partial\Delta\text{G}}
    \frac{\partial\Delta\text{G}}{\partial\theta}

The :math:`\frac{\partial L}{\partial\Delta\text{G}}` part of this equation will be a scalar that is calculated automatically by ``pytorch`` and fed to our ``Combination`` class.
The :math:`\frac{\partial\Delta\text{G}}{\partial\theta}` parts will be computed internally.
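
In code, the combined-prediction equation above amounts to scaling each stored parameter gradient by that scalar; the helper below is a hypothetical illustration, not a function from the package.

.. code-block:: python

    # Hypothetical helper (illustration only): apply the chain rule for the
    # combined-prediction loss by scaling each stored dDeltaG/dtheta by the
    # scalar dL/dDeltaG handed to backward by pytorch.
    def combined_pred_param_grads(grad_output, dG_dtheta):
        # grad_output: 0-d tensor, dL/dDeltaG
        # dG_dtheta: list of tensors, one per model parameter
        return [grad_output * g for g in dG_dtheta]
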

Per-Pose Prediction
"""""""""""""""""""

Because we assume this loss is based on a linear combination of the individual :math:`\Delta\text{G}_i` predictions, we can decompose the gradient of the loss as:

.. math::
    :label: pose-grad

    \frac{\partial L}{\partial\theta} =
    \sum_{i=1}^N
    \frac{\partial L}{\partial\Delta\text{G}_i}
    \frac{\partial\Delta\text{G}_i}{\partial\theta}

As before, the :math:`\frac{\partial L}{\partial\Delta\text{G}_i}` parts of this equation will be scalars calculated automatically by ``pytorch`` and fed to our ``Combination`` class, and the :math:`\frac{\partial\Delta\text{G}_i}{\partial\theta}` parts will be computed internally.
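
A hypothetical sketch of that accumulation (again, illustration only, not package code) could look like:

.. code-block:: python

    import torch

    # Hypothetical helper: accumulate each pose's scalar dL/dDeltaG_i against
    # its stored per-parameter gradients dDeltaG_i/dtheta and sum over poses.
    def per_pose_param_grads(grad_outputs, pose_param_grads):
        # grad_outputs: sequence of 0-d tensors, one dL/dDeltaG_i per pose
        # pose_param_grads: per pose, a list of per-parameter gradient tensors
        totals = [torch.zeros_like(g) for g in pose_param_grads[0]]
        for grad_out, pose_grads in zip(grad_outputs, pose_param_grads):
            for j, g in enumerate(pose_grads):
                totals[j] = totals[j] + grad_out * g
        return totals
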

.. _mean-comb-imp:

Mean Combination
^^^^^^^^^^^^^^^^

This is mostly included as an example, but it can be illustrative.
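
Assuming the mean combination simply averages the per-pose predictions (the math below is a sketch of that assumption rather than a transcription of the implementation), the combined prediction and its gradient follow directly from linearity:

.. math::

    \Delta\text{G}(\theta) = \frac{1}{N} \sum_{i=1}^N \Delta\text{G}_i (\theta)

.. math::

    \frac{\partial\Delta\text{G}}{\partial\theta} = \frac{1}{N} \sum_{i=1}^N \frac{\partial\Delta\text{G}_i}{\partial\theta}
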

Max Combination
^^^^^^^^^^^^^^^

This will likely be the more useful of the currently implemented ``Combination`` classes.
In the equations below, we define the following variables:

* :math:`n` : A sign multiplier taking the value of :math:`-1` if we are taking the min value (generally the case if the inputs are :math:`\Delta\text{G}` values) or :math:`1` if we are taking the max
* :math:`t` : A scaling value that will bring the final combined value closer to the actual value of the max/min of the input values (see `here <https://en.wikipedia.org/wiki/LogSumExp#Properties>`_ for more details).
  Setting :math:`t = 1` reduces this to the standard LogSumExp operation.

.. math::
    :label: max-comb-pred

    \Delta\text{G}(\theta) = n \frac{1}{t} \text{ln} \sum_{i=1}^N \text{exp} (n t \Delta\text{G}_i (\theta))

We define a constant :math:`Q` for simplicity as well as for numerical stability:

.. math::
    :label: max-comb-q

    Q = \text{ln} \sum_{i=1}^N \text{exp} (n t \Delta\text{G}_i (\theta))
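
For reference, this max/min combination maps naturally onto ``torch.logsumexp``, which computes :math:`Q` in a numerically stable way; the function below is a sketch of that correspondence, not the package's implementation.

.. code-block:: python

    import torch

    # Sketch of the smooth max/min combination above using pytorch's
    # numerically stable logsumexp. n = -1 gives a smooth min (appropriate for
    # DeltaG inputs), n = +1 a smooth max; larger t pulls the result closer to
    # the true min/max of the inputs.
    def smooth_max_combination(delta_gs, n=-1.0, t=1.0):
        # delta_gs: 1D tensor of per-pose DeltaG_i predictions
        q = torch.logsumexp(n * t * delta_gs, dim=0)  # the constant Q above
        return n * q / t
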