|
242 | 242 | <div class="pytorch-left-menu-search">
|
243 | 243 |
|
244 | 244 | <div class="version">
|
245 |     | - <a href='https://pytorch.org/docs/versions.html'>main (2.2.0a0+git7bd0042 ) ▼</a>
    | 245 | + <a href='https://pytorch.org/docs/versions.html'>main (2.2.0a0+git2b2b6ca ) ▼</a>
246 | 246 | </div>
|
247 | 247 |
|
248 | 248 |
|
@@ -998,15 +998,15 @@ <h1>Source code for torch.ao.nn.quantized.dynamic.modules.rnn</h1><div class="hi
|
998 | 998 | <span class="sd"> \begin{array}{ll}</span>
|
999 | 999 | <span class="sd"> r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\</span>
|
1000 | 1000 | <span class="sd"> z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\</span>
|
1001 |      | -<span class="sd"> n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\</span>
1002 |      | -<span class="sd"> h_t = (1 - z_t) * n_t + z_t * h_{(t-1)}</span>
     | 1001 | +<span class="sd"> n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{(t-1)}+ b_{hn})) \\</span>
     | 1002 | +<span class="sd"> h_t = (1 - z_t) \odot n_t + z_t \odot h_{(t-1)}</span>
1003 | 1003 | <span class="sd"> \end{array}</span>
|
1004 | 1004 |
|
1005 | 1005 | <span class="sd"> where :math:`h_t` is the hidden state at time `t`, :math:`x_t` is the input</span>
|
1006 | 1006 | <span class="sd"> at time `t`, :math:`h_{(t-1)}` is the hidden state of the layer</span>
|
1007 | 1007 | <span class="sd"> at time `t-1` or the initial hidden state at time `0`, and :math:`r_t`,</span>
|
1008 | 1008 | <span class="sd"> :math:`z_t`, :math:`n_t` are the reset, update, and new gates, respectively.</span>
|
1009 |      | -<span class="sd"> :math:`\sigma` is the sigmoid function, and :math:`*` is the Hadamard product.</span>
     | 1009 | +<span class="sd"> :math:`\sigma` is the sigmoid function, and :math:`\odot` is the Hadamard product.</span>
1010 | 1010 |
|
1011 | 1011 | <span class="sd"> In a multilayer GRU, the input :math:`x^{(l)}_t` of the :math:`l` -th layer</span>
|
1012 | 1012 | <span class="sd"> (:math:`l >= 2`) is the hidden state :math:`h^{(l-1)}_t` of the previous layer multiplied by</span>
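The GRU update rules quoted in the docstring above can be sketched as a scalar toy in plain Python. This is an illustrative sketch only: `gru_cell`, the parameter dict `p`, and all numeric values are hypothetical, not the module's real parameter layout or quantized kernels.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One GRU step following the docstring's equations (scalar sketch).

    p maps hypothetical names (W_ir, b_ir, ...) to scalar weights/biases.
    """
    # r_t = sigma(W_ir x_t + b_ir + W_hr h_{t-1} + b_hr)
    r_t = sigmoid(p["W_ir"] * x_t + p["b_ir"] + p["W_hr"] * h_prev + p["b_hr"])
    # z_t = sigma(W_iz x_t + b_iz + W_hz h_{t-1} + b_hz)
    z_t = sigmoid(p["W_iz"] * x_t + p["b_iz"] + p["W_hz"] * h_prev + p["b_hz"])
    # n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_{t-1} + b_hn))
    n_t = math.tanh(p["W_in"] * x_t + p["b_in"] + r_t * (p["W_hn"] * h_prev + p["b_hn"]))
    # h_t = (1 - z_t) ⊙ n_t + z_t ⊙ h_{t-1}
    return (1.0 - z_t) * n_t + z_t * h_prev

# Illustrative parameters: every weight and bias set to 0.1.
params = {k: 0.1 for k in ("W_ir", "b_ir", "W_hr", "b_hr",
                           "W_iz", "b_iz", "W_hz", "b_hz",
                           "W_in", "b_in", "W_hn", "b_hn")}
h = gru_cell(1.0, 0.0, params)
print(h)
```

In the real module these are matrix-vector products over packed (dynamically quantized) weights; the scalar form only mirrors the algebra of the four equations.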
|
@@ -1084,20 +1084,20 @@ <h1>Source code for torch.ao.nn.quantized.dynamic.modules.rnn</h1><div class="hi
|
1084 | 1084 |
|
1085 | 1085 | <span class="sd"> .. note::</span>
|
1086 | 1086 | <span class="sd"> The calculation of new gate :math:`n_t` subtly differs from the original paper and other frameworks.</span>
|
1087 |      | -<span class="sd"> In the original implementation, the Hadamard product :math:`(*)` between :math:`r_t` and the</span>
     | 1087 | +<span class="sd"> In the original implementation, the Hadamard product :math:`(\odot)` between :math:`r_t` and the</span>
1088 | 1088 | <span class="sd"> previous hidden state :math:`h_{(t-1)}` is done before the multiplication with the weight matrix</span>
|
1089 | 1089 | <span class="sd"> `W` and addition of bias:</span>
|
1090 | 1090 |
|
1091 | 1091 | <span class="sd"> .. math::</span>
|
1092 | 1092 | <span class="sd"> \begin{aligned}</span>
|
1093 |      | -<span class="sd"> n_t = \tanh(W_{in} x_t + b_{in} + W_{hn} ( r_t * h_{(t-1)} ) + b_{hn})</span>
     | 1093 | +<span class="sd"> n_t = \tanh(W_{in} x_t + b_{in} + W_{hn} ( r_t \odot h_{(t-1)} ) + b_{hn})</span>
1094 | 1094 | <span class="sd"> \end{aligned}</span>
|
1095 | 1095 |
|
1096 | 1096 | <span class="sd"> This is in contrast to PyTorch implementation, which is done after :math:`W_{hn} h_{(t-1)}`</span>
|
1097 | 1097 |
|
1098 | 1098 | <span class="sd"> .. math::</span>
|
1099 | 1099 | <span class="sd"> \begin{aligned}</span>
|
1100 |      | -<span class="sd"> n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn}))</span>
     | 1100 | +<span class="sd"> n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{(t-1)}+ b_{hn}))</span>
1101 | 1101 | <span class="sd"> \end{aligned}</span>
|
1102 | 1102 |
|
1103 | 1103 | <span class="sd"> This implementation differs on purpose for efficiency.</span>
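The note's distinction between the original-paper and PyTorch orderings of the new gate can be checked with a scalar sketch. All values here are hypothetical illustrations; in the scalar case the weight factors commute, so the entire difference comes down to whether the reset gate also scales the bias `b_hn`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scalar weights, biases, and inputs.
W_in, b_in = 0.5, 0.1
W_hn, b_hn = 0.8, 0.3
x_t, h_prev = 1.0, 0.6
r_t = sigmoid(0.2)  # some reset-gate activation in (0, 1)

# Original-paper ordering: reset gate applied to h_{t-1} before W_hn and b_hn.
n_paper = math.tanh(W_in * x_t + b_in + W_hn * (r_t * h_prev) + b_hn)

# PyTorch ordering: reset gate applied after W_hn h_{t-1} + b_hn.
n_pytorch = math.tanh(W_in * x_t + b_in + r_t * (W_hn * h_prev + b_hn))

# With b_hn == 0 the two scalar forms coincide; with a nonzero bias,
# the PyTorch form also scales b_hn by r_t, so they differ.
print(n_paper, n_pytorch)
```

The PyTorch ordering lets `W_hn h_{(t-1)} + b_hn` be computed in one fused matrix multiply before gating, which is the efficiency motivation the note alludes to.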
|