Commit 1bf82ef

Math not rendering in docs (#3712)

1 parent bbf7137 commit 1bf82ef

File tree

1 file changed: +10 -10 lines changed

docs/source/notebooks/probabilistic_matrix_factorization.ipynb (+10 -10)

The change replaces LaTeX `\begin{equation}` environments with `$$` delimiters in the notebook's markdown cells so the math renders in the generated docs.
@@ -742,21 +742,21 @@
 "\n",
 "$\\newcommand\\given[1][]{\\:#1\\vert\\:}$\n",
 "\n",
-"\\begin{equation}\n",
+"$$\n",
 "P(R \\given U, V, \\alpha^2) = \n",
 "    \\prod_{i=1}^N \\prod_{j=1}^M\n",
 "        \\left[ \\mathcal{N}(R_{ij} \\given U_i V_j^T, \\alpha^{-1}) \\right]^{I_{ij}}\n",
-"\\end{equation}\n",
+"$$\n",
 "\n",
-"\\begin{equation}\n",
+"$$\n",
 "P(U \\given \\alpha_U^2) =\n",
 "    \\prod_{i=1}^N \\mathcal{N}(U_i \\given 0, \\alpha_U^{-1} \\boldsymbol{I})\n",
-"\\end{equation}\n",
+"$$\n",
 "\n",
-"\\begin{equation}\n",
+"$$\n",
 "P(V \\given \\alpha_V^2) =\n",
 "    \\prod_{j=1}^M \\mathcal{N}(V_j \\given 0, \\alpha_V^{-1} \\boldsymbol{I})\n",
-"\\end{equation}\n",
+"$$\n",
 "\n",
 "Given small precision parameters, the priors on $U$ and $V$ ensure our latent variables do not grow too far from 0. This prevents overly strong user preferences and item factor compositions from being learned. This is commonly known as complexity control, where the complexity of the model here is measured by the magnitude of the latent variables. Controlling complexity like this helps prevent overfitting, which allows the model to generalize better for unseen data. We must also choose an appropriate $\\alpha$ value for the normal distribution for $R$. So the challenge becomes choosing appropriate values for $\\alpha_U$, $\\alpha_V$, and $\\alpha$. This challenge can be tackled with the soft weight-sharing methods discussed by [Nowlan and Hinton, 1992](http://www.cs.toronto.edu/~fritz/absps/sunspots.pdf) [4]. However, for the purposes of this analysis, we will stick to using point estimates obtained from our data."
 ]
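
For orientation while reading the diff: the equations above define the basic PMF model. A minimal PyMC3 sketch of that model, with illustrative dimensions, data, and precision values that are assumptions rather than the notebook's own, might look like this:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Illustrative sizes and data; the notebook itself uses MovieLens ratings.
N, M, D = 100, 75, 10                    # users, movies, latent dimensions (assumed)
R_obs = np.random.uniform(1, 5, (N, M))  # stand-in for the observed rating matrix

alpha = 2.0                  # fixed precision for the rating likelihood (point estimate)
alpha_u = alpha_v = 1.0 / D  # prior precisions for U and V (assumed values)

with pm.Model() as pmf:
    # Zero-mean Gaussian priors with diagonal precision keep U and V near 0.
    U = pm.MvNormal("U", mu=np.zeros(D), tau=alpha_u * np.eye(D), shape=(N, D))
    V = pm.MvNormal("V", mu=np.zeros(D), tau=alpha_v * np.eye(D), shape=(M, D))
    # Each rating R_ij is modeled as N(U_i V_j^T, alpha^{-1}).
    R = pm.Normal("R", mu=tt.dot(U, V.T), tau=alpha, observed=R_obs)
```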
@@ -902,10 +902,10 @@
 "source": [
 "We could define some kind of default trace property like we did for the MAP, but that would mean using possibly nonsensical values for `nsamples` and `cores`. Better to leave it as a non-optional call to `draw_samples`. Finally, we'll need a function to make predictions using our inferred values for $U$ and $V$. For user $i$ and movie $j$, a prediction is generated by drawing from $\\mathcal{N}(U_i V_j^T, \\alpha)$. To generate predictions from the sampler, we generate an $R$ matrix for each $U$ and $V$ sampled, then we combine these by averaging over the $K$ samples.\n",
 "\n",
-"\\begin{equation}\n",
+"$$\n",
 "P(R_{ij}^* \\given R, \\alpha, \\alpha_U, \\alpha_V) \\approx\n",
 "    \\frac{1}{K} \\sum_{k=1}^K \\mathcal{N}(U_i V_j^T, \\alpha)\n",
-"\\end{equation}\n",
+"$$\n",
 "\n",
 "We'll want to inspect the individual $R$ matrices before averaging them for diagnostic purposes. So we'll write code for the averaging piece during evaluation. The function below simply draws an $R$ matrix given a $U$ and $V$ and the fixed $\\alpha$ stored in the PMF object."
 ]
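
The averaging step in this hunk's equation can be sketched in NumPy. Here $\alpha$ is treated as a precision, consistent with the $\alpha^{-1}$ variance in the likelihood, and the `trace` object holding $K$ posterior draws of $U$ and $V$ is a hypothetical stand-in for the sampler's output:

```python
import numpy as np

def sample_R(U, V, alpha, rng=None):
    """Draw one rating matrix R from N(U V^T, alpha^{-1})."""
    rng = rng or np.random.default_rng()
    return rng.normal(U @ V.T, np.sqrt(1.0 / alpha))

# Hypothetical usage: trace["U"][k] and trace["V"][k] are the k-th posterior
# draws; averaging the K sampled R matrices gives the predicted ratings R*.
# R_star = np.mean(
#     [sample_R(trace["U"][k], trace["V"][k], alpha) for k in range(K)],
#     axis=0,
# )
```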
@@ -948,10 +948,10 @@
 "\n",
 "In order to understand how effective our models are, we'll need to be able to evaluate them. We'll be evaluating in terms of root mean squared error (RMSE), which looks like this:\n",
 "\n",
-"\\begin{equation}\n",
+"$$\n",
 "RMSE = \\sqrt{ \\frac{ \\sum_{i=1}^N \\sum_{j=1}^M I_{ij} (R_{ij} - R_{ij}^*)^2 }\n",
 "                    { \\sum_{i=1}^N \\sum_{j=1}^M I_{ij} } }\n",
-"\\end{equation}\n",
+"$$\n",
 "\n",
 "In this case, the RMSE can be thought of as the standard deviation of our predictions from the actual user preferences."
 ]
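
The RMSE above averages squared error over observed entries only (those with $I_{ij} = 1$). A minimal NumPy version, assuming missing ratings are marked with NaN (an illustrative convention, not necessarily the notebook's), could be:

```python
import numpy as np

def rmse(test_data, predicted):
    """Root mean squared error over the observed (rated) entries only."""
    I = ~np.isnan(test_data)                     # indicator of observed entries
    sq_err = (test_data[I] - predicted[I]) ** 2  # squared prediction errors
    return np.sqrt(sq_err.mean())
```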
