|
192 | 192 | {
|
193 | 193 | "data": {
|
194 | 194 | "application/vnd.jupyter.widget-view+json": {
|
195 |
| - "model_id": "7e9cf2c6b77f443bbbca89fd732fd1da", |
| 195 | + "model_id": "47b5f627590f4f09a279e866818a785b", |
196 | 196 | "version_major": 2,
|
197 | 197 | "version_minor": 0
|
198 | 198 | },
|
|
673 | 673 | {
|
674 | 674 | "data": {
|
675 | 675 | "application/vnd.jupyter.widget-view+json": {
|
676 |
| - "model_id": "fcf669609bf243cc86618b004ccbf703", |
| 676 | + "model_id": "00ce56b9f1cf44baa87a6d127cffa01b", |
677 | 677 | "version_major": 2,
|
678 | 678 | "version_minor": 0
|
679 | 679 | },
|
|
1043 | 1043 | "\n",
|
1044 | 1044 | "To resolve this potential ambiguity we can adjust the step size, $\\epsilon$, of the Hamiltonian transition. The smaller the step size the more accurate the trajectory and the less likely it will be mislabeled as a divergence. In other words, if we have geometric ergodicity between the Hamiltonian transition and the target distribution then decreasing the step size will reduce and then ultimately remove the divergences entirely. If we do not have geometric ergodicity, however, then decreasing the step size will not completely remove the divergences.\n",
|
1045 | 1045 | "\n",
|
1046 |
| - "Like `Stan`, the step size in `PyMC` is tuned automatically during warm up, but we can coerce smaller step sizes by tweaking the configuration of `PyMC`'s adaptation routine. In particular, we can increase the `target_accept` parameter from its default value of 0.8 closer to its maximum value of 1." |
| 1046 | + "In `PyMC` we do not control the step size directly, but we can coerce smaller step sizes by tweaking the configuration of `PyMC`'s adaptation routine. In particular, we can increase the `target_accept` parameter from its default value of 0.8 closer to its maximum value of 1." |
1047 | 1047 | ]
|
1048 | 1048 | },
|
1049 | 1049 | {
|
1050 | 1050 | "cell_type": "markdown",
|
1051 | 1051 | "metadata": {},
|
1052 | 1052 | "source": [
|
1053 |
| - "### Adjusting Adaptation Routine" |
| 1053 | + "### Adjusting Adaptation Routine\n", |
| 1054 | + "\n", |
| 1055 | + "To evaluate the effect of decreasing step size (increasing `target_accept`) we can run the same model across a range of `target_accept` values." |
1054 | 1056 | ]
|
1055 | 1057 | },
|
1056 | 1058 | {
|
|
1066 | 1068 | "Initializing NUTS using jitter+adapt_diag...\n",
|
1067 | 1069 | "Multiprocess sampling (2 chains in 2 jobs)\n",
|
1068 | 1070 | "NUTS: [mu, tau, theta]\n",
|
1069 |
| - "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 5 seconds.\n", |
| 1071 | + "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 6 seconds.\n", |
1070 | 1072 | "There were 214 divergences after tuning. Increase `target_accept` or reparameterize.\n",
|
1071 | 1073 | "We recommend running at least 4 chains for robust computation of convergence diagnostics\n",
|
1072 | 1074 | "Auto-assigning NUTS sampler...\n",
|
1073 | 1075 | "Initializing NUTS using jitter+adapt_diag...\n",
|
1074 | 1076 | "Multiprocess sampling (2 chains in 2 jobs)\n",
|
1075 | 1077 | "NUTS: [mu, tau, theta]\n",
|
1076 |
| - "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 8 seconds.\n", |
| 1078 | + "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 9 seconds.\n", |
1077 | 1079 | "There were 197 divergences after tuning. Increase `target_accept` or reparameterize.\n",
|
1078 | 1080 | "We recommend running at least 4 chains for robust computation of convergence diagnostics\n",
|
1079 | 1081 | "Auto-assigning NUTS sampler...\n",
|
1080 | 1082 | "Initializing NUTS using jitter+adapt_diag...\n",
|
1081 | 1083 | "Multiprocess sampling (2 chains in 2 jobs)\n",
|
1082 | 1084 | "NUTS: [mu, tau, theta]\n",
|
1083 |
| - "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 13 seconds.\n", |
| 1085 | + "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 14 seconds.\n", |
1084 | 1086 | "There were 129 divergences after tuning. Increase `target_accept` or reparameterize.\n",
|
1085 | 1087 | "We recommend running at least 4 chains for robust computation of convergence diagnostics\n",
|
1086 | 1088 | "Auto-assigning NUTS sampler...\n",
|
1087 | 1089 | "Initializing NUTS using jitter+adapt_diag...\n",
|
1088 | 1090 | "Multiprocess sampling (2 chains in 2 jobs)\n",
|
1089 | 1091 | "NUTS: [mu, tau, theta]\n",
|
1090 |
| - "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 25 seconds.\n", |
| 1092 | + "Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 27 seconds.\n", |
1091 | 1093 | "There were 18 divergences after tuning. Increase `target_accept` or reparameterize.\n",
|
1092 | 1094 | "We recommend running at least 4 chains for robust computation of convergence diagnostics\n"
|
1093 | 1095 | ]
|
|
1111 | 1113 | "cell_type": "code",
|
1112 | 1114 | "execution_count": 17,
|
1113 | 1115 | "metadata": {},
|
1114 |
| - "outputs": [ |
1115 |
| - { |
1116 |
| - "data": { |
1117 |
| - "text/plain": [ |
1118 |
| - "189" |
1119 |
| - ] |
1120 |
| - }, |
1121 |
| - "execution_count": 17, |
1122 |
| - "metadata": {}, |
1123 |
| - "output_type": "execute_result" |
1124 |
| - } |
1125 |
| - ], |
1126 |
| - "source": [ |
1127 |
| - "longer_trace.sample_stats[\"diverging\"].sum().item()" |
1128 |
| - ] |
1129 |
| - }, |
1130 |
| - { |
1131 |
| - "cell_type": "code", |
1132 |
| - "execution_count": 18, |
1133 |
| - "metadata": {}, |
1134 | 1116 | "outputs": [
|
1135 | 1117 | {
|
1136 | 1118 | "data": {
|
|
1202 | 1184 | "4 0.036504 18 .99"
|
1203 | 1185 | ]
|
1204 | 1186 | },
|
1205 |
| - "execution_count": 18, |
| 1187 | + "execution_count": 17, |
1206 | 1188 | "metadata": {},
|
1207 | 1189 | "output_type": "execute_result"
|
1208 | 1190 | }
|
|
1239 | 1221 | "\n",
|
1240 | 1222 | "This behavior also has a nice geometric intuition. The more we decrease the step size, the more the Hamiltonian Markov chain can explore the neck of the funnel. Consequently, the marginal posterior distribution for $\\log(\\tau)$ stretches further and further towards negative values as the step size decreases. \n",
|
1241 | 1223 | "\n",
|
1242 |
| - "Since in `PyMC` after tuning we have a smaller step size than `Stan`, the geometery is better explored.\n", |
1243 |
| - "\n", |
1244 |
| - "However, the Hamiltonian transition is still not geometrically ergodic with respect to the centered implementation of the Eight Schools model. Indeed, this is expected given the observed bias." |
| 1224 | + "The Hamiltonian transition is still not geometrically ergodic with respect to the centered implementation of the Eight Schools model, as evidenced by the observed bias." |
1245 | 1225 | ]
|
1246 | 1226 | },
|
1247 | 1227 | {
|
1248 | 1228 | "cell_type": "code",
|
1249 |
| - "execution_count": 19, |
| 1229 | + "execution_count": 18, |
1250 | 1230 | "metadata": {},
|
1251 | 1231 | "outputs": [
|
1252 | 1232 | {
|
|
1278 | 1258 | },
|
1279 | 1259 | {
|
1280 | 1260 | "cell_type": "code",
|
1281 |
| - "execution_count": 20, |
| 1261 | + "execution_count": 19, |
1282 | 1262 | "metadata": {},
|
1283 | 1263 | "outputs": [
|
1284 | 1264 | {
|
|
1333 | 1313 | "$$\\tilde{\\theta}_{n} \\sim \\mathcal{N}(0, 1)$$\n",
|
1334 | 1314 | "$$\\theta_{n} = \\mu + \\tau \\cdot \\tilde{\\theta}_{n}.$$\n",
|
1335 | 1315 | "\n",
|
1336 |
| - "Stan model:\n", |
| 1316 | + "In Stan, this is specified as:\n", |
1337 | 1317 | "\n",
|
1338 | 1318 | "```C\n",
|
1339 | 1319 | "data {\n",
|
|
1360 | 1340 | " theta_tilde ~ normal(0, 1);\n",
|
1361 | 1341 | " y ~ normal(theta, sigma);\n",
|
1362 | 1342 | "}\n",
|
1363 |
| - "```" |
| 1343 | + "```\n", |
| 1344 | + "\n", |
| 1345 | + "Here is the corresponding `PyMC` model:" |
1364 | 1346 | ]
|
1365 | 1347 | },
|
1366 | 1348 | {
|
1367 | 1349 | "cell_type": "code",
|
1368 |
| - "execution_count": 21, |
| 1350 | + "execution_count": 20, |
1369 | 1351 | "metadata": {},
|
1370 | 1352 | "outputs": [],
|
1371 | 1353 | "source": [
|
|
1381 | 1363 | },
|
1382 | 1364 | {
|
1383 | 1365 | "cell_type": "code",
|
1384 |
| - "execution_count": 22, |
| 1366 | + "execution_count": 21, |
1385 | 1367 | "metadata": {},
|
1386 | 1368 | "outputs": [
|
1387 | 1369 | {
|
|
1397 | 1379 | {
|
1398 | 1380 | "data": {
|
1399 | 1381 | "application/vnd.jupyter.widget-view+json": {
|
1400 |
| - "model_id": "25be11a0d9654db68c8587169e5e2ab4", |
| 1382 | + "model_id": "9b2f351550ff440eac0da8fcfeff8d05", |
1401 | 1383 | "version_major": 2,
|
1402 | 1384 | "version_minor": 0
|
1403 | 1385 | },
|
|
1435 | 1417 | },
|
1436 | 1418 | {
|
1437 | 1419 | "cell_type": "code",
|
1438 |
| - "execution_count": 23, |
| 1420 | + "execution_count": 22, |
1439 | 1421 | "metadata": {},
|
1440 | 1422 | "outputs": [
|
1441 | 1423 | {
|
|
1733 | 1715 | "theta_t[7] 7091.0 1.0 "
|
1734 | 1716 | ]
|
1735 | 1717 | },
|
1736 |
| - "execution_count": 23, |
| 1718 | + "execution_count": 22, |
1737 | 1719 | "metadata": {},
|
1738 | 1720 | "output_type": "execute_result"
|
1739 | 1721 | }
|
|
1746 | 1728 | "cell_type": "markdown",
|
1747 | 1729 | "metadata": {},
|
1748 | 1730 | "source": [
|
1749 |
| - "As shown above, the effective sample size per iteration has drastically improved, and the trace plots no longer show any \"stickyness\". However, we do still see the rare divergence. These infrequent divergences do not seem concentrate anywhere in parameter space, which is indicative of the divergences being false positives." |
| 1731 | + "Notice that the effective sample size per iteration has drastically improved, and the trace plots demonstrate relatively homogeneous exploration. However, we do still see the rare divergence. These infrequent divergences do not seem to concentrate anywhere in parameter space, which is indicative of the divergences being false positives." |
1750 | 1732 | ]
|
1751 | 1733 | },
|
1752 | 1734 | {
|
1753 | 1735 | "cell_type": "code",
|
1754 |
| - "execution_count": 24, |
| 1736 | + "execution_count": 23, |
1755 | 1737 | "metadata": {},
|
1756 | 1738 | "outputs": [
|
1757 | 1739 | {
|
|
1816 | 1798 | "cell_type": "markdown",
|
1817 | 1799 | "metadata": {},
|
1818 | 1800 | "source": [
|
1819 |
| - "As expected of false positives, we can remove the divergences entirely by decreasing the step size." |
| 1801 | + "As expected of false positives, we can remove the divergences almost entirely by decreasing the step size." |
1820 | 1802 | ]
|
1821 | 1803 | },
|
1822 | 1804 | {
|
1823 | 1805 | "cell_type": "code",
|
1824 |
| - "execution_count": 25, |
| 1806 | + "execution_count": 24, |
1825 | 1807 | "metadata": {},
|
1826 | 1808 | "outputs": [
|
1827 | 1809 | {
|
|
1837 | 1819 | {
|
1838 | 1820 | "data": {
|
1839 | 1821 | "application/vnd.jupyter.widget-view+json": {
|
1840 |
| - "model_id": "9f2e99d6c14743a39e2c7645e94fbcac", |
| 1822 | + "model_id": "9ccf9f10055746508099d9aac64a8b7f", |
1841 | 1823 | "version_major": 2,
|
1842 | 1824 | "version_minor": 0
|
1843 | 1825 | },
|
|
1893 | 1875 | },
|
1894 | 1876 | {
|
1895 | 1877 | "cell_type": "code",
|
1896 |
| - "execution_count": 26, |
| 1878 | + "execution_count": 25, |
1897 | 1879 | "metadata": {},
|
1898 | 1880 | "outputs": [
|
1899 | 1881 | {
|
|
1926 | 1908 | },
|
1927 | 1909 | {
|
1928 | 1910 | "cell_type": "code",
|
1929 |
| - "execution_count": 27, |
| 1911 | + "execution_count": 26, |
1930 | 1912 | "metadata": {},
|
1931 | 1913 | "outputs": [
|
1932 | 1914 | {
|
|
1972 | 1954 | "* Updated by Agustina Arroyuelo in February 2018, ([pymc#2861](https://github.com/pymc-devs/pymc/pull/2861))\n",
|
1973 | 1955 | "* Updated by [@CloudChaoszero](https://github.com/CloudChaoszero) in January 2021, ([pymc-examples#25](https://github.com/pymc-devs/pymc-examples/pull/25))\n",
|
1974 | 1956 | "* Updated Markdown and styling by @reshamas in August 2022, ([pymc-examples#402](https://github.com/pymc-devs/pymc-examples/pull/402))\n",
|
1975 |
| - "* Updated by @fonnesbeck in August 2024\n" |
| 1957 | + "* Updated by @fonnesbeck in August 2024 ([pymc-examples#699](https://github.com/pymc-devs/pymc-examples/pull/699))\n" |
1976 | 1958 | ]
|
1977 | 1959 | },
|
1978 | 1960 | {
|
1979 | 1961 | "cell_type": "code",
|
1980 |
| - "execution_count": 28, |
| 1962 | + "execution_count": 27, |
1981 | 1963 | "metadata": {},
|
1982 | 1964 | "outputs": [
|
1983 | 1965 | {
|
|
1990 | 1972 | "Python version : 3.12.5\n",
|
1991 | 1973 | "IPython version : 8.27.0\n",
|
1992 | 1974 | "\n",
|
1993 |
| - "pymc : 5.16.2\n", |
1994 |
| - "matplotlib: 3.9.2\n", |
1995 | 1975 | "arviz : 0.19.0\n",
|
| 1976 | + "pymc : 5.16.2\n", |
1996 | 1977 | "pandas : 2.2.2\n",
|
1997 | 1978 | "numpy : 1.26.4\n",
|
| 1979 | + "matplotlib: 3.9.2\n", |
1998 | 1980 | "\n",
|
1999 | 1981 | "Watermark: 2.4.3\n",
|
2000 | 1982 | "\n"
|
|
2013 | 1995 | ":::{include} ../page_footer.md\n",
|
2014 | 1996 | ":::"
|
2015 | 1997 | ]
|
2016 |
| - }, |
2017 |
| - { |
2018 |
| - "cell_type": "code", |
2019 |
| - "execution_count": null, |
2020 |
| - "metadata": {}, |
2021 |
| - "outputs": [], |
2022 |
| - "source": [] |
2023 | 1998 | }
|
2024 | 1999 | ],
|
2025 | 2000 | "metadata": {
|
|