
Commit 586b009: Additional cleanup and updates

Parent: 1b6cbb8

3 files changed (+47, -75 lines)

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -8,3 +8,6 @@ build
 jupyter_execute
 _thumbnails
 examples/gallery.rst
+# pixi environments
+.pixi
+*.egg-info

examples/diagnostics_and_criticism/Diagnosing_biased_Inference_with_Divergences.ipynb

Lines changed: 34 additions & 59 deletions
@@ -192,7 +192,7 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "7e9cf2c6b77f443bbbca89fd732fd1da",
+"model_id": "47b5f627590f4f09a279e866818a785b",
 "version_major": 2,
 "version_minor": 0
 },
@@ -673,7 +673,7 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "fcf669609bf243cc86618b004ccbf703",
+"model_id": "00ce56b9f1cf44baa87a6d127cffa01b",
 "version_major": 2,
 "version_minor": 0
 },
@@ -1043,14 +1043,16 @@
 "\n",
 "To resolve this potential ambiguity we can adjust the step size, $\\epsilon$, of the Hamiltonian transition. The smaller the step size the more accurate the trajectory and the less likely it will be mislabeled as a divergence. In other words, if we have geometric ergodicity between the Hamiltonian transition and the target distribution then decreasing the step size will reduce and then ultimately remove the divergences entirely. If we do not have geometric ergodicity, however, then decreasing the step size will not completely remove the divergences.\n",
 "\n",
-"Like `Stan`, the step size in `PyMC` is tuned automatically during warm up, but we can coerce smaller step sizes by tweaking the configuration of `PyMC`'s adaptation routine. In particular, we can increase the `target_accept` parameter from its default value of 0.8 closer to its maximum value of 1."
+"In `PyMC` we do not control the step size directly, but we can coerce smaller step sizes by tweaking the configuration of `PyMC`'s adaptation routine. In particular, we can increase the `target_accept` parameter from its default value of 0.8 closer to its maximum value of 1."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Adjusting Adaptation Routine"
+"### Adjusting Adaptation Routine\n",
+"\n",
+"To evaluate the effect of decreasing step size (increasing `target_accept`) we can run the same model across a range of `target_accept` values."
 ]
 },
 {
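For reference, the `target_accept` setting described in the cell above is passed directly to `pm.sample`, and divergences can be counted from `sample_stats` as elsewhere in this notebook. A minimal sketch, assuming the centered model context `Centered_eight` from earlier in the notebook (the draw counts, the 0.99 value, and the name `fit_cp99` are illustrative):

```python
import pymc as pm

# Sketch: re-sample the centered Eight Schools model with a stricter
# acceptance target; step-size adaptation then settles on a smaller epsilon.
# `Centered_eight` is assumed to be the model context defined earlier.
with Centered_eight:
    fit_cp99 = pm.sample(draws=5000, tune=2000, chains=2, target_accept=0.99)

# Count divergent transitions after tuning (same pattern used in this notebook)
fit_cp99.sample_stats["diverging"].sum().item()
```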
@@ -1066,28 +1068,28 @@
 "Initializing NUTS using jitter+adapt_diag...\n",
 "Multiprocess sampling (2 chains in 2 jobs)\n",
 "NUTS: [mu, tau, theta]\n",
-"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 5 seconds.\n",
+"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 6 seconds.\n",
 "There were 214 divergences after tuning. Increase `target_accept` or reparameterize.\n",
 "We recommend running at least 4 chains for robust computation of convergence diagnostics\n",
 "Auto-assigning NUTS sampler...\n",
 "Initializing NUTS using jitter+adapt_diag...\n",
 "Multiprocess sampling (2 chains in 2 jobs)\n",
 "NUTS: [mu, tau, theta]\n",
-"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 8 seconds.\n",
+"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 9 seconds.\n",
 "There were 197 divergences after tuning. Increase `target_accept` or reparameterize.\n",
 "We recommend running at least 4 chains for robust computation of convergence diagnostics\n",
 "Auto-assigning NUTS sampler...\n",
 "Initializing NUTS using jitter+adapt_diag...\n",
 "Multiprocess sampling (2 chains in 2 jobs)\n",
 "NUTS: [mu, tau, theta]\n",
-"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 13 seconds.\n",
+"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 14 seconds.\n",
 "There were 129 divergences after tuning. Increase `target_accept` or reparameterize.\n",
 "We recommend running at least 4 chains for robust computation of convergence diagnostics\n",
 "Auto-assigning NUTS sampler...\n",
 "Initializing NUTS using jitter+adapt_diag...\n",
 "Multiprocess sampling (2 chains in 2 jobs)\n",
 "NUTS: [mu, tau, theta]\n",
-"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 25 seconds.\n",
+"Sampling 2 chains for 2_000 tune and 5_000 draw iterations (4_000 + 10_000 draws total) took 27 seconds.\n",
 "There were 18 divergences after tuning. Increase `target_accept` or reparameterize.\n",
 "We recommend running at least 4 chains for robust computation of convergence diagnostics\n"
 ]
@@ -1111,26 +1113,6 @@
 "cell_type": "code",
 "execution_count": 17,
 "metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"189"
-]
-},
-"execution_count": 17,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
-"source": [
-"longer_trace.sample_stats[\"diverging\"].sum().item()"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 18,
-"metadata": {},
 "outputs": [
 {
 "data": {
@@ -1202,7 +1184,7 @@
 "4 0.036504 18 .99"
 ]
 },
-"execution_count": 18,
+"execution_count": 17,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -1239,14 +1221,12 @@
 "\n",
 "This behavior also has a nice geometric intuition. The more we decrease the step size the more the Hamiltonian Markov chain can explore the neck of the funnel. Consequently, the marginal posterior distribution for $log (\\tau)$ stretches further and further towards negative values with the decreasing step size. \n",
 "\n",
-"Since in `PyMC` after tuning we have a smaller step size than `Stan`, the geometery is better explored.\n",
-"\n",
-"However, the Hamiltonian transition is still not geometrically ergodic with respect to the centered implementation of the Eight Schools model. Indeed, this is expected given the observed bias."
+"The Hamiltonian transition is still not geometrically ergodic with respect to the centered implementation of the Eight Schools model, as evidenced by the observed bias."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 19,
+"execution_count": 18,
 "metadata": {},
 "outputs": [
 {
@@ -1278,7 +1258,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 20,
+"execution_count": 19,
 "metadata": {},
 "outputs": [
 {
@@ -1333,7 +1313,7 @@
 "$$\\tilde{\\theta}_{n} \\sim \\mathcal{N}(0, 1)$$\n",
 "$$\\theta_{n} = \\mu + \\tau \\cdot \\tilde{\\theta}_{n}.$$\n",
 "\n",
-"Stan model:\n",
+"In Stan, this is specified as:\n",
 "\n",
 "```C\n",
 "data {\n",
@@ -1360,12 +1340,14 @@
 " theta_tilde ~ normal(0, 1);\n",
 " y ~ normal(theta, sigma);\n",
 "}\n",
-"```"
+"```\n",
+"\n",
+"Here is the corresponding `PyMC` model:"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 21,
+"execution_count": 20,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -1381,7 +1363,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 22,
+"execution_count": 21,
 "metadata": {},
 "outputs": [
 {
@@ -1397,7 +1379,7 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "25be11a0d9654db68c8587169e5e2ab4",
+"model_id": "9b2f351550ff440eac0da8fcfeff8d05",
 "version_major": 2,
 "version_minor": 0
 },
@@ -1435,7 +1417,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 23,
+"execution_count": 22,
 "metadata": {},
 "outputs": [
 {
@@ -1733,7 +1715,7 @@
 "theta_t[7] 7091.0 1.0 "
 ]
 },
-"execution_count": 23,
+"execution_count": 22,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -1746,12 +1728,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"As shown above, the effective sample size per iteration has drastically improved, and the trace plots no longer show any \"stickyness\". However, we do still see the rare divergence. These infrequent divergences do not seem concentrate anywhere in parameter space, which is indicative of the divergences being false positives."
+"Notice that the effective sample size per iteration has drastically improved, and the trace plots demonstrate relatively homogeneous exploration. However, we do still see the rare divergence. These infrequent divergences do not seem concentrate anywhere in parameter space, which is indicative of the divergences being false positives."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 24,
+"execution_count": 23,
 "metadata": {},
 "outputs": [
 {
@@ -1816,12 +1798,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"As expected of false positives, we can remove the divergences entirely by decreasing the step size."
+"As expected of false positives, we can remove the divergences almost entirely by decreasing the step size."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 25,
+"execution_count": 24,
 "metadata": {},
 "outputs": [
 {
@@ -1837,7 +1819,7 @@
 {
 "data": {
 "application/vnd.jupyter.widget-view+json": {
-"model_id": "9f2e99d6c14743a39e2c7645e94fbcac",
+"model_id": "9ccf9f10055746508099d9aac64a8b7f",
 "version_major": 2,
 "version_minor": 0
 },
@@ -1893,7 +1875,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 26,
+"execution_count": 25,
 "metadata": {},
 "outputs": [
 {
@@ -1926,7 +1908,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 27,
+"execution_count": 26,
 "metadata": {},
 "outputs": [
 {
@@ -1972,12 +1954,12 @@
 "* Updated by Agustina Arroyuelo in February 2018, ([pymc#2861](https://github.com/pymc-devs/pymc/pull/2861))\n",
 "* Updated by [@CloudChaoszero](https://github.com/CloudChaoszero) in January 2021, ([pymc-examples#25](https://github.com/pymc-devs/pymc-examples/pull/25))\n",
 "* Updated Markdown and styling by @reshamas in August 2022, ([pymc-examples#402](https://github.com/pymc-devs/pymc-examples/pull/402))\n",
-"* Updated by @fonnesbeck in August 2024\n"
+"* Updated by @fonnesbeck in August 2024 ([pymc-examples#699](https://github.com/pymc-devs/pymc-examples/pull/699))\n"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 28,
+"execution_count": 27,
 "metadata": {},
 "outputs": [
 {
@@ -1990,11 +1972,11 @@
 "Python version : 3.12.5\n",
 "IPython version : 8.27.0\n",
 "\n",
-"pymc : 5.16.2\n",
-"matplotlib: 3.9.2\n",
 "arviz : 0.19.0\n",
+"pymc : 5.16.2\n",
 "pandas : 2.2.2\n",
 "numpy : 1.26.4\n",
+"matplotlib: 3.9.2\n",
 "\n",
 "Watermark: 2.4.3\n",
 "\n"
@@ -2013,13 +1995,6 @@
 ":::{include} ../page_footer.md\n",
 ":::"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {

examples/diagnostics_and_criticism/Diagnosing_biased_Inference_with_Divergences.myst.md

Lines changed: 10 additions & 16 deletions
@@ -285,12 +285,14 @@ Algorithm implemented in `Stan` uses a heuristic to quickly identify these misbe
 
 To resolve this potential ambiguity we can adjust the step size, $\epsilon$, of the Hamiltonian transition. The smaller the step size the more accurate the trajectory and the less likely it will be mislabeled as a divergence. In other words, if we have geometric ergodicity between the Hamiltonian transition and the target distribution then decreasing the step size will reduce and then ultimately remove the divergences entirely. If we do not have geometric ergodicity, however, then decreasing the step size will not completely remove the divergences.
 
-Like `Stan`, the step size in `PyMC` is tuned automatically during warm up, but we can coerce smaller step sizes by tweaking the configuration of `PyMC`'s adaptation routine. In particular, we can increase the `target_accept` parameter from its default value of 0.8 closer to its maximum value of 1.
+In `PyMC` we do not control the step size directly, but we can coerce smaller step sizes by tweaking the configuration of `PyMC`'s adaptation routine. In particular, we can increase the `target_accept` parameter from its default value of 0.8 closer to its maximum value of 1.
 
 +++
 
 ### Adjusting Adaptation Routine
 
+To evaluate the effect of decreasing step size (increasing `target_accept`) we can run the same model across a range of `target_accept` values.
+
 ```{code-cell} ipython3
 acceptance_runs = dict()
 for target_accept in [0.85, 0.90, 0.95, 0.99]:
@@ -305,10 +307,6 @@ for target_accept in [0.85, 0.90, 0.95, 0.99]:
 )
 ```
 
-```{code-cell} ipython3
-longer_trace.sample_stats["diverging"].sum().item()
-```
-
 ```{code-cell} ipython3
 df = pd.DataFrame(
 [
@@ -337,9 +335,7 @@ Here, the number of divergent transitions dropped dramatically when delta was in
 
 This behavior also has a nice geometric intuition. The more we decrease the step size the more the Hamiltonian Markov chain can explore the neck of the funnel. Consequently, the marginal posterior distribution for $log (\tau)$ stretches further and further towards negative values with the decreasing step size.
 
-Since in `PyMC` after tuning we have a smaller step size than `Stan`, the geometery is better explored.
-
-However, the Hamiltonian transition is still not geometrically ergodic with respect to the centered implementation of the Eight Schools model. Indeed, this is expected given the observed bias.
+The Hamiltonian transition is still not geometrically ergodic with respect to the centered implementation of the Eight Schools model, as evidenced by the observed bias.
 
 ```{code-cell} ipython3
 _, ax = plt.subplots(1, 1, figsize=(10, 6))
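The plotting cell that begins above is cut off in this hunk. As a hedged illustration of the geometric point about $log(\tau)$, a comparison of the marginals across the `target_accept` runs might look roughly like this; it assumes `acceptance_runs` maps each `target_accept` value to an `InferenceData` object, which is not visible in the diff:

```python
import arviz as az
import matplotlib.pyplot as plt
import numpy as np

# Sketch: overlay the marginal posterior of log(tau) for each run stored in
# `acceptance_runs` (assumed structure: {target_accept: InferenceData}).
_, ax = plt.subplots(1, 1, figsize=(10, 6))
for target_accept, idata in acceptance_runs.items():
    log_tau = np.log(idata.posterior["tau"].values.flatten())
    az.plot_kde(log_tau, ax=ax, plot_kwargs={"label": f"target_accept={target_accept}"})
ax.set_xlabel("log(tau)")
ax.set_ylabel("density")
ax.legend();
```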
@@ -384,7 +380,7 @@ $$\tau \sim \text{Half-Cauchy}(0, 5)$$
 $$\tilde{\theta}_{n} \sim \mathcal{N}(0, 1)$$
 $$\theta_{n} = \mu + \tau \cdot \tilde{\theta}_{n}.$$
 
-Stan model:
+In Stan, this is specified as:
 
 ```C
 data {
@@ -413,6 +409,8 @@ model {
 }
 ```
 
+Here is the corresponding `PyMC` model:
+
 ```{code-cell} ipython3
 def non_centered_eight_model():
     with pm.Model() as NonCentered_eight:
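The body of `non_centered_eight_model` is truncated in the hunk above. A minimal sketch of a non-centered Eight Schools implementation consistent with the math and the Stan program shown earlier; the prior on $\mu$ and all variable names other than `theta_t` (which appears in the notebook's summary output in this commit) are assumptions here:

```python
import numpy as np
import pymc as pm

# Standard Eight Schools data (Rubin 1981)
J = 8
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])


def non_centered_eight_model():
    # Sketch: sample theta_tilde ~ N(0, 1) and recover theta deterministically,
    # matching theta = mu + tau * theta_tilde above. The Normal(0, 5) prior on mu
    # is an assumption; that line is not visible in this hunk.
    with pm.Model() as NonCentered_eight:
        mu = pm.Normal("mu", mu=0, sigma=5)
        tau = pm.HalfCauchy("tau", beta=5)
        theta_tilde = pm.Normal("theta_t", mu=0, sigma=1, shape=J)
        theta = pm.Deterministic("theta", mu + tau * theta_tilde)
        pm.Normal("obs", mu=theta, sigma=sigma, observed=y)
    return NonCentered_eight
```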
@@ -433,13 +431,13 @@ with non_centered_eight_model():
 az.summary(fit_ncp80).round(2)
 ```
 
-As shown above, the effective sample size per iteration has drastically improved, and the trace plots no longer show any "stickyness". However, we do still see the rare divergence. These infrequent divergences do not seem concentrate anywhere in parameter space, which is indicative of the divergences being false positives.
+Notice that the effective sample size per iteration has drastically improved, and the trace plots demonstrate relatively homogeneous exploration. However, we do still see the rare divergence. These infrequent divergences do not seem concentrate anywhere in parameter space, which is indicative of the divergences being false positives.
 
 ```{code-cell} ipython3
 report_trace(fit_ncp80)
 ```
 
-As expected of false positives, we can remove the divergences entirely by decreasing the step size.
+As expected of false positives, we can remove the divergences almost entirely by decreasing the step size.
 
 ```{code-cell} ipython3
 with non_centered_eight_model():
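The sampling call inside the cell that begins above is not visible in this hunk. A hedged sketch of what re-running the non-centered model with a stricter acceptance target and then counting divergences might look like; the 0.99 value, draw counts, and the name `fit_ncp99` are illustrative, and `pm` is assumed imported as in the notebook:

```python
# Sketch: push target_accept toward 1 for the non-centered model,
# then count any remaining divergent transitions.
with non_centered_eight_model():
    fit_ncp99 = pm.sample(draws=5000, tune=2000, chains=2, target_accept=0.99)

fit_ncp99.sample_stats["diverging"].sum().item()
```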
@@ -487,7 +485,7 @@ plt.legend();
 * Updated by Agustina Arroyuelo in February 2018, ([pymc#2861](https://github.com/pymc-devs/pymc/pull/2861))
 * Updated by [@CloudChaoszero](https://github.com/CloudChaoszero) in January 2021, ([pymc-examples#25](https://github.com/pymc-devs/pymc-examples/pull/25))
 * Updated Markdown and styling by @reshamas in August 2022, ([pymc-examples#402](https://github.com/pymc-devs/pymc-examples/pull/402))
-* Updated by @fonnesbeck in August 2024
+* Updated by @fonnesbeck in August 2024 ([pymc-examples#699](https://github.com/pymc-devs/pymc-examples/pull/699))
 
 ```{code-cell} ipython3
 %load_ext watermark
@@ -496,7 +494,3 @@ plt.legend();
 
 :::{include} ../page_footer.md
 :::
-
-```{code-cell} ipython3
-
-```
