Commit 2968e9b

committed
Updates to narrative text
1 parent 949cedc commit 2968e9b

2 files changed: +48 additions, -18 deletions


examples/case_studies/bayesian_workflow.ipynb

Lines changed: 24 additions & 9 deletions
Original line number | Diff line number | Diff line change
@@ -24,11 +24,11 @@
2424
":author: Thomas Wiecki, Chris Fonnesbeck\n",
2525
":::\n",
2626
"\n",
27-
"Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models at the outset, often leading to an unsatisfactory final result. To avoid these pitfalls, a structured approach is essential. The Bayesian workflow is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable and interpretable results.\n",
27+
"Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models from the outset, often leading to an unsatisfactory final result (or a dead end). To avoid common model development pitfalls, a structured approach is helpful. The *Bayesian workflow* (Gelman *et al.*) is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable results.\n",
2828
"\n",
29-
"This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the flexibility to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow invlolves moving from simple models via prior checks, fitting, diagnostics, and refinement through to a final product that satisfies the analytic goals, ensuring that computational and conceptual issues are identified and addressed systematically as they are encountered.\n",
29+
"This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the ability to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow involves moving from simple models, via prior checks, fitting, diagnostics, and refinement, through to a final product that satisfies the analytic goals, making sure that computational and conceptual issues are identified and addressed systematically as they are encountered.\n",
3030
"\n",
31-
"Below we demonstrate the complete Bayesian workflow using COVID-19 case data, showing how to progress from basic exponential growth models to more sophisticated logistic growth formulations, highlighting the critical role of model checking and validation at each step. The model is not intended to be a state-of-the-art epidemiological model, but rather a demonstration of how to iterate from a simple model to a more complex one."
31+
"Below we demonstrate the Bayesian workflow using COVID-19 case data, showing how to progress from very basic, unrealistic models to more sophisticated formulations, highlighting the critical role of model checking and validation at each step. Here we are not looking to develop a state-of-the-art epidemiological model, but rather to demonstrate how to iterate from a simple model to a more complex one."
3232
]
3333
},
3434
{
@@ -173,7 +173,7 @@
173173
"3. Run prior predictive check\n",
174174
"4. Fit model\n",
175175
"5. Assess convergence\n",
176-
"6. Run posterior predictive check\n",
176+
"6. Check model fit\n",
177177
"7. Improve model\n",
178178
"\n",
179179
"### 1. Plot the data\n",
@@ -20632,9 +20632,16 @@
2063220632
"cell_type": "markdown",
2063320633
"metadata": {},
2063420634
"source": [
20635-
"### 6. Run posterior predictive check\n",
20635+
"### 6. Check model fit\n",
2063620636
"\n",
20637-
"Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters."
20637+
"Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters. This process is called **posterior predictive checking** and is a crucial step in Bayesian model validation.\n",
20638+
"\n",
20639+
"Posterior predictive checking works by:\n",
20640+
"1. Taking parameter samples from the posterior distribution (which we already have from MCMC sampling)\n",
20641+
"2. For each set of parameter values, generating new synthetic datasets using the same likelihood function as our model\n",
20642+
"3. Comparing these synthetic datasets to our observed data\n",
20643+
"\n",
20644+
"This allows us to assess whether our model can reproduce key features of the observed data. If the posterior predictive samples look very different from our actual data, it suggests our model may be missing important aspects of the data-generating process. Conversely, if the posterior predictive samples encompass our observed data well, it provides evidence that our model is capturing the essential patterns in the data."
2063820645
]
2063920646
},
2064020647
{
@@ -26180,7 +26187,9 @@
2618026187
"cell_type": "markdown",
2618126188
"metadata": {},
2618226189
"source": [
26183-
"OK, that does not look terrible, the data is at least inside of what the model can produce. Let's look at residuals for systematic errors:"
26190+
"OK, that does not look terrible; the data essentially behaves like a random draw from the model.\n",
26191+
"\n",
26192+
"As an additional check, we can also inspect the model residuals."
2618426193
]
2618526194
},
2618626195
{
@@ -31639,9 +31648,15 @@
3163931648
"source": [
3164031649
"### Prediction and forecasting\n",
3164131650
"\n",
31642-
"We might also be interested in predicting on unseen or data, or, in the case time-series data like here, in forecasting. In `PyMC` you can do so easily using `pm.Data` nodes. What it allows you to do is define data to a PyMC model that you can later switch out for other data. That way, when you for example do posterior predictive sampling, it will generate samples into the future.\n",
31651+
"We are often interested in predicting or forecasting. In PyMC, you can do so easily using `pm.Data` nodes, which provide a powerful mechanism for out-of-sample prediction and forecasting.\n",
31652+
"\n",
31653+
"Wrapping your input data in `pm.Data` allows you to define data containers within a PyMC model that can be dynamically updated after model fitting. This is particularly useful for prediction scenarios where you want to:\n",
31654+
"\n",
31655+
"1. **Train on observed data**: Fit your model using the available training data\n",
31656+
"2. **Switch to prediction inputs**: Replace the training data with new input values (e.g., future time points)\n",
31657+
"3. **Generate predictions**: Use posterior predictive sampling to generate forecasts based on the fitted model\n",
3164331658
"\n",
31644-
"Let's change our model to use `pm.Data` instead."
31659+
"Let's demonstrate this approach by modifying our exponential growth model to use `pm.Data` nodes."
3164531660
]
3164631661
},
3164731662
{

examples/case_studies/bayesian_workflow.myst.md

Lines changed: 24 additions & 9 deletions
Original line number | Diff line number | Diff line change
@@ -22,11 +22,11 @@ kernelspec:
2222
:author: Thomas Wiecki, Chris Fonnesbeck
2323
:::
2424

25-
Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models at the outset, often leading to an unsatisfactory final result. To avoid these pitfalls, a structured approach is essential. The Bayesian workflow is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable and interpretable results.
25+
Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models from the outset, often leading to an unsatisfactory final result (or a dead end). To avoid common model development pitfalls, a structured approach is helpful. The *Bayesian workflow* (Gelman *et al.*) is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable results.
2626

27-
This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the flexibility to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow invlolves moving from simple models via prior checks, fitting, diagnostics, and refinement through to a final product that satisfies the analytic goals, ensuring that computational and conceptual issues are identified and addressed systematically as they are encountered.
27+
This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the ability to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow involves moving from simple models, via prior checks, fitting, diagnostics, and refinement, through to a final product that satisfies the analytic goals, making sure that computational and conceptual issues are identified and addressed systematically as they are encountered.
2828

29-
Below we demonstrate the complete Bayesian workflow using COVID-19 case data, showing how to progress from basic exponential growth models to more sophisticated logistic growth formulations, highlighting the critical role of model checking and validation at each step. The model is not intended to be a state-of-the-art epidemiological model, but rather a demonstration of how to iterate from a simple model to a more complex one.
29+
Below we demonstrate the Bayesian workflow using COVID-19 case data, showing how to progress from very basic, unrealistic models to more sophisticated formulations, highlighting the critical role of model checking and validation at each step. Here we are not looking to develop a state-of-the-art epidemiological model, but rather to demonstrate how to iterate from a simple model to a more complex one.
3030

3131
```{code-cell} ipython3
3232
---
@@ -97,7 +97,7 @@ Next, we will start developing a model of the spread. These models will start ou
9797
3. Run prior predictive check
9898
4. Fit model
9999
5. Assess convergence
100-
6. Run posterior predictive check
100+
6. Check model fit
101101
7. Improve model
102102

103103
### 1. Plot the data
@@ -404,9 +404,16 @@ It seems like bounding the priors did not result in better fit. This is not unex
404404

405405
+++
406406

407-
### 6. Run posterior predictive check
407+
### 6. Check model fit
408408

409-
Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters.
409+
Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters. This process is called **posterior predictive checking** and is a crucial step in Bayesian model validation.
410+
411+
Posterior predictive checking works by:
412+
1. Taking parameter samples from the posterior distribution (which we already have from MCMC sampling)
413+
2. For each set of parameter values, generating new synthetic datasets using the same likelihood function as our model
414+
3. Comparing these synthetic datasets to our observed data
415+
416+
This allows us to assess whether our model can reproduce key features of the observed data. If the posterior predictive samples look very different from our actual data, it suggests our model may be missing important aspects of the data-generating process. Conversely, if the posterior predictive samples encompass our observed data well, it provides evidence that our model is capturing the essential patterns in the data.
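The three steps above can be sketched with plain NumPy. This is a minimal illustration, not the notebook's fitted model: the posterior samples of a growth rate and the Poisson likelihood are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior samples of a growth rate (in practice these come from MCMC)
posterior_rate = rng.normal(loc=0.2, scale=0.02, size=1000)

t = np.arange(30)                       # time points
observed = 100 * np.exp(0.2 * t)        # toy "observed" case counts

# Steps 1-2: for each posterior draw, simulate a new dataset from the likelihood
ppc = np.stack([rng.poisson(100 * np.exp(r * t)) for r in posterior_rate])

# Step 3: compare synthetic data to the observation, e.g. via pointwise 95% bands
lo, hi = np.percentile(ppc, [2.5, 97.5], axis=0)
coverage = np.mean((observed >= lo) & (observed <= hi))
print(f"fraction of observations inside the 95% band: {coverage:.2f}")
```

If `coverage` is far below the nominal level, or the bands systematically miss the shape of the data, the model is likely missing part of the data-generating process.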
410417

411418
```{code-cell} ipython3
412419
with model_exp3:
@@ -453,7 +460,9 @@ fig.update_layout(
453460
fig.show()
454461
```
455462

456-
OK, that does not look terrible, the data is at least inside of what the model can produce. Let's look at residuals for systematic errors:
463+
OK, that does not look terrible; the data essentially behaves like a random draw from the model.
464+
465+
As an additional check, we can also inspect the model residuals.
457466

458467
```{code-cell} ipython3
459468
resid = post_pred.posterior_predictive["obs"].sel(chain=0) - confirmed_values
@@ -488,9 +497,15 @@ What can you see?
488497

489498
### Prediction and forecasting
490499

491-
We might also be interested in predicting on unseen or data, or, in the case time-series data like here, in forecasting. In `PyMC` you can do so easily using `pm.Data` nodes. What it allows you to do is define data to a PyMC model that you can later switch out for other data. That way, when you for example do posterior predictive sampling, it will generate samples into the future.
500+
We are often interested in predicting or forecasting. In PyMC, you can do so easily using `pm.Data` nodes, which provide a powerful mechanism for out-of-sample prediction and forecasting.
501+
502+
Wrapping your input data in `pm.Data` allows you to define data containers within a PyMC model that can be dynamically updated after model fitting. This is particularly useful for prediction scenarios where you want to:
503+
504+
1. **Train on observed data**: Fit your model using the available training data
505+
2. **Switch to prediction inputs**: Replace the training data with new input values (e.g., future time points)
506+
3. **Generate predictions**: Use posterior predictive sampling to generate forecasts based on the fitted model
492507

493-
Let's change our model to use `pm.Data` instead.
508+
Let's demonstrate this approach by modifying our exponential growth model to use `pm.Data` nodes.
494509

495510
```{code-cell} ipython3
496511
with pm.Model() as model_exp4:
