Commit 2968e9b

committed
Updates to narrative text
1 parent 949cedc commit 2968e9b

2 files changed: +48 additions, -18 deletions


examples/case_studies/bayesian_workflow.ipynb

Lines changed: 24 additions & 9 deletions
Original line number | Diff line number | Diff line change
@@ -24,11 +24,11 @@
2424
":author: Thomas Wiecki, Chris Fonnesbeck\n",
2525
":::\n",
2626
"\n",
27-
"Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models at the outset, often leading to an unsatisfactory final result. To avoid these pitfalls, a structured approach is essential. The Bayesian workflow is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable and interpretable results.\n",
27+
"Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models from the outset, often leading to an unsatisfactory final result (or a dead end). To avoid common model development pitfalls, a structured approach is helpful. The *Bayesian workflow* (Gelman *et al.*) is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable results.\n",
2828
"\n",
29-
"This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the flexibility to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow invlolves moving from simple models via prior checks, fitting, diagnostics, and refinement through to a final product that satisfies the analytic goals, ensuring that computational and conceptual issues are identified and addressed systematically as they are encountered.\n",
29+
"This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the ability to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow involves moving from simple models, via prior checks, fitting, diagnostics, and refinement, through to a final product that satisfies the analytic goals, making sure that computational and conceptual issues are identified and addressed systematically as they are encountered.\n",
3030
"\n",
31-
"Below we demonstrate the complete Bayesian workflow using COVID-19 case data, showing how to progress from basic exponential growth models to more sophisticated logistic growth formulations, highlighting the critical role of model checking and validation at each step. The model is not intended to be a state-of-the-art epidemiological model, but rather a demonstration of how to iterate from a simple model to a more complex one."
31+
"Below we demonstrate the Bayesian workflow using COVID-19 case data, showing how to progress from very basic, unrealistic models to more sophisticated formulations, highlighting the critical role of model checking and validation at each step. Here we are not looking to develop a state-of-the-art epidemiological model, but rather to demonstrate how to iterate from a simple model to a more complex one."
3232
]
3333
},
3434
{
@@ -173,7 +173,7 @@
173173
"3. Run prior predictive check\n",
174174
"4. Fit model\n",
175175
"5. Assess convergence\n",
176-
"6. Run posterior predictive check\n",
176+
"6. Check model fit\n",
177177
"7. Improve model\n",
178178
"\n",
179179
"### 1. Plot the data\n",
@@ -20632,9 +20632,16 @@
2063220632
"cell_type": "markdown",
2063320633
"metadata": {},
2063420634
"source": [
20635-
"### 6. Run posterior predictive check\n",
20635+
"### 6. Check model fit\n",
2063620636
"\n",
20637-
"Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters."
20637+
"Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters. This process is called **posterior predictive checking** and is a crucial step in Bayesian model validation.\n",
20638+
"\n",
20639+
"Posterior predictive checking works by:\n",
20640+
"1. Taking parameter samples from the posterior distribution (which we already have from MCMC sampling)\n",
20641+
"2. For each set of parameter values, generating new synthetic datasets using the same likelihood function as our model\n",
20642+
"3. Comparing these synthetic datasets to our observed data\n",
20643+
"\n",
20644+
"This allows us to assess whether our model can reproduce key features of the observed data. If the posterior predictive samples look very different from our actual data, it suggests our model may be missing important aspects of the data-generating process. Conversely, if the posterior predictive samples encompass our observed data well, it provides evidence that our model is capturing the essential patterns in the data."
2063820645
]
2063920646
},
2064020647
{
@@ -26180,7 +26187,9 @@
2618026187
"cell_type": "markdown",
2618126188
"metadata": {},
2618226189
"source": [
26183-
"OK, that does not look terrible, the data is at least inside of what the model can produce. Let's look at residuals for systematic errors:"
26190+
"OK, that does not look terrible; the data essentially behaves like a random draw from the model.\n",
26191+
"\n",
26192+
"As an additional check, we can also inspect the model residuals."
2618426193
]
2618526194
},
2618626195
{
@@ -31639,9 +31648,15 @@
3163931648
"source": [
3164031649
"### Prediction and forecasting\n",
3164131650
"\n",
31642-
"We might also be interested in predicting on unseen or data, or, in the case time-series data like here, in forecasting. In `PyMC` you can do so easily using `pm.Data` nodes. What it allows you to do is define data to a PyMC model that you can later switch out for other data. That way, when you for example do posterior predictive sampling, it will generate samples into the future.\n",
31651+
"We are often interested in predicting or forecasting. In PyMC, you can do so easily using `pm.Data` nodes, which provide a powerful mechanism for out-of-sample prediction and forecasting.\n",
31652+
"\n",
31653+
"Wrapping your input data in `pm.Data` allows you to define data containers within a PyMC model that can be dynamically updated after model fitting. This is particularly useful for prediction scenarios where you want to:\n",
31654+
"\n",
31655+
"1. **Train on observed data**: Fit your model using the available training data\n",
31656+
"2. **Switch to prediction inputs**: Replace the training data with new input values (e.g., future time points)\n",
31657+
"3. **Generate predictions**: Use posterior predictive sampling to generate forecasts based on the fitted model\n",
3164331658
"\n",
31644-
"Let's change our model to use `pm.Data` instead."
31659+
"Let's demonstrate this approach by modifying our exponential growth model to use `pm.Data` nodes."
3164531660
]
3164631661
},
3164731662
{

examples/case_studies/bayesian_workflow.myst.md

Lines changed: 24 additions & 9 deletions
Original line number | Diff line number | Diff line change
@@ -22,11 +22,11 @@ kernelspec:
2222
:author: Thomas Wiecki, Chris Fonnesbeck
2323
:::
2424

25-
Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models at the outset, often leading to an unsatisfactory final result. To avoid these pitfalls, a structured approach is essential. The Bayesian workflow is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable and interpretable results.
25+
Bayesian inference is a powerful tool for extracting inference from data using probability models. This involves an interplay among statistical models, subject matter knowledge, and computational techniques. In building Bayesian models, it is easy to get carried away with complex models from the outset, often leading to an unsatisfactory final result (or a dead end). To avoid common model development pitfalls, a structured approach is helpful. The *Bayesian workflow* (Gelman *et al.*) is a systematic approach to building, validating, and refining probabilistic models, ensuring that the models are robust, interpretable, and useful for decision-making. The workflow's iterative nature ensures that modeling assumptions are tested and refined as the model grows, leading to more reliable results.
2626

27-
This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the flexibility to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow invlolves moving from simple models via prior checks, fitting, diagnostics, and refinement through to a final product that satisfies the analytic goals, ensuring that computational and conceptual issues are identified and addressed systematically as they are encountered.
27+
This workflow is particularly powerful in high-level probabilistic programming environments like PyMC, where the ability to rapidly prototype and iterate on complex statistical models enables practitioners to focus on the modeling process rather than the underlying computational details. The workflow involves moving from simple models, via prior checks, fitting, diagnostics, and refinement, through to a final product that satisfies the analytic goals, making sure that computational and conceptual issues are identified and addressed systematically as they are encountered.
2828

29-
Below we demonstrate the complete Bayesian workflow using COVID-19 case data, showing how to progress from basic exponential growth models to more sophisticated logistic growth formulations, highlighting the critical role of model checking and validation at each step. The model is not intended to be a state-of-the-art epidemiological model, but rather a demonstration of how to iterate from a simple model to a more complex one.
29+
Below we demonstrate the Bayesian workflow using COVID-19 case data, showing how to progress from very basic, unrealistic models to more sophisticated formulations, highlighting the critical role of model checking and validation at each step. Here we are not looking to develop a state-of-the-art epidemiological model, but rather to demonstrate how to iterate from a simple model to a more complex one.
3030

3131
```{code-cell} ipython3
3232
---
@@ -97,7 +97,7 @@ Next, we will start developing a model of the spread. These models will start ou
9797
3. Run prior predictive check
9898
4. Fit model
9999
5. Assess convergence
100-
6. Run posterior predictive check
100+
6. Check model fit
101101
7. Improve model
102102

103103
### 1. Plot the data
@@ -404,9 +404,16 @@ It seems like bounding the priors did not result in better fit. This is not unex
404404

405405
+++
406406

407-
### 6. Run posterior predictive check
407+
### 6. Check model fit
408408

409-
Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters.
409+
Similar to the prior predictive, we can also generate new data by repeatedly taking samples from the posterior and generating data using these parameters. This process is called **posterior predictive checking** and is a crucial step in Bayesian model validation.
410+
411+
Posterior predictive checking works by:
412+
1. Taking parameter samples from the posterior distribution (which we already have from MCMC sampling)
413+
2. For each set of parameter values, generating new synthetic datasets using the same likelihood function as our model
414+
3. Comparing these synthetic datasets to our observed data
415+
416+
This allows us to assess whether our model can reproduce key features of the observed data. If the posterior predictive samples look very different from our actual data, it suggests our model may be missing important aspects of the data-generating process. Conversely, if the posterior predictive samples encompass our observed data well, it provides evidence that our model is capturing the essential patterns in the data.
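The three steps above can be sketched with plain NumPy. This is a minimal illustration, not the notebook's fitted model: the posterior samples of a growth rate and the Poisson likelihood are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior samples of a growth rate (in practice these come from MCMC)
posterior_rate = rng.normal(loc=0.2, scale=0.02, size=1000)

t = np.arange(30)                       # time points
observed = 100 * np.exp(0.2 * t)        # toy "observed" case counts

# Steps 1-2: for each posterior draw, simulate a new dataset from the likelihood
ppc = np.stack([rng.poisson(100 * np.exp(r * t)) for r in posterior_rate])

# Step 3: compare synthetic data to the observation, e.g. via pointwise 95% bands
lo, hi = np.percentile(ppc, [2.5, 97.5], axis=0)
coverage = np.mean((observed >= lo) & (observed <= hi))
print(f"fraction of observations inside the 95% band: {coverage:.2f}")
```

If `coverage` is far below the nominal level, or the bands systematically miss the shape of the data, the model is likely missing part of the data-generating process.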
410417

411418
```{code-cell} ipython3
412419
with model_exp3:
@@ -453,7 +460,9 @@ fig.update_layout(
453460
fig.show()
454461
```
455462

456-
OK, that does not look terrible, the data is at least inside of what the model can produce. Let's look at residuals for systematic errors:
463+
OK, that does not look terrible; the data essentially behaves like a random draw from the model.
464+
465+
As an additional check, we can also inspect the model residuals.
457466

458467
```{code-cell} ipython3
459468
resid = post_pred.posterior_predictive["obs"].sel(chain=0) - confirmed_values
@@ -488,9 +497,15 @@ What can you see?
488497

489498
### Prediction and forecasting
490499

491-
We might also be interested in predicting on unseen or data, or, in the case time-series data like here, in forecasting. In `PyMC` you can do so easily using `pm.Data` nodes. What it allows you to do is define data to a PyMC model that you can later switch out for other data. That way, when you for example do posterior predictive sampling, it will generate samples into the future.
500+
We are often interested in predicting or forecasting. In PyMC, you can do so easily using `pm.Data` nodes, which provide a powerful mechanism for out-of-sample prediction and forecasting.
501+
502+
Wrapping your input data in `pm.Data` allows you to define data containers within a PyMC model that can be dynamically updated after model fitting. This is particularly useful for prediction scenarios where you want to:
503+
504+
1. **Train on observed data**: Fit your model using the available training data
505+
2. **Switch to prediction inputs**: Replace the training data with new input values (e.g., future time points)
506+
3. **Generate predictions**: Use posterior predictive sampling to generate forecasts based on the fitted model
492507

493-
Let's change our model to use `pm.Data` instead.
508+
Let's demonstrate this approach by modifying our exponential growth model to use `pm.Data` nodes.
494509

495510
```{code-cell} ipython3
496511
with pm.Model() as model_exp4:
