Skip to content

Latest commit

 

History

History
1146 lines (944 loc) · 31.9 KB

02_programming-probabilistically.md

File metadata and controls

1146 lines (944 loc) · 31.9 KB

Ch 2. Programming Probabilistically

In this chapter, we will cover the following topics:

  • Probabilistic programming
  • PyMC3 primer
  • The coin-flipping problem revisited
  • Summarizing the posterior
  • The Gaussian and student's t models
  • Comparing groups and the effect size
  • Hierarchical models and shrinkage
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import seaborn as sns
from scipy import stats

%config InlineBackend.figure_format = 'retina'

Probabilistic programming

Bayesian statistics is conceptually very simple; we have the knows and the unknowns; we use Bayes' theorem to condition the latter on the former. If we are lucky, this process will reduce the uncertainty about the unknowns. Generally, we refer to the knowns as data and treat it like a constant, and the unknowns as parameters and treat them as probability distributions.

PyMC3 primer

  • a library for probabilistic programming
  • uses NumPy and Theano
    • Theano is a deep learning algorithm that supplies the automatic differentiation required for sampling by PyMC3
    • Theano also compiles the code to C for faster execution
  • Theano is no longer developed, but the PyMC devs are currently maintaining it
    • the next version of PyMC will use a different backend

Flipping coins the PyMC3 way

  • make synthetic coin flipping data, but we know the real $\theta$ value
np.random.seed(123)
trials = 4
theta_real = 0.35
data = stats.bernoulli.rvs(p=theta_real, size=trials)
data
array([1, 0, 0, 0])

Model specification

  • need to specify the likelihood function and prior probability distribution
    • likelihood: binomial distribution with $n=1$ and $p=\theta$
    • prior: beta distribution with $\alpha=1$ and $\beta=1$
      • this beta distribution is equivalent to a uniform distirbution from $[0,1]$

$$ \theta \sim \text{Beta}(\alpha, \beta) \\ y \sim \text{Bern}(p=\theta) $$

  • this model using PyMC3:
with pm.Model() as our_first_model:
    θ = pm.Beta("θ", alpha=1.0, beta=1.0)
    y = pm.Bernoulli("y", p=θ, observed=data)
    trace = pm.sample(1000, random_seed=123)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [θ]
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [4000/4000 00:05<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 16 seconds.

Pushing the inference button

  • the last line of the model specification above is "pressing the inference button"
    • asks for 1,000 samples from the posterior distribution

Summarizing the posterior

  • use plot_trace() to see the distribution of sampled values for $\theta$ and the MCMC chains
az_trace = az.from_pymc3(trace=trace, model=our_first_model)
p = az.plot_trace(az_trace)

png

  • use summary() to get a Pandas data frame describing the posterior
az.summary(az_trace)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
θ 0.336 0.177 0.027 0.663 0.006 0.004 806.0 806.0 744.0 800.0 1.0
  • use plot_posterior() to visualize the posterior distribution of $\theta$
p = az.plot_posterior(az_trace)

png

Posterior-based descisions

  • sometimes need to go further than just descirbing the posterior and actually make a decision

ROPE

  • Region Of Practical Equivalence: a region of the posteior distibution that would be effectively equivalent to a specific value
    • example: a fair coin has $\theta=0.5$ then the ROPE could be $[0.45, 0.55]$

A ROPE is an arbitrary interval we choose based on background knowledge. Any value inside this interval is assumed to be of practical equivalence.

  • comapre the ROPE against the Highest-Posterior Density (HPD)
  • 3 possible scenarios:
    • the ROPE and HPD do not overalp: the coin is not fair
    • the ROPE contains the entire HPD: the coin is fair
    • the ROPE partially overlaps the HPD: the coin may be fair or unfair
  • can include a range for the ROPE in plot_posterior()
p = az.plot_posterior(az_trace, rope=[0.45, 0.55])

png

  • can also compare the posterior against a reference value
    • example: 0.5 for a fair coin
az.plot_posterior(az_trace, rope=[0.45, 0.55], ref_val=0.5)
<AxesSubplot:title={'center':'θ'}>

png

Loss functions

  • loss functoin is a mathematical formalization of a cost/benefit trade-off
    • indicate how different the true and estimated values of a prameter are
  • common loss functions:
    • quadratic: $(\theta - \hat{\theta})^2$
    • absolute: $| \theta - \hat{\theta} |$
    • 0-1: $I(\theta \neq \hat{\theta})$
  • since we don't know the true parameter value, instead try to find the value of $\hat{\theta}$ that minimizes the expected loss function
    • expected loss function is the loss function averages over the entire posterior distribution
  • example below shows the absolute loss lossf_a and quadratic loss lossf_b
grid = np.linspace(0, 1, 200)
theta_pos = trace["θ"]
lossf_a = [np.mean(abs(i - theta_pos)) for i in grid]
lossf_b = [np.mean((i - theta_pos) ** 2) for i in grid]

for lossf, c in zip([lossf_a, lossf_b], ["C0", "C1"]):
    mini = np.argmin(lossf)
    plt.plot(grid, lossf, c)
    plt.plot(grid[mini], lossf[mini], "o", color=c)
    plt.annotate("{:.2f}".format(grid[mini]), (grid[mini], lossf[mini] + 0.03), color=c)
    plt.yticks([])
    plt.xlabel(r"$\hat{\theta}$")

png

Gaussians all the way down

Gaussian inferences

  • the data need not be a Gaussian, just that a Gaussian is a reasonable approximation of the data
    • some example data of NMR chemical shifts
data = np.loadtxt("data/chemical_shifts.csv")
data
array([51.06, 55.12, 53.73, 50.24, 52.05, 56.4 , 48.45, 52.34, 55.65,
       51.49, 51.86, 63.43, 53.  , 56.09, 51.93, 52.31, 52.33, 57.48,
       57.44, 55.14, 53.93, 54.62, 56.09, 68.58, 51.36, 55.47, 50.73,
       51.94, 54.95, 50.39, 52.91, 51.5 , 52.68, 47.72, 49.73, 51.82,
       54.99, 52.84, 53.19, 54.52, 51.46, 53.73, 51.61, 49.81, 52.42,
       54.3 , 53.84, 53.16])
az.plot_kde(data, rug=True)
plt.yticks([0], alpha=0)
plt.show()

png

  • we do not know the mean nor standard deviation, so set priors for them
  • example model

$$ \mu \sim U(l,h) \\ \sigma \sim | \mathcal{N}(0, \sigma_{\sigma}) | \\ y \sim \mathcal{N}(\mu, \sigma) $$

  • this model in code
with pm.Model() as model_g:
    µ = pm.Uniform("µ", lower=40, upper=70)
    σ = pm.HalfNormal("σ", sd=10)
    y = pm.Normal("y", mu=µ, sd=σ, observed=data)
    trace_g = pm.sample(1000)

az_trace_g = az.from_pymc3(trace=trace_g, model=model_g)
az.plot_trace(az_trace_g)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [σ, µ]
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [4000/4000 00:04<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 12 seconds.





array([[<AxesSubplot:title={'center':'µ'}>,
        <AxesSubplot:title={'center':'µ'}>],
       [<AxesSubplot:title={'center':'σ'}>,
        <AxesSubplot:title={'center':'σ'}>]], dtype=object)

png

az.plot_pair(az_trace_g, kind="kde", fill_last=False)
plt.show()

png

  • can get access to values in a trace object by indexing with the parameter name
    • returns a NumPy array
len(trace_g["σ"])
2000
az.summary(az_trace_g)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
µ 53.503 0.512 52.608 54.518 0.013 0.009 1490.0 1486.0 1505.0 1241.0 1.0
σ 3.533 0.377 2.874 4.258 0.009 0.006 1731.0 1715.0 1741.0 1297.0 1.0
  • use the sample_posterior_predictive() function to do posterior predictive checks
y_pred_g = pm.sample_posterior_predictive(trace_g, 100, model_g)
/usr/local/Caskroom/miniconda/base/envs/bayesian-analysis-with-python_e2/lib/python3.9/site-packages/pymc3/sampling.py:1707: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample
  warnings.warn(
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [100/100 00:00<00:00]
data_ppc = az.from_pymc3(trace=trace_g, model=model_g, posterior_predictive=y_pred_g)
ax = az.plot_ppc(data_ppc, figsize=(12, 6), mean=False)
plt.legend(fontsize=15)
plt.show()
arviz.data.io_pymc3 - WARNING - posterior predictive variable y's shape not compatible with number of chains and draws. This can mean that some draws or even whole chains are not represented.

png

  • from the plot, see that the mean is shifted to the right and variance is quite high
    • see a better model for this data in the next section

Robust inferences

Student's t-distribution

  • instead of removing outliers, replace the Gaussian with a Student's t-distribution
    • has 3 parameters: mean, scale, and degrees of freedom ($\nu \in [0, \infty]$)
    • call the $\nu$ the normality parameter because it controls how "normal-like" the distribution is
      • $\nu = 1$: heavy tails (Cauchy or Lorentz distributions)
      • as $\nu$ gets larger, it appraoches a Gaussian
    • for $\nu \leq 1$: no defined mean
    • for $\nu \leq 2$: no defined variance (or std. dev.)
plt.figure(figsize=(10, 6))
x_values = np.linspace(-10, 10, 500)
for dof in [1, 2, 30]:
    distrib = stats.t(dof)
    x_pdf = distrib.pdf(x_values)
    plt.plot(x_values, x_pdf, label=fr"$\nu = {dof}$", lw=3)

x_pdf = stats.norm.pdf(x_values)
plt.plot(x_values, x_pdf, "k--", label=r"$\nu = \infty$")
plt.xlabel("x")
plt.yticks([])
plt.legend()
plt.xlim(-5, 5)
plt.show()

png

  • rewrite the previous model with Student's t-distribution instead of a Gaussian
    • we will provide $\nu$ with an unimformative exponential prior to indicate we think it will be mostly normal

$$ \mu \sim \mathcal{U}(l,h) \\ \sigma \sim | \mathcal{N}(0, \sigma_{\sigma}) | \\ \nu \sim Exp(\lambda) \\ y \sim \mathcal{T}(\mu, \sigma, \nu) $$

with pm.Model() as model_t:
    µ = pm.Uniform("µ", 40, 75)
    σ = pm.HalfNormal("σ", sd=10)
    ν = pm.Exponential("ν", 1 / 30)  # d.o.f. is parameterized as the inverse mean
    y = pm.StudentT("y", mu=µ, sd=σ, nu=ν, observed=data)
    trace_t = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [ν, σ, µ]
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [4000/4000 00:07<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 14 seconds.
az_trace_t = az.from_pymc3(trace=trace_t, model=model_t)
az.plot_trace(az_trace_t)
plt.show()

png

az.summary(az_trace_t)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
µ 53.018 0.388 52.317 53.736 0.012 0.008 1124.0 1124.0 1124.0 1257.0 1.0
σ 2.199 0.415 1.426 2.950 0.014 0.010 844.0 844.0 838.0 823.0 1.0
ν 4.854 5.626 1.386 10.063 0.208 0.147 729.0 729.0 854.0 904.0 1.0
y_ppc_t = pm.sample_posterior_predictive(trace_t, 100, model_t, random_seed=123)
y_pred_t = az.from_pymc3(trace=trace_t, model=model_t, posterior_predictive=y_ppc_t)
ax = az.plot_ppc(y_pred_t, figsize=(12, 6), mean=False)
plt.legend(fontsize=15)
plt.xlim(40, 70)
plt.show()
/usr/local/Caskroom/miniconda/base/envs/bayesian-analysis-with-python_e2/lib/python3.9/site-packages/pymc3/sampling.py:1707: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample
  warnings.warn(
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [100/100 00:00<00:00]
arviz.data.io_pymc3 - WARNING - posterior predictive variable y's shape not compatible with number of chains and draws. This can mean that some draws or even whole chains are not represented.

png

Groups comparison

  • this author emphasizes useing effect size and uncertainty over just presenting a p-value (or some other yes/no indicator)
  • specifically, 3 tools:
    1. a posterior plot with a reference value
    2. the Cohen's d
    3. the probability of superiority

Cohen's d

formula: $\frac{\mu_2 - \mu_1}{\sqrt{\frac{\sigma_2^2 + \sigma_1^2}{2}}}$

  • the difference in means with respect to the pooled standard deviation of both groups

Probability of superiority

  • the probability that a data point taken at random from one group has a larger value than one taken at random from the other group
  • if the data is normally distributed, can compute using Cohen's d ($\delta$) and the cumulative normal distribution $\Phi$

$ps = \Phi(\frac{\delta}{\sqrt{2}})$

  • it can also be computed if we have direct samples from the posterior (such as from MCMC)

The tips dataset

  • example: study the effect of the day of the week on the amount of tips at a restaurant
tips = pd.read_csv("data/tips.csv")
tips.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
plt.figure(figsize=(8, 5))
sns.violinplot(x="day", y="tip", data=tips)
plt.show()

png

  • make some variables to make handling the data easier
tip = tips["tip"].values
idx = pd.Categorical(tips["day"], categories=["Thur", "Fri", "Sat", "Sun"]).codes
groups = len(np.unique(idx))
  • same model as model_g except now $\mu$ and $\sigma$ are vectors, one value per day of the week
with pm.Model() as comparing_groups:
    µ = pm.Normal("µ", mu=0, sd=10, shape=groups)
    σ = pm.HalfNormal("σ", sd=10, shape=groups)

    y = pm.Normal("y", mu=µ[idx], sd=σ[idx], observed=tip)

    trace_cg = pm.sample(5000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [σ, µ]
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [12000/12000 00:11<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 5_000 draw iterations (2_000 + 10_000 draws total) took 20 seconds.
az_trace_cg = az.from_pymc3(trace=trace_cg, model=comparing_groups)
az.plot_trace(az_trace_cg)
plt.show()

png

dist = stats.norm()

fig, ax = plt.subplots(3, 2, figsize=(14, 8), constrained_layout=True)

comparisons = [(i, j) for i in range(4) for j in range(i + 1, 4)]
pos = [(k, l) for k in range(3) for l in (0, 1)]

for (i, j), (k, l) in zip(comparisons, pos):
    means_diff = trace_cg["µ"][:, i] - trace_cg["µ"][:, j]
    d_cohen = (
        means_diff / np.sqrt((trace_cg["σ"][:, i] ** 2 + trace_cg["σ"][:, j] ** 2) / 2)
    ).mean()
    ps = dist.cdf(d_cohen / (2 ** 0.5))

    az.plot_posterior(means_diff, ref_val=0, ax=ax[k, l])
    ax[k, l].set_title(fr"$\mu_{i}-\mu_{j}$")
    ax[k, l].plot(0, label=f"Cohen's d = {d_cohen:.2f}\nProb sup = {ps:.2f}", alpha=0)
    ax[k, l].legend()

png

Hierarchical models

  • instead of fixing parameters of the priors to some constant, estimate the priors over all of the data
    • called hyper-priors with hyperparameters
  • example of mock water quality measurements collected from 3 regions within a city
    • values of lead above healthy standards are marked with 0 and those below are makred with 1
N_samples = [30, 30, 30]
G_samples = [18, 18, 18]

group_idx = np.repeat(np.arange(len(N_samples)), N_samples)
data = []
for i in range(0, len(N_samples)):
    data.extend(np.repeat([1, 0], [G_samples[i], N_samples[i] - G_samples[i]]))

print(group_idx[:20])
print(data[:20])
print(len(data))
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
90
  • model is a the same as for a coin flip except:
    • defined 2 hhyper-priors that will influence the beta prior
    • instead of putting hyper-priors on the parameters $\alpha$ and $\beta$, indirectly define them with $\mu$ (mean of the beta distribution)and $\kappa$ (the precision of the beta distribution; effectively the inverse of std. dev.)

$$ \mu \sim \text{Beta}(\alpha_{\mu}, \beta_{\mu}) \\ \kappa \sim | \mathcal{N}(0, \sigma_\kappa) | \\ \alpha = \mu \times \kappa \\ \beta = (1 - \mu) \times \kappa \\ \theta_i \sim \text{Beta}(\alpha_i, \beta_i) \\ y_i \sim \text{Bernoulli}(\theta_i) $$

with pm.Model() as model_h:
    µ = pm.Beta("µ", 1.0, 1.0)
    κ = pm.HalfNormal("κ", 10)

    θ = pm.Beta("θ", alpha=µ * κ, beta=(1.0 - µ) * κ, shape=len(N_samples))
    y = pm.Bernoulli("y", p=θ[group_idx], observed=data)

    trace_h = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [θ, κ, µ]
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [6000/6000 00:08<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 15 seconds.
az_trace_h = az.from_pymc3(trace=trace_h, model=model_h)
az.plot_trace(az_trace_h)
plt.show()

png

Shrinkage

  • when the parameters share information through the hyper-prior, they shrink towards the value instead of remaining completely separate
    • paritally pooling the data
  • shrinkinage contributes to more stable inferences
  • can use a more informative prior on the hyper-prior to increase the amount of shrinkage

One more example

  • an example of a hierarchical model with protein NMR data
    • want to compare the differences between theoretical (theo) and experimental (exp) chemical shift values
cs_data = pd.read_csv("data/chemical_shifts_theo_exp.csv")
cs_data.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
ID aa theo exp
0 1BM8 ILE 61.18 58.27
1 1BM8 TYR 56.95 56.18
2 1BM8 SER 56.35 56.84
3 1BM8 ALA 51.96 51.01
4 1BM8 ARG 56.54 54.64
diff = cs_data.theo.values - cs_data.exp.values
idx = pd.Categorical(cs_data["aa"]).codes
groups = len(np.unique(idx))
print(f"There are {groups} different amino acids in the data.")
There are 19 different amino acids in the data.
  • for comparison, we will build both a non-hierarchical model and a hierarchical model
with pm.Model() as cs_nh:
    µ = pm.Normal("µ", mu=0, sd=10, shape=groups)
    σ = pm.HalfNormal("σ", sd=10, shape=groups)

    y = pm.Normal("y", mu=µ[idx], sd=σ[idx], observed=diff)

    trace_cs_nh = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [σ, µ]
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [4000/4000 00:09<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 15 seconds.
with pm.Model() as cs_h:
    # hyper-priors on varying µ
    µ_µ = pm.Normal("µ_µ", mu=0, sd=10)
    σ_µ = pm.HalfNormal("σ_µ", 10)

    # priors
    µ = pm.Normal("µ", mu=µ_µ, sd=σ_µ, shape=groups)
    σ = pm.HalfNormal("σ", sd=10, shape=groups)

    y = pm.Normal("y", mu=µ[idx], sd=σ[idx], observed=diff)

    trace_cs_h = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [σ, µ, σ_µ, µ_µ]
<style> /*Turns off some styling*/ progress { /*gets rid of default border in Firefox and Opera.*/ border: none; /*Needs to be in here for Safari polyfill so background images work as expected.*/ background-size: auto; } .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar { background: #F44336; } </style> 100.00% [4000/4000 00:10<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 17 seconds.
  • compare the models using plot_forest()
axes = az.plot_forest(
    [az.from_pymc3(trace_cs_nh), az.from_pymc3(trace_cs_h)],
    model_names=["n_h", "h"],
    var_names="µ",
    combined=True,
    colors="cycle",
)

y_lims = axes[0].get_ylim()
axes[0].vlines(trace_cs_h["µ_µ"].mean(), *y_lims)

plt.show()
/usr/local/Caskroom/miniconda/base/envs/bayesian-analysis-with-python_e2/lib/python3.9/site-packages/arviz/data/io_pymc3.py:87: FutureWarning: Using `from_pymc3` without the model will be deprecated in a future release. Not using the model will return less accurate and less useful results. Make sure you use the model argument or call from_pymc3 within a model context.
  warnings.warn(

png


%load_ext watermark
%watermark -d -u -v -iv -b -h -m
Last updated: 2021-01-04

Python implementation: CPython
Python version       : 3.9.1
IPython version      : 7.19.0

Compiler    : Clang 10.0.0 
OS          : Darwin
Release     : 20.1.0
Machine     : x86_64
Processor   : i386
CPU cores   : 4
Architecture: 64bit

Hostname: JHCookMac.local

Git branch: master

matplotlib: 3.3.3
pandas    : 1.2.0
seaborn   : 0.11.1
arviz     : 0.10.0
scipy     : 1.6.0
numpy     : 1.19.4
pymc3     : 3.9.3