import warnings
from pathlib import Path

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotnine as gg
import pymc3 as pm
import seaborn as sns
import theano.tensor as tt
from scipy import stats

# Remove annoying filters from some dated ArviZ functions.
warnings.simplefilter(action="ignore", category=UserWarning)
gg.theme_set(gg.theme_minimal())
%config InlineBackend.figure_format = "retina"

data_path = Path("data")
Gaussian Process (GP): an infinite-dimensional generalization of the Gaussian distribution used to set a prior on unknown functions
topics covered:
functions as probabilistic objects
kernels
Gaussian processes with Gaussian likelihoods
Gaussian processes with non-Gaussian likelihoods
Linear models and non-linear data
generalized linear models can be expressed as:
$$
\theta = \psi(\phi(X)\beta)
$$
where:
$\theta$: parameter vector for some probability distribution
$\psi$: inverse link function (for simple linear regression, $\psi$ is the identity function)
$\phi$: a transformation of the data $X$, e.g. the identity, a square root, or a polynomial function
$\beta$: vector of coefficients learned during the fitting process
Gaussian processes provide a principled solution to modeling arbitrary functions by effectively letting the data decide on the complexity of the function, while avoiding, or at least minimizing, the chance of overfitting.
Modeling functions
traditionally, we treat $y = f(x)$ as a mapping of values $x$ to $y$
can represent the function probabilistically by letting each $y_i$ be a random variable distributed as a Gaussian with a mean and variance
no longer a description of a single specific function, but a family of distributions
example of two functions with points drawn from distributions
line 1: each point is independently drawn from a 1-dimensional Gaussian
line 2: each point is drawn from a Gaussian where the mean for $y_i$ is $y_{i-1}$
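A minimal sketch of these two sampling schemes, reusing numpy and matplotlib from the setup cell (variable names are our own):

np.random.seed(42)
x = np.linspace(0, 1, 10)

# Line 1: each point drawn independently from a 1-dimensional Gaussian.
y_independent = np.random.normal(0, 1, len(x))

# Line 2: each point drawn from a Gaussian whose mean is the previous point.
y_dependent = np.zeros(len(x))
for i in range(1, len(x)):
    y_dependent[i] = np.random.normal(y_dependent[i - 1], 0.5)

plt.plot(x, y_independent, "o-", label="independent draws")
plt.plot(x, y_dependent, "o-", label="conditional draws")
plt.legend()
plt.show()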
example kernel: the exponentiated quadratic, $K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right)$
$\ell$: "length-scale" controls the width of the kernel (a.k.a. "bandwidth" or "variance")
example kernel to show how a $4\times4$ covariance matrix looks with different inputs
The kernel is translating the distance of the data points along the x axis to values of covariances for values of the expected function (on the y axis).
Thus, the closer two points are on the x axis, the more similar we expect their values to be on the y axis.
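A minimal sketch of such a kernel; the helper name `exp_quad_kernel` is our own:

def exp_quad_kernel(x, knots, ell=1.0):
    """Exponentiated quadratic kernel evaluated between x and each knot."""
    return np.array([np.exp(-((x - k) ** 2) / (2 * ell**2)) for k in knots])

data = np.array([-1, 0, 1, 2])  # four points along the x axis
print(np.round(exp_quad_kernel(data, data, ell=1.0), 2))  # the 4x4 covariance matrix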
so far we have shown that we can use multivariate normal distributions to model functions
the following example uses the exponentiated quadratic kernel to define the covariance matrix of a multivariate normal, and we use samples from that distribution to represent the functions
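A minimal sketch, reusing the hypothetical `exp_quad_kernel` helper from above:

np.random.seed(24)
test_points = np.linspace(0, 10, 200)
cov = exp_quad_kernel(test_points, test_points, ell=1.0)
cov += 1e-6 * np.eye(len(test_points))  # jitter for numerical stability

# Each draw from the multivariate normal is one "function" over test_points.
samples = np.random.multivariate_normal(np.zeros(len(test_points)), cov, size=4)
for sample in samples:
    plt.plot(test_points, sample)
plt.show()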
definition from Wikipedia: "The collection of random variables indexed by time or space, such that every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed."
we need not handle the infinite-dimensional mathematical object; instead, we marginalize over the observed data, reducing the dimensions to the number of data points we have
GPs are useful for building Bayesian non-parametric models, as we can use them as prior distributions over functions
Gaussian process regression
model a value $y$ as a function $f$ of $x$ with some noise $\epsilon$:
$$
y \sim \mathcal{N}(\mu = f(x), \sigma = \epsilon), \qquad f \sim \mathcal{GP}\left(\mu_x, K(x, x')\right)
$$
X = x[:, None]

with pm.Model() as model_reg:
    # Hyperprior for length-scale kernel parameter.
    l = pm.Gamma("l", 2, 0.5)
    # Covariance function.
    cov = pm.gp.cov.ExpQuad(1, ls=l)
    # GP prior over f.
    gp = pm.gp.Marginal(cov_func=cov)
    # Prior for noise.
    epsilon = pm.HalfNormal("epsilon", 25)
    # Likelihood.
    y_pred = gp.marginal_likelihood("y_pred", X=X, y=y, noise=epsilon)
    # Sample.
    trace_reg = pm.sample(1000, tune=1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [epsilon, l]
100.00% [4000/4000 00:28<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 37 seconds.
pm.model_to_graphviz(model_reg)
az.plot_trace(trace_reg)
plt.show()
can get samples from the GP posterior by computing the conditional distribution evaluated over new input locations
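A minimal sketch of this step, assuming a 100-point grid of new inputs (`X_new` is our name):

X_new = np.linspace(np.floor(x.min()), np.ceil(x.max()), 100)[:, None]
with model_reg:
    f_pred = gp.conditional("f_pred", X_new)
    # Draw posterior predictive samples of f at the new locations.
    pred_samples = pm.sample_posterior_predictive(
        trace_reg, var_names=["f_pred"], samples=100
    )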
100.00% [200/200 00:05<00:00]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta, alpha, f_rotated_, l, eta]
100.00% [4000/4000 03:29<00:00 Sampling 2 chains, 35 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 216 seconds.
There were 9 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.8977309206438002, but should be close to 0.8. Try to increase the number of tuning steps.
There were 26 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
plot some samples of the posterior distribution of covariance functions in terms of distances
the covariance is not very high on average and drops to 0 around 2,000 km
the samples vary widely, showing there is a lot of uncertainty in the estimate of the covariance
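A hedged sketch of such a plot; `trace_islands` is a hypothetical trace name, and we assume the model's covariance was `eta * ExpQuad(1, ls=l)` (consistent with the NUTS variables above):

distances = np.linspace(0, 5000, 100)  # distance scale in km is an assumption
for eta, ell in zip(trace_islands["eta"][::25], trace_islands["l"][::25]):
    # One posterior sample of the covariance as a function of distance.
    plt.plot(distances, eta * np.exp(-(distances**2) / (2 * ell**2)), "C0", alpha=0.2)
plt.xlabel("distance (km)")
plt.ylabel("covariance")
plt.show()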
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [f_rotated_, l]
100.00% [2000/2000 01:09<00:00 Sampling chain 0, 0 divergences]
Sampling 1 chain for 1_000 tune and 1_000 draw iterations (1_000 + 1_000 draws total) took 70 seconds.
az.plot_trace(trace_iris["l"])
plt.show()
want to sample from GP posterior
compute the conditional distribution evaluated over new input locations
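A minimal sketch, assuming `model_iris`, `gp`, and the predictor `x_1` from the classification model sampled above (names hypothetical):

X_new = np.linspace(np.floor(x_1.min()), np.ceil(x_1.max()), 200)[:, None]
with model_iris:
    f_pred = gp.conditional("f_pred", X_new)
    pred_samples = pm.sample_posterior_predictive(
        trace_iris, var_names=["f_pred"], samples=1000
    )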
100.00% [1000/1000 01:21<00:00]
the tails of the predicted function revert towards the prior because there is no data there
if we are only concerned with the decision boundary, then this is not a problem
can fix this by adding more structure to the GP by combining multiple covariance functions
change model to use 3 covariance kernels:
the linear kernel fixes the tail issue
the white-noise kernel is a computational trick to stabilize the computation of the covariance matrix
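A hedged sketch of such a combined covariance in PyMC3 (the priors and the 1e-5 noise level are illustrative assumptions; `x_1`, `X_1`, and `y` are the predictor and labels from the model above):

with pm.Model() as model_iris2:
    ell = pm.Gamma("l", 2, 0.5)
    c = pm.Normal("c", x_1.min())
    tau = pm.HalfNormal("tau", 5)
    cov = (
        pm.gp.cov.ExpQuad(1, ls=ell)  # smooth local structure
        + tau * pm.gp.cov.Linear(1, c)  # linear component fixes the tails
        + pm.gp.cov.WhiteNoise(1e-5)  # jitter to stabilize the covariance matrix
    )
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X_1)
    # Logistic inverse link function and Bernoulli likelihood.
    y_ = pm.Bernoulli("y", p=pm.math.sigmoid(f), observed=y)
    trace_iris2 = pm.sample(1000, chains=1)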
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [f_rotated_, tau, c, l]
100.00% [2000/2000 01:45<00:00 Sampling chain 0, 38 divergences]
Sampling 1 chain for 1_000 tune and 1_000 draw iterations (1_000 + 1_000 draws total) took 106 seconds.
There were 38 divergences after tuning. Increase `target_accept` or reparameterize.
100.00% [1000/1000 01:23<00:00]
in practice, we would not need a GP for a simple logistic regression; GPs are useful when more flexibility is needed
example modeling the probability of getting sick as a function of age:
the old and young are at higher risk of being sick than ages in between
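A hedged sketch of this model, reusing the latent-GP classification pattern from above (the `df_sf` column names `age` and `space_flu` are assumptions):

with pm.Model() as model_space_flu:
    ell = pm.HalfCauchy("l", 1)
    cov = pm.gp.cov.ExpQuad(1, ls=ell) + pm.gp.cov.WhiteNoise(1e-5)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=df_sf.age.values[:, None])
    y_ = pm.Bernoulli("y", p=pm.math.sigmoid(f), observed=df_sf.space_flu)
    trace_space_flu = pm.sample(1000, chains=2)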
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [f_rotated_, l]
100.00% [4000/4000 00:55<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 65 seconds.
100.00% [1000/1000 01:19<00:00]
fp = logistic(pred_samples["f_pred"])
fp_df = make_f_pred_post_df(fp, x=X_new.flatten())

(
    gg.ggplot(df_sf)
    + gg.geom_ribbon(
        gg.aes(x="x", ymin="hdi_low", ymax="hdi_high"), data=fp_df, alpha=0.2
    )
    + gg.geom_line(gg.aes(x="x", y="mean"), data=fp_df)
    + gg.geom_jitter(
        gg.aes("age", "space_flu", color="factor(space_flu)"),
        width=0,
        height=0.05,
        alpha=0.8,
        size=2,
    )
    + gg.scale_color_brewer(type="qual", palette="Dark2", guide=None)
    + gg.labs(x="age", y="probability of space flu", title="Posterior of GP classifier")
)
Cox processes
return to modeling count data, using a Poisson likelihood whose rate is modeled with a Gaussian process
2 examples:
time-varying rate
2D-spatially varying rate
because the rate for the Poisson must be positive, use an exponential inverse link function
intensity estimation: another term for modeling a variable rate (the rate is also called the intensity)
Cox process (or Cox model): a Poisson process whose rate is itself a stochastic process (e.g. a Gaussian process); used for intensity estimation
The coal-mining disasters
coal-mining disaster example:
record of when coal-mining disasters occurred from 1851 through 1962
the rate may be affected by regulations or technological advancements
years = int(coal_df.time.max() - coal_df.time.min())

# Each bin represents 4 years.
n_bins = years // 4
hist, x_edges = np.histogram(coal_df, bins=n_bins)

x_centers = x_edges[:-1] + (x_edges[1] - x_edges[0]) / 2
X_data = x_centers[:, None]

# As the number of disasters per year.
y_data = hist / 4
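A hedged sketch of the model implied by the sampler output below: a latent GP over the bin centers with an exponential inverse link to keep the Poisson rate positive (the priors are assumptions):

with pm.Model() as model_coal:
    ell = pm.HalfNormal("l", X_data.std())
    cov = pm.gp.cov.ExpQuad(1, ls=ell) + pm.gp.cov.WhiteNoise(1e-5)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X_data)
    # Exponential inverse link function keeps the Poisson rate positive.
    y_ = pm.Poisson("y", mu=pm.math.exp(f), observed=y_data)
    trace_coal = pm.sample(1000, chains=1)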
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [f_rotated_, l]
100.00% [2000/2000 00:28<00:00 Sampling chain 0, 0 divergences]
Sampling 1 chain for 1_000 tune and 1_000 draw iterations (1_000 + 1_000 draws total) took 29 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
The redwood dataset
example:
data is locations of redwood trees
the objective is to identify how the rate of trees is distributed over space
the data needs to be binned so the counts are in units of rate per area
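A hedged sketch of the binning step and the 2D latent-GP Poisson model (the `rw_df` column names, the bin count, and the priors are assumptions):

bins = 20
hist2d, x_edges, y_edges = np.histogram2d(rw_df.x, rw_df.y, bins=bins)
x_cent = x_edges[:-1] + (x_edges[1] - x_edges[0]) / 2
y_cent = y_edges[:-1] + (y_edges[1] - y_edges[0]) / 2
# Grid of bin centers; row-major order matches hist2d.flatten().
X_grid = np.array([[xi, yi] for xi in x_cent for yi in y_cent])
counts = hist2d.flatten()

with pm.Model() as model_rw:
    ell = pm.HalfNormal("l", 1.0, shape=2)  # one length-scale per spatial dimension
    cov = pm.gp.cov.ExpQuad(2, ls=ell)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X_grid)
    y_ = pm.Poisson("y", mu=pm.math.exp(f), observed=counts)
    trace_rw = pm.sample(1000, chains=2)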
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [f_rotated_, l]
100.00% [4000/4000 01:52<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 120 seconds.
The estimated number of effective samples is smaller than 200 for some parameters.