            total_size=X_train.shape[0],  # IMPORTANT for minibatches
            dims="obs_id",
        )
    return neural_network
```
+++

## Variational Inference: Scaling model complexity

We could now just run an MCMC sampler like {class}`pymc.NUTS`, which works pretty well in this case, but as already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.

Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note that this is a mean-field approximation, so we ignore correlations in the posterior.
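
To make the mean-field point concrete, here is a hedged sketch of what the fitted approximation looks like (assuming `approx` is the approximation object returned by `pm.fit` in the code cell below): every weight gets its own independent Gaussian, so the entire posterior is summarized by a vector of means and a vector of standard deviations.

```python
# Sketch only: `approx` is assumed to be the MeanField approximation returned by
# pm.fit below. Mean-field ADVI fits an independent Gaussian to each parameter,
# so there are exactly two numbers (mean and standard deviation) per weight and
# no cross-parameter correlations.
means = approx.mean.eval()
stds = approx.std.eval()
print(means.shape, stds.shape)  # one entry per model parameter
```
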
### Mini-batch ADVI
While this simulated dataset is small enough to fit all at once, it would not scale to something big like ImageNet. In the model above, we have set up mini-batches that will allow for scaling to larger datasets. Moreover, training on mini-batches of data (stochastic gradient descent) avoids local minima and can lead to faster convergence.
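
For reference, a minimal sketch of how such mini-batches are typically created (the variable names and batch size here are illustrative assumptions): `pm.Minibatch` wraps the full training arrays and yields a fresh random slice at every optimization step, while the `total_size` argument in the likelihood (see `construct_nn` above) rescales the mini-batch log-likelihood back up to the full dataset.

```python
# Sketch: mini-batch generators over the training data. Each ADVI gradient step
# sees only `batch_size` rows; total_size=X_train.shape[0] in the likelihood keeps
# the ELBO an unbiased estimate for the whole dataset.
batch_size = 50  # assumed value for illustration
ann_input, ann_output = pm.Minibatch(X_train, Y_train, batch_size=batch_size)
```
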
```{code-cell} ipython3
%%time
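
# Hedged sketch of the fit step: the model variable name (`neural_network`, as
# returned by construct_nn above) and the number of ADVI iterations are
# assumptions for illustration.
with neural_network:
    approx = pm.fit(n=30_000)
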
trace = approx.sample(draws=5000)
```
Now that we have trained our model, let's predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`pymc.sample_posterior_predictive` to generate new data (in this case, class predictions) from the posterior (sampled from the variational estimation).

To predict on the entire test set (and not just the minibatches), we need to create a new model object that removes the minibatches. Notice that we are using our fitted `trace` to sample from the posterior predictive distribution, using the posterior estimates from the original model. There is no new inference here; we are just using the same model and the same posterior estimates to generate predictions. The {class}`Flat` distribution is just a placeholder to make the model work; the actual values are sampled from the posterior.
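
A hedged sketch of this pattern (the single hidden layer and the names `w_in_1`, `w_1_out`, `n_hidden`, `X_test`, `Y_test` are illustrative assumptions, not the notebook's exact `construct_nn`): the network is rebuilt on the full test data with {class}`Flat` placeholders registered under the same names and shapes as the trained weights, so that {func}`pymc.sample_posterior_predictive` can substitute the posterior draws stored in `trace`.

```python
# Sketch: rebuild the network over the whole test set. The Flat "priors" are never
# sampled here; they only register the variable names so the posterior draws in
# `trace` (which must match those names and shapes) can be plugged in.
with pm.Model() as prediction_model:
    w_in_1 = pm.Flat("w_in_1", shape=(X_test.shape[1], n_hidden))
    w_1_out = pm.Flat("w_1_out", shape=(n_hidden,))

    act_1 = pm.math.tanh(pm.math.dot(X_test, w_in_1))
    p_out = pm.math.sigmoid(pm.math.dot(act_1, w_1_out))

    # No minibatches and no total_size: we score the entire test set at once.
    pm.Bernoulli("out", p_out, observed=Y_test)

    ppc = pm.sample_posterior_predictive(trace, var_names=["out"])

# Average the sampled class labels over chains and draws, then threshold.
pred = ppc.posterior_predictive["out"].mean(("chain", "draw")) > 0.5
```
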
+++
0 commit comments