Skip to content

Commit c760a85

Browse files
authored
SMC minor improvements (#3625)
* refactor, irmprove doc and scaling * add release note * remove psis commented code * remove psis argument, add warning, fix typo
1 parent e81df2d commit c760a85

File tree

6 files changed

+592
-551
lines changed

6 files changed

+592
-551
lines changed

Diff for: RELEASE-NOTES.md

+3-2
Original file line numberDiff line numberDiff line change
@@ -18,9 +18,10 @@
1818
- Fixed a bug in `Categorical.logp`. In the case of multidimensional `p`'s, the indexing was done wrong leading to incorrectly shaped tensors that consumed `O(n**2)` memory instead of `O(n)`. This fixes issue [#3535](https://github.com/pymc-devs/pymc3/issues/3535)
1919
- Fixed a defect in `OrderedLogistic.__init__` that unnecessarily increased the dimensionality of the underlying `p`. Related to issue issue [#3535](https://github.com/pymc-devs/pymc3/issues/3535) but was not the true cause of it.
2020
- SMC: stabilize covariance matrix [3573](https://github.com/pymc-devs/pymc3/pull/3573)
21-
- SMC is no longer a step method of `pm.sample` now it should be called using `pm.sample_smc` [3579](https://github.com/pymc-devs/pymc3/pull/3579)
22-
- SMC: improve computation of the proposal scaling factor [3594](https://github.com/pymc-devs/pymc3/pull/3594)
21+
- SMC: is no longer a step method of `pm.sample` now it should be called using `pm.sample_smc` [3579](https://github.com/pymc-devs/pymc3/pull/3579)
22+
- SMC: improve computation of the proposal scaling factor [3594](https://github.com/pymc-devs/pymc3/pull/3594) and [3625](https://github.com/pymc-devs/pymc3/pull/3625)
2323
- SMC: reduce number of logp evaluations [3600](https://github.com/pymc-devs/pymc3/pull/3600)
24+
- SMC: remove `scaling` and `tune_scaling` arguments as is a better idea to always allow SMC to automatically compute the scaling factor [3625](https://github.com/pymc-devs/pymc3/pull/3625)
2425
- Now uses `multiprocessong` rather than `psutil` to count CPUs, which results in reliable core counts on Chromebooks.
2526
- `sample_posterior_predictive` now preallocates the memory required for its output to improve memory usage. Addresses problems raised in this [discourse thread](https://discourse.pymc.io/t/memory-error-with-posterior-predictive-sample/2891/4).
2627
- Fixed a bug in `Categorical.logp`. In the case of multidimensional `p`'s, the indexing was done wrong leading to incorrectly shaped tensors that consumed `O(n**2)` memory instead of `O(n)`. This fixes issue [#3535](https://github.com/pymc-devs/pymc3/issues/3535)

Diff for: pymc3/smc/__init__.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
from .smc import sample_smc
1+
from .sample_smc import sample_smc

Diff for: pymc3/smc/sample_smc.py

+160
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,160 @@
1+
from .smc import SMC
2+
import logging
3+
4+
5+
def sample_smc(
6+
draws=1000,
7+
kernel="metropolis",
8+
n_steps=25,
9+
parallel=False,
10+
start=None,
11+
cores=None,
12+
tune_steps=True,
13+
p_acc_rate=0.99,
14+
threshold=0.5,
15+
epsilon=1.0,
16+
dist_func="absolute_error",
17+
sum_stat=False,
18+
progressbar=False,
19+
model=None,
20+
random_seed=-1,
21+
):
22+
"""
23+
Sequential Monte Carlo based sampling
24+
25+
Parameters
26+
----------
27+
draws : int
28+
The number of samples to draw from the posterior (i.e. last stage). And also the number of
29+
independent chains. Defaults to 1000.
30+
kernel : str
31+
Kernel method for the SMC sampler. Available option are ``metropolis`` (default) and `ABC`.
32+
Use `ABC` for likelihood free inference togheter with a ``pm.Simulator``.
33+
n_steps : int
34+
The number of steps of each Markov Chain. If ``tune_steps == True`` ``n_steps`` will be used
35+
for the first stage and for the others it will be determined automatically based on the
36+
acceptance rate and `p_acc_rate`, the max number of steps is ``n_steps``.
37+
parallel : bool
38+
Distribute computations across cores if the number of cores is larger than 1.
39+
Defaults to False.
40+
start : dict, or array of dict
41+
Starting point in parameter space. It should be a list of dict with length `chains`.
42+
When None (default) the starting point is sampled from the prior distribution.
43+
cores : int
44+
The number of chains to run in parallel. If ``None`` (default), it will be automatically
45+
set to the number of CPUs in the system.
46+
tune_steps : bool
47+
Whether to compute the number of steps automatically or not. Defaults to True
48+
p_acc_rate : float
49+
Used to compute ``n_steps`` when ``tune_steps == True``. The higher the value of
50+
``p_acc_rate`` the higher the number of steps computed automatically. Defaults to 0.99.
51+
It should be between 0 and 1.
52+
threshold : float
53+
Determines the change of beta from stage to stage, i.e.indirectly the number of stages,
54+
the higher the value of `threshold` the higher the number of stages. Defaults to 0.5.
55+
It should be between 0 and 1.
56+
epsilon : float
57+
Standard deviation of the gaussian pseudo likelihood. Only works with `kernel = ABC`
58+
dist_func : str
59+
Distance function. Available options are ``absolute_error`` (default) and
60+
``sum_of_squared_distance``. Only works with ``kernel = ABC``
61+
sum_stat : bool
62+
Whether to use or not a summary statistics. Defaults to False. Only works with
63+
``kernel = ABC``
64+
progressbar : bool
65+
Flag for displaying a progress bar. Defaults to False.
66+
model : Model (optional if in ``with`` context)).
67+
random_seed : int
68+
random seed
69+
70+
Notes
71+
-----
72+
SMC works by moving through successive stages. At each stage the inverse temperature
73+
:math:`\beta` is increased a little bit (starting from 0 up to 1). When :math:`\beta` = 0
74+
we have the prior distribution and when :math:`\beta` =1 we have the posterior distribution.
75+
So in more general terms we are always computing samples from a tempered posterior that we can
76+
write as:
77+
78+
.. math::
79+
80+
p(\theta \mid y)_{\beta} = p(y \mid \theta)^{\beta} p(\theta)
81+
82+
A summary of the algorithm is:
83+
84+
1. Initialize :math:`\beta` at zero and stage at zero.
85+
2. Generate N samples :math:`S_{\beta}` from the prior (because when :math `\beta = 0` the
86+
tempered posterior is the prior).
87+
3. Increase :math:`\beta` in order to make the effective sample size equals some predefined
88+
value (we use :math:`Nt`, where :math:`t` is 0.5 by default).
89+
4. Compute a set of N importance weights W. The weights are computed as the ratio of the
90+
likelihoods of a sample at stage i+1 and stage i.
91+
5. Obtain :math:`S_{w}` by re-sampling according to W.
92+
6. Use W to compute the covariance for the proposal distribution.
93+
7. For stages other than 0 use the acceptance rate from the previous stage to estimate the
94+
scaling of the proposal distribution and `n_steps`.
95+
8. Run N Metropolis chains (each one of length `n_steps`), starting each one from a different
96+
sample in :math:`S_{w}`.
97+
9. Repeat from step 3 until :math:`\beta \ge 1`.
98+
10. The final result is a collection of N samples from the posterior.
99+
100+
101+
References
102+
----------
103+
.. [Minson2013] Minson, S. E. and Simons, M. and Beck, J. L., (2013),
104+
Bayesian inversion for finite fault earthquake source models I- Theory and algorithm.
105+
Geophysical Journal International, 2013, 194(3), pp.1701-1726,
106+
`link <https://gji.oxfordjournals.org/content/194/3/1701.full>`__
107+
108+
.. [Ching2007] Ching, J. and Chen, Y. (2007).
109+
Transitional Markov Chain Monte Carlo Method for Bayesian Model Updating, Model Class
110+
Selection, and Model Averaging. J. Eng. Mech., 10.1061/(ASCE)0733-9399(2007)133:7(816),
111+
816-832. `link <http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9399
112+
%282007%29133:7%28816%29>`__
113+
"""
114+
115+
smc = SMC(
116+
draws=draws,
117+
kernel=kernel,
118+
n_steps=n_steps,
119+
parallel=parallel,
120+
start=start,
121+
cores=cores,
122+
tune_steps=tune_steps,
123+
p_acc_rate=p_acc_rate,
124+
threshold=threshold,
125+
epsilon=epsilon,
126+
dist_func=dist_func,
127+
sum_stat=sum_stat,
128+
progressbar=progressbar,
129+
model=model,
130+
random_seed=random_seed,
131+
)
132+
133+
_log = logging.getLogger("pymc3")
134+
_log.info("Sample initial stage: ...")
135+
stage = 0
136+
smc.initialize_population()
137+
smc.setup_kernel()
138+
smc.initialize_logp()
139+
140+
while smc.beta < 1:
141+
smc.update_weights_beta()
142+
_log.info(
143+
"Stage: {:3d} Beta: {:.3f} Steps: {:3d} Acce: {:.3f}".format(
144+
stage, smc.beta, smc.n_steps, smc.acc_rate
145+
)
146+
)
147+
smc.resample()
148+
smc.update_proposal()
149+
if stage > 0:
150+
smc.tune()
151+
smc.mutate()
152+
stage += 1
153+
154+
if smc.parallel and smc.cores > 1:
155+
smc.pool.close()
156+
smc.pool.join()
157+
158+
trace = smc.posterior_to_trace()
159+
160+
return trace

0 commit comments

Comments
 (0)