---
title: "Core Functionality"
engine: julia
---
```{julia}
#| echo: false
#| output: false
using Pkg;
Pkg.instantiate();
```
This article provides an overview of the core functionality of Turing.jl, which is likely to be used across a wide range of models.
## Basics
### Introduction
A probabilistic program is Julia code wrapped in a `@model` macro. It can use arbitrary Julia code, but to ensure correctness of inference it should not have external effects or modify global state. Stack-allocated variables are safe, but mutable heap-allocated objects may lead to subtle bugs when using task copying. By default, Libtask deep-copies `Array` and `Dict` objects when copying a task, to avoid bugs with data stored in mutable structures in Turing models.
To specify distributions of random variables, Turing programs should use the `~` notation:
`x ~ distr` where `x` is a symbol and `distr` is a distribution. If `x` is undefined in the model function, this introduces a random variable named `x`, distributed according to `distr`, into the current scope. `distr` can be a value of any type that implements `rand(distr)`, which samples a value from the distribution `distr`. If `x` is defined, the statement is instead used for conditioning, in a style similar to [Anglican](https://probprog.github.io/anglican/index.html) (another PPL). In this case, `x` is an observed value, assumed to have been drawn from the distribution `distr`, and its likelihood is computed using `logpdf(distr, x)`. Observe statements should be arranged so that every possible run traverses all of them in exactly the same order; this is equivalent to demanding that they are not placed inside stochastic control flow.
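To make the two cases concrete, here is a minimal sketch (the model name `demo` is illustrative):
```{julia}
#| eval: false
@model function demo(x)
    # `z` is not an argument, so this statement introduces
    # a random variable `z` distributed as Normal(0, 1).
    z ~ Normal(0, 1)
    # `x` is an argument with a value, so this statement conditions:
    # it adds logpdf(Normal(z, 1), x) to the log-likelihood.
    x ~ Normal(z, 1)
end
```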
Available inference methods include Importance Sampling (IS), Sequential Monte Carlo (SMC), Particle Gibbs (PG), Hamiltonian Monte Carlo (HMC), Hamiltonian Monte Carlo with Dual Averaging (HMCDA), and the No-U-Turn Sampler (NUTS).
### Simple Gaussian Demo
Below is a simple Gaussian demo illustrating the basic usage of Turing.jl.
```{julia}
# Import packages.
using Turing
using StatsPlots

# Define a simple Normal model with unknown mean and variance.
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    return y ~ Normal(m, sqrt(s²))
end
```
Note: As a sanity check, the prior expectation of `s²` is `mean(InverseGamma(2, 3)) = 3/(2 - 1) = 3` and the prior expectation of `m` is 0. This can be easily checked using `Prior`:
```{julia}
#| output: false
setprogress!(false)
```
```{julia}
p1 = sample(gdemo(missing, missing), Prior(), 100000)
```
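The sample means should be close to these prior expectations; a quick sketch of the check, using the parameter names from the chain above:
```{julia}
#| eval: false
mean(p1[:s²])  # should be close to 3
mean(p1[:m])   # should be close to 0
```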
We can perform inference by using the `sample` function, the first argument of which is our probabilistic program and the second of which is a sampler. More information on each sampler is located in the [API]({{< meta site-url >}}/library).
```{julia}
# Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(), 1000)
c2 = sample(gdemo(1.5, 2), PG(10), 1000)
c3 = sample(gdemo(1.5, 2), HMC(0.1, 5), 1000)
c4 = sample(gdemo(1.5, 2), Gibbs(PG(10, :m), HMC(0.1, 5, :s²)), 1000)
c5 = sample(gdemo(1.5, 2), HMCDA(0.15, 0.65), 1000)
c6 = sample(gdemo(1.5, 2), NUTS(0.65), 1000)
```
The `MCMCChains` module (which is re-exported by Turing) provides plotting tools for the `Chain` objects returned by a `sample` function. See the [MCMCChains](https://github.com/TuringLang/MCMCChains.jl) repository for more information on the suite of tools available for diagnosing MCMC chains.
```{julia}
#| eval: false
# Summarise results
describe(c3)
# Plot results
plot(c3)
savefig("gdemo-plot.png")
```
The arguments for each sampler are:
- SMC: number of particles.
- PG: number of particles, number of iterations.
- HMC: leapfrog step size, leapfrog step numbers.
- Gibbs: component sampler 1, component sampler 2, ...
- HMCDA: total leapfrog length, target accept ratio.
- NUTS: number of adaptation steps (optional), target accept ratio.
For detailed information on the samplers, please review Turing.jl's [API]({{< meta site-url >}}/library) documentation.
### Modelling Syntax Explained
Using this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model onto data. Subsequently, the sample function can be used to generate samples from the posterior distribution.
In the following example, the defined model is conditioned on the data (`arg_1 = 1`, `arg_2 = 2`) by passing `(1, 2)` to the model function.
```{julia}
#| eval: false
@model function model_name(arg_1, arg_2)
    return ...
end
```
The conditioned model can then be passed to the `sample` function to run posterior inference.
```{julia}
#| eval: false
model_func = model_name(1, 2)
chn = sample(model_func, HMC(..)) # Perform inference by sampling using HMC.
```
The returned chain contains samples of the variables in the model.
```{julia}
#| eval: false
var_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.
```
The key (`:var_1`) can be a `Symbol` or a `String`. For example, to fetch `x[1]`, one can use `chn[Symbol("x[1]")]` or `chn["x[1]"]`.
If you want to retrieve all parameters associated with a specific symbol, you can use `group`. As an example, if you have the
parameters `"x[1]"`, `"x[2]"`, and `"x[3]"`, calling `group(chn, :x)` or `group(chn, "x")` will return a new chain with only `"x[1]"`, `"x[2]"`, and `"x[3]"`.
Turing does not have a declarative form. More generally, the order in which you place the lines of a `@model` macro matters. For example, the following model works:
```{julia}
# Define a model with a Poisson prior on the mean of a Normal.
@model function model_function(y)
    s ~ Poisson(1)
    y ~ Normal(s, 1)
    return y
end

sample(model_function(10), SMC(), 100)
```
But if we switch the `s ~ Poisson(1)` and `y ~ Normal(s, 1)` lines, the model will no longer sample correctly:
```{julia}
#| eval: false
# The same model with the two lines swapped: `s` is used before it is defined.
@model function model_function(y)
    y ~ Normal(s, 1)
    s ~ Poisson(1)
    return y
end

sample(model_function(10), SMC(), 100)
```
### Sampling Multiple Chains
Turing supports distributed and threaded parallel sampling. To do so, call `sample(model, sampler, parallel_type, n, n_chains)`, where `parallel_type` can be either `MCMCThreads()` or `MCMCDistributed()` for thread-parallel and process-parallel (distributed) sampling, respectively.
Having multiple chains in the same object is valuable for evaluating convergence. Some diagnostic functions like `gelmandiag` require multiple chains.
If you do not want parallelism or are on an older version of Julia, you can sample multiple chains with the `mapreduce` function:
```{julia}
#| eval: false
# Replace num_chains below with however many chains you wish to sample.
chains = mapreduce(c -> sample(model_fun, sampler, 1000), chainscat, 1:num_chains)
```
The `chains` variable now contains a `Chains` object which can be indexed by chain. To pull out the first chain from the `chains` object, use `chains[:,:,1]`. The method is the same if you use either of the below parallel sampling methods.
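For instance, a short sketch of indexing out one chain and running a multi-chain diagnostic (`gelmandiag` comes from MCMCChains, as noted above):
```{julia}
#| eval: false
first_chain = chains[:, :, 1]  # pull out the first chain
gelmandiag(chains)             # Gelman-Rubin diagnostic; needs at least two chains
```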
#### Multithreaded sampling
If you wish to perform multithreaded sampling and are running Julia 1.3 or greater, you can call `sample` with the following signature:
```{julia}
#| eval: false
using Turing

@model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

model = gdemo([1.5, 2.0])

# Sample four chains using multiple threads, each with 1000 samples.
sample(model, NUTS(), MCMCThreads(), 1000, 4)
```
Be aware that Turing cannot add threads for you -- you must have started your Julia instance with multiple threads to experience any kind of parallelism. See the [Julia documentation](https://docs.julialang.org/en/v1/manual/parallel-computing/#man-multithreading-1) for details on how to achieve this.
#### Distributed sampling
To perform distributed sampling (using multiple processes), you must first import `Distributed`.
Process parallel sampling can be done like so:
```{julia}
#| eval: false
# Load Distributed to add processes and the @everywhere macro.
using Distributed

# Load Turing.
using Turing

# Add four processes to use for sampling.
addprocs(4; exeflags="--project=$(Base.active_project())")

# Initialize everything on all the processes.
# Note: Make sure to do this after you've already loaded Turing,
# so each process does not have to precompile.
# Parallel sampling may fail silently if you do not do this.
@everywhere using Turing

# Define a model on all processes.
@everywhere @model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

# Declare the model instance everywhere.
@everywhere model = gdemo([1.5, 2.0])

# Sample four chains using multiple processes, each with 1000 samples.
sample(model, NUTS(), MCMCDistributed(), 1000, 4)
```
### Sampling from an Unconditional Distribution (The Prior)
Turing allows you to sample from a declared model's prior. If you wish to draw a chain from the prior to inspect your prior distributions, you can simply run
```{julia}
#| eval: false
chain = sample(model, Prior(), n_samples)
```
You can also run your model (as if it were a function) from the prior distribution, by calling the model without specifying inputs or a sampler. In the below example, we specify a `gdemo` model which returns two variables, `x` and `y`. The model includes `x` and `y` as arguments, but calling the function without passing in `x` or `y` means that Turing's compiler will assume they are missing values to draw from the relevant distribution. The `return` statement is necessary to retrieve the sampled `x` and `y` values.
```{julia}
@model function gdemo(x, y)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    x ~ Normal(m, sqrt(s²))
    y ~ Normal(m, sqrt(s²))
    return x, y
end
```
Assign the function with `missing` inputs to a variable, and Turing will produce a sample from the prior distribution.
```{julia}
# Samples from p(x,y)
g_prior_sample = gdemo(missing, missing)
g_prior_sample()
```
### Sampling from a Conditional Distribution (The Posterior)
#### Treating observations as random variables
Inputs to the model that have a value `missing` are treated as parameters, aka random variables, to be estimated/sampled. This can be useful if you want to simulate draws for that parameter, or if you are sampling from a conditional distribution. Turing supports the following syntax:
```{julia}
@model function gdemo(x, ::Type{T}=Float64) where {T}
    if x === missing
        # Initialize `x` if missing
        x = Vector{T}(undef, 2)
    end
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

# Construct a model with x = missing
model = gdemo(missing)
c = sample(model, HMC(0.01, 5), 500)
```
Note the need to initialize `x` when missing since we are iterating over its elements later in the model. The generated values for `x` can be extracted from the `Chains` object using `c[:x]`.
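For example, a minimal sketch of pulling the imputed values out of the chain, using `group` as described earlier:
```{julia}
#| eval: false
x_chain = group(c, :x)    # subset the chain to the x parameters
mean(x_chain["x[1]"])     # posterior mean of the imputed x[1]
```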
Turing also supports mixed `missing` and non-`missing` values in `x`, where the missing ones will be treated as random variables to be sampled while the others get treated as observations. For example:
```{julia}
@model function gdemo(x)
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
end

# x[1] is a parameter, but x[2] is an observation
model = gdemo([missing, 2.4])
c = sample(model, HMC(0.01, 5), 500)
```
#### Default Values
Arguments to Turing models can have default values much like how default values work in normal Julia functions. For instance, the following will assign `missing` to `x` and treat it as a random variable. If the default value is not `missing`, `x` will be assigned that value and will be treated as an observation instead.
```{julia}
using Turing

@model function generative(x=missing, ::Type{T}=Float64) where {T<:Real}
    if x === missing
        # Initialize x when missing
        x = Vector{T}(undef, 10)
    end
    s² ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s²))
    for i in 1:length(x)
        x[i] ~ Normal(m, sqrt(s²))
    end
    return s², m
end

m = generative()
chain = sample(m, HMC(0.01, 5), 1000)
```
#### Access Values inside Chain
You can access the values inside a chain in several ways:
1. Turn them into a `DataFrame` object
2. Use their raw `AxisArray` form
3. Create a three-dimensional `Array` object
For example, let `c` be a `Chain`:
1. `DataFrame(c)` converts `c` to a `DataFrame`,
2. `c.value` retrieves the values inside `c` as an `AxisArray`, and
3. `c.value.data` retrieves the values inside `c` as a 3D `Array`.
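A sketch of the three options (converting to a `DataFrame` assumes the DataFrames package is loaded):
```{julia}
#| eval: false
using DataFrames

df = DataFrame(c)     # 1. as a DataFrame
ax = c.value          # 2. as an AxisArray
arr = c.value.data    # 3. as a 3D Array (iterations × parameters × chains)
```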
#### Variable Types and Type Parameters
The element type of a vector (or matrix) of random variables should match the `eltype` of its prior distribution: `<: Integer` for discrete distributions and `<: AbstractFloat` for continuous distributions. Moreover, if the continuous random variable is to be sampled using a Hamiltonian sampler, the vector's element type needs to be either:
1. `Real` to enable auto-differentiation through the model which uses special number types that are sub-types of `Real`, or
2. Some type parameter `T` defined in the model header using the type parameter syntax, e.g. `function gdemo(x, ::Type{T} = Float64) where {T}`.
Similarly, when using a particle sampler, the Julia variable used should either be:
1. An `Array`, or
2. An instance of some type parameter `T` defined in the model header using the type parameter syntax, e.g. `function gdemo(x, ::Type{T} = Vector{Float64}) where {T}`.
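A minimal sketch of the type-parameter pattern (the model name `tdemo` is illustrative): using `T` for the element type keeps the container compatible with the special number types that Hamiltonian samplers use for automatic differentiation.
```{julia}
#| eval: false
@model function tdemo(::Type{T}=Float64) where {T}
    # Under HMC/NUTS, `T` becomes an AD-compatible number type.
    x = Vector{T}(undef, 3)
    for i in eachindex(x)
        x[i] ~ Normal()
    end
end

sample(tdemo(), NUTS(), 100)
```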
### Querying Probabilities from Model or Chain
Turing offers three functions: [`loglikelihood`](https://turinglang.org/DynamicPPL.jl/dev/api/#StatsAPI.loglikelihood), [`logprior`](https://turinglang.org/DynamicPPL.jl/dev/api/#DynamicPPL.logprior), and [`logjoint`](https://turinglang.org/DynamicPPL.jl/dev/api/#DynamicPPL.logjoint) to query the log-likelihood, log-prior, and log-joint probabilities of a model, respectively.
Let's look at a simple model called `gdemo0`:
```{julia}
@model function gdemo0()
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    return x ~ Normal(m, sqrt(s))
end
```
If we observe `x` to be 1.0, we can condition the model on this datum using the [`condition`](https://turinglang.org/DynamicPPL.jl/dev/api/#AbstractPPL.condition) syntax:
```{julia}
model = gdemo0() | (x=1.0,)
```
Now, let's compute the log-likelihood of the observation given specific values of the model parameters, `s` and `m`:
```{julia}
loglikelihood(model, (s=1.0, m=1.0))
```
We can easily verify that value in this case:
```{julia}
logpdf(Normal(1.0, 1.0), 1.0)
```
We can also compute the log-prior probability of the model for the same values of `s` and `m`:
```{julia}
logprior(model, (s=1.0, m=1.0))
```
```{julia}
logpdf(InverseGamma(2, 3), 1.0) + logpdf(Normal(0, sqrt(1.0)), 1.0)
```
Finally, we can compute the log-joint probability of the model parameters and data:
```{julia}
logjoint(model, (s=1.0, m=1.0))
```
```{julia}
logpdf(Normal(1.0, 1.0), 1.0) +
logpdf(InverseGamma(2, 3), 1.0) +
logpdf(Normal(0, sqrt(1.0)), 1.0)
```
Querying with a `Chains` object is easy as well:
```{julia}
chn = sample(model, Prior(), 10)
```
```{julia}
loglikelihood(model, chn)
```
### Maximum likelihood and maximum a posteriori estimates
Turing also has functions for estimating the maximum a posteriori and maximum likelihood parameters of a model. This can be done with:
```{julia}
mle_estimate = maximum_likelihood(model)
map_estimate = maximum_a_posteriori(model)
```
For more details see the [mode estimation page](../docs-17-mode-estimation/index.qmd).
## Beyond the Basics
### Compositional Sampling Using Gibbs
Turing.jl provides a Gibbs interface to combine different samplers. For example, one can combine an `HMC` sampler with a `PG` sampler to run inference for different parameters in a single model as below.
```{julia}
@model function simple_choice(xs)
    p ~ Beta(2, 2)
    z ~ Bernoulli(p)
    for i in 1:length(xs)
        if z == 1
            xs[i] ~ Normal(0, 1)
        else
            xs[i] ~ Normal(2, 1)
        end
    end
end

simple_choice_f = simple_choice([1.5, 2.0, 0.3])
chn = sample(simple_choice_f, Gibbs(HMC(0.2, 3, :p), PG(20, :z)), 1000)
```
The `Gibbs` sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the [Automatic Differentiation]({{< meta site-url >}}/dev/docs/using-turing/autodiff) article for more.
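For instance, a hedged sketch of the idea, assuming the `adtype` keyword accepted by gradient-based samplers such as `HMC`:
```{julia}
#| eval: false
# Use ForwardDiff for the HMC component only; PG does not need AD.
chn = sample(
    simple_choice_f,
    Gibbs(HMC(0.2, 3, :p; adtype=AutoForwardDiff()), PG(20, :z)),
    1000,
)
```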
For more details of compositional sampling in Turing.jl, please check the corresponding [paper](https://proceedings.mlr.press/v84/ge18b.html).
### Working with `filldist` and `arraydist`
Turing provides `filldist(dist::Distribution, n::Int)` and `arraydist(dists::AbstractVector{<:Distribution})` as a simplified interface to construct product distributions, e.g., to model a set of variables that share the same structure but vary by group.
#### Constructing product distributions with `filldist`
The function `filldist` provides a general interface to construct product distributions over distributions of the same type and parameterisation.
Note that, in contrast to the product distribution interface provided by Distributions.jl (`Product`), `filldist` supports product distributions over univariate or multivariate distributions.
Example usage:
```{julia}
@model function demo(x, g)
    k = length(unique(g))
    a ~ filldist(Exponential(), k) # = Product(fill(Exponential(), k))
    mu = a[g]
    return x .~ Normal.(mu)
end
```
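A hypothetical usage example, with six observations in two groups (the data here are made up for illustration):
```{julia}
#| eval: false
x = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]   # observations
g = [1, 1, 1, 2, 2, 2]               # group labels
chn = sample(demo(x, g), NUTS(), 1000)
```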
#### Constructing product distributions with `arraydist`
The function `arraydist` provides a general interface to construct product distributions over distributions of varying type and parameterisation.
Note that in contrast to the product distribution interface provided by Distributions.jl (`Product`), `arraydist` supports product distributions over univariate or multivariate distributions.
Example usage:
```{julia}
@model function demo(x, g)
    k = length(unique(g))
    a ~ arraydist([Exponential(i) for i in 1:k])
    mu = a[g]
    return x .~ Normal.(mu)
end
```
### Working with MCMCChains.jl
Turing.jl wraps its samples using `MCMCChains.Chain` so that all the functions that work on an `MCMCChains.Chain` can be re-used in Turing.jl. Two typical functions are `MCMCChains.describe` and `MCMCChains.plot`, which can be used as follows for an obtained chain `chn`.
```{julia}
describe(chn) # Lists statistics of the samples.
plot(chn) # Plots statistics of the samples.
```
There are numerous functions in addition to `describe` and `plot` in the `MCMCChains` package, such as those used in convergence diagnostics. For more information on the package, please see the [GitHub repository](https://github.com/TuringLang/MCMCChains.jl).
### Changing Default Settings
Some of Turing.jl's default settings can be changed to better suit your use case.
#### AD Chunk Size
ForwardDiff (Turing's default AD backend) uses forward-mode chunk-wise AD. The chunk size can be set manually by `AutoForwardDiff(;chunksize=new_chunk_size)`.
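A hedged sketch, assuming the `adtype` keyword accepted by gradient-based samplers:
```{julia}
#| eval: false
# Use a manual chunk size of 8 for ForwardDiff.
sample(gdemo(1.5, 2), HMC(0.1, 5; adtype=AutoForwardDiff(; chunksize=8)), 1000)
```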
#### AD Backend
Turing supports four automatic differentiation (AD) packages as back ends during sampling. The default AD backend is [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD. Three reverse-mode AD backends are also supported, namely [Tracker](https://github.com/FluxML/Tracker.jl), [Zygote](https://github.com/FluxML/Zygote.jl), and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl). `Zygote` and `ReverseDiff` are optional and must be explicitly loaded by the user with `using Zygote` or `using ReverseDiff` alongside `using Turing`.
For more information on Turing's automatic differentiation backend, please see the [Automatic Differentiation]({{< meta site-url >}}/dev/docs/using-turing/autodiff) article.
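For instance, a hedged sketch of switching to ReverseDiff, assuming the `adtype` keyword on the sampler:
```{julia}
#| eval: false
using ReverseDiff  # the backend must be loaded explicitly
using Turing

sample(gdemo(1.5, 2), NUTS(; adtype=AutoReverseDiff()), 1000)
```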
#### Progress Logging
`Turing.jl` uses ProgressLogging.jl to log the sampling progress. Progress logging is enabled by default but might slow down inference. It can be turned on or off by setting the keyword argument `progress` of `sample` to `true` or `false`.
Moreover, you can enable or disable progress logging globally by calling `setprogress!(true)` or `setprogress!(false)`, respectively.
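For example:
```{julia}
#| eval: false
# Disable the progress bar for a single call...
chn = sample(model, NUTS(), 1000; progress=false)

# ...or globally for the session.
setprogress!(false)
```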
Turing uses heuristics to select an appropriate visualization backend. If you
use Jupyter notebooks, the default backend is
[ConsoleProgressMonitor.jl](https://github.com/tkf/ConsoleProgressMonitor.jl).
In all other cases, progress logs are displayed with
[TerminalLoggers.jl](https://github.com/c42f/TerminalLoggers.jl). Alternatively,
if you provide a custom visualization backend, Turing uses it instead of the
default backend.