using Pkg # hideall
Pkg.activate("_literate/A-model-tuning/Project.toml")
Pkg.instantiate()
macro OUTPUT()
return isdefined(Main, :Franklin) ? Franklin.OUT_PATH[] : "/tmp/"
end
# [MLJ.jl]: https://github.com/alan-turing-institute/MLJ.jl
# [RDatasets.jl]: https://github.com/JuliaStats/RDatasets.jl
# [NearestNeighbors.jl]: https://github.com/KristofferC/NearestNeighbors.jl
#
# @@dropdown
# ## Tuning a single hyperparameter
# @@
# @@dropdown-content
#
# In MLJ, tuning is implemented as a model wrapper.
# After wrapping a model in a _tuning strategy_ (e.g. grid search) and binding the wrapped model to data in a _machine_, fitting the machine initiates a search for optimal model hyperparameters.
#
# Let's use a decision tree classifier and tune the maximum depth of the tree.
# As usual, start by loading the data and the model:
using MLJ
using PrettyPrinting
MLJ.color_off() # hide
X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
# @@dropdown
# ### Specifying a range of values
# @@
# @@dropdown-content
#
# To specify a range of values, you can use the `range` function:
dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)
# As you can see, the `range` function takes a model (`dtc`), a symbol for the hyperparameter of interest (`:max_depth`), and an indication of how to sample values.
# For hyperparameters of type `<:Real`, you should specify a range of values as done above.
# For hyperparameters of other types (e.g. `Symbol`), you should use the `values=...` keyword.
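# For instance, here is a minimal sketch of a nominal range, assuming the wrapped
# tree exposes a Boolean `post_prune` hyperparameter (inspect `params(dtc)` to
# check the fields available in your version):
r_nominal = range(dtc, :post_prune, values=[true, false])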
#
# Once a range of values has been defined, you can then wrap the model in a `TunedModel` specifying the tuning strategy.
tm = TunedModel(model=dtc, ranges=[r], measure=cross_entropy)
# Note that "wrapping a model in a tuning strategy" as above means creating a new "self-tuning" version of the model, `tuned_model = TunedModel(model=...)`, in which further key-word arguments specify:
# 1. the algorithm (a.k.a., tuning strategy) for searching the hyper-parameter space of the model (e.g., `tuning = Random(rng=123)` or `tuning = Grid(goal=100)`).
# 2. the resampling strategy, used to evaluate performance for each value of the hyper-parameters (e.g., `resampling=CV(nfolds=9, rng=123)` or `resampling=Holdout(fraction_train=0.7)`).
# 3. the measure (or measures) on which to base performance evaluations (and for reporting purposes) (e.g., `measure = rms` or `measures = [rms, mae]`).
# 4. the range, usually describing the "space" of hyperparameters to be searched (but more generally whatever extra information is required to complete the search specification, e.g., initial values in gradient-descent optimization).
# For more options do `?TunedModel`.
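#
# As a sketch, the options above combine as follows (the particular strategy,
# resampling, and measure chosen here are illustrative, not requirements):
self_tuning_dtc = TunedModel(model=dtc,
                             tuning=Grid(resolution=10),
                             resampling=CV(nfolds=5),
                             ranges=r,
                             measure=cross_entropy)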
#
# @@
# @@dropdown
# ### Fitting and inspecting a tuned model
# @@
# @@dropdown-content
#
# To fit a tuned model, you can use the usual syntax:
m = machine(tm, X, y)
fit!(m)
# In order to inspect the best model, you can use the function `fitted_params` on the machine and inspect the `best_model` field:
fitted_params(m).best_model.max_depth
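# As an aside: fitting the tuned machine also retrains the best model on all
# supplied data, so `m` can be used directly for prediction; a quick sketch:
predict(m, rows=1:3)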
# Note that here we have tuned a probabilistic model and consequently used a probabilistic measure for the tuning.
# We could also have decided we only cared about the mode and the misclassification rate; to do this, just use `operation=predict_mode` in the tuned model:
tm = TunedModel(model=dtc, ranges=r, operation=predict_mode,
measure=misclassification_rate)
m = machine(tm, X, y)
fit!(m)
fitted_params(m).best_model.max_depth
# Let's check the misclassification rate for the best model:
r = report(m)
r.best_history_entry.measurement[1]
# Want plots? Of course:
using Plots
Plots.scalefontsizes() #hide
Plots.scalefontsizes(1.2) #hide
plot(m, size=(800,600))
savefig(joinpath(@OUTPUT, "A-model-tuning-hpt.svg")); # hide
# \figalt{Hyperparameter heatmap}{A-model-tuning-hpt.svg}
#
# @@
#
# @@
# @@dropdown
# ## Tuning nested hyperparameters
# @@
# @@dropdown-content
# Let's generate some simple dummy regression data:
X = (x1=rand(100), x2=rand(100), x3=rand(100))
y = 2X.x1 - X.x2 + 0.05 * randn(100);
# Let's then build a simple ensemble model with decision tree regressors:
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
forest = EnsembleModel(model=DecisionTreeRegressor())
# Such a model has *nested* hyperparameters in that the ensemble has hyperparameters (e.g. the `:bagging_fraction`) and the atomic model has its own hyperparameters (e.g. `:n_subfeatures` or `:max_depth`).
# You can see this by inspecting the parameters using `params`:
params(forest) |> pprint
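# Since the atomic model lives in the `model` field of the ensemble, nested
# hyperparameters can be read with ordinary dot access, for example:
forest.model.n_subfeatures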
# Ranges for nested hyperparameters are specified using the same dot syntax; the rest is done in much the same way as before:
r1 = range(forest, :(model.n_subfeatures), lower=1, upper=3)
r2 = range(forest, :bagging_fraction, lower=0.4, upper=1.0)
tm = TunedModel(model=forest, tuning=Grid(resolution=12),
resampling=CV(nfolds=6), ranges=[r1, r2],
measure=rms)
m = machine(tm, X, y)
fit!(m);
# A useful function to inspect a model after fitting is `report`, which collects information on the model and the tuning; for instance, you can use it to recover the best measurement:
r = report(m)
r.best_history_entry.measurement[1]
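# The optimal values of both nested hyperparameters can be recovered with
# `fitted_params`, as before; a quick sketch:
best = fitted_params(m).best_model
@show best.bagging_fraction best.model.n_subfeatures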
# Let's visualise this:
plot(m)
savefig(joinpath(@OUTPUT, "A-model-tuning-hm.svg")); # hide
# \figalt{Hyperparameter heatmap}{A-model-tuning-hm.svg}
#
# @@