[ci-dev]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml
[ci-dev-img]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml/badge.svg?branch=dev "Continuous Integration (CPU)"
[codecov-dev]: https://codecov.io/github/JuliaAI/MLJFlow.jl
[codecov-dev-img]: https://codecov.io/github/JuliaAI/MLJFlow.jl/graph/badge.svg?token=TBCMJOK1WR "Code Coverage"
[MLJ](https://github.com/alan-turing-institute/MLJ.jl) is a Julia framework for
combining and tuning machine learning models. MLJFlow is a package that extends
This project is part of the GSoC 2023 program. The proposal description can be
found [here](https://summerofcode.withgoogle.com/programs/2023/projects/iRxuzeGJ).
The entire workload is divided into three different repositories:
[MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl),
[MLFlowClient.jl](https://github.com/JuliaAI/MLFlowClient.jl) and this one.

## Features
- [x] Provides a wrapper `Logger` for MLFlowClient.jl clients and associated
  metadata; instances of this type are valid "loggers", which can be passed to MLJ
  functions supporting the `logger` keyword argument.

- [x] Provides MLflow integration with MLJ's `evaluate!`/`evaluate` method (model
  **performance evaluation**)

- [x] Extends MLJ's `MLJ.save` method, to save trained machines as retrievable MLflow
  client artifacts

- [x] Provides MLflow integration with MLJ's `TunedModel` wrapper (to log **hyper-parameter
  tuning** workflows)

- [ ] Provides MLflow integration with MLJ's `IteratedModel` wrapper (to log **controlled
  model iteration** workflows)
Refer to the [MLflow documentation](https://www.mlflow.org/docs/latest/index.html) for
necessary background.

**Important.** For the examples that follow, we assume `MLJ`, `MLJDecisionTreeClassifier`
and `MLFlowClient` are in the user's active Julia package environment.
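
One way to add these packages to the active environment (a sketch; use your preferred Pkg workflow, e.g. the Pkg REPL mode, if you like):

```julia
using Pkg

# Add the three packages assumed by the examples below:
Pkg.add(["MLJ", "MLJDecisionTreeClassifier", "MLFlowClient"])
```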

```julia
using MLJ # Requires MLJ.jl version 0.19.3 or higher
```
```julia
logger = MLJFlow.Logger(
    "http://127.0.0.1:5000/api";
    experiment_name="test",
    artifact_location="./mlj-test"
)
```
Now we call `evaluate` as usual but provide the `logger` as a keyword argument:

```julia
evaluate(
    model,
    X,
    y,
    resampling=CV(nfolds=5),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)
```

Navigate to "http://127.0.0.1:5000" in your browser and select the "Experiment" matching
the name above ("test"). Select the single run displayed to see the logged results
of the performance evaluation.
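
The metrics can also be inspected locally: `evaluate` returns a `PerformanceEvaluation` object whose fields hold the aggregated and per-fold results. A minimal sketch, assuming the setup above and a running MLflow server at the address given:

```julia
# Capture the object returned by `evaluate` (logging happens as a side effect):
e = evaluate(
    model,
    X,
    y,
    resampling=CV(nfolds=5),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)

e.measurement  # aggregated value for each measure, in the order given above
e.per_fold     # per-fold values, for checking spread across the 5 folds
```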
### Logging outcomes of model tuning

Continuing with the previous example:

```julia
r = range(model, :max_depth, lower=1, upper=5)
tmodel = TunedModel(
    model,
    tuning=Grid(),
    range=r;
    resampling=CV(nfolds=9),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)

mach = machine(tmodel, X, y) |> fit!
```

Return to the browser page (refreshing if necessary) and you will find five more
performance evaluations logged, one for each value of `max_depth` evaluated in tuning.
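The outcome of tuning can also be inspected locally, without the MLflow UI. A sketch, assuming the `mach` just trained above, using MLJ's standard `TunedModel` accessors:

```julia
# The optimal model found by the grid search:
best = fitted_params(mach).best_model
best.max_depth

# The tuning-history entry for that model, including its evaluation:
report(mach).best_history_entry
```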
### Saving and retrieving trained machines as MLflow artifacts

Let's train the model on all data and save the trained machine as an MLflow artifact:

```julia
mach = machine(model, X, y) |> fit!
run = MLJ.save(logger, mach)
```

Notice that in this case `MLJ.save` returns a run (an instance of `MLFlowRun` from
MLFlowClient.jl).

To retrieve an artifact we need to use the MLFlowClient.jl API, and for that we need to
know the MLflow service that our `logger` wraps:
We can predict using the deserialized machine:

```julia
predict(mach2, X)
```

### Setting a global logger

Set `logger` as the global logging target by running `default_logger(logger)`. Then,
unless explicitly overridden, all loggable workflows will log to `logger`. In particular,
to *suppress* logging, you will need to specify `logger=nothing` in your calls.

So, for example, if we run the following setup

```julia
using MLJ

# using a new experiment name here:
logger = MLJFlow.Logger(
    "http://127.0.0.1:5000/api";
    experiment_name="test global logging",
    artifact_location="./mlj-test"
)

default_logger(logger)

X, y = make_moons(100) # a table and a vector with 100 rows
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
model = DecisionTreeClassifier()
```

then the following is automatically logged:

```julia
evaluate(model, X, y)
```

but the following is *not* logged:

```julia
evaluate(model, X, y; logger=nothing)
```

To save a machine when a default logger is set, one can use the following syntax:

```julia
mach = machine(model, X, y) |> fit!
MLJ.save(mach)
```

Retrieve the saved machine as described earlier.
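
To confirm the setup, the current global logger can be queried; a sketch, assuming a recent MLJ version in which `default_logger` acts as a getter when called with no arguments:

```julia
# Returns the logger set by `default_logger(logger)` above:
current = default_logger()
```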