[ci-dev]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml
[ci-dev-img]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml/badge.svg?branch=dev "Continuous Integration (CPU)"
[codecov-dev]: https://codecov.io/github/JuliaAI/MLJFlow.jl
[codecov-dev-img]: https://codecov.io/github/JuliaAI/MLJFlow.jl/graph/badge.svg?token=TBCMJOK1WR "Code Coverage"
[MLJ](https://github.com/alan-turing-institute/MLJ.jl) is a Julia framework for
combining and tuning machine learning models. MLJFlow is a package that extends
This project is part of the GSoC 2023 program. The proposal description can be
found [here](https://summerofcode.withgoogle.com/programs/2023/projects/iRxuzeGJ).
The entire workload is divided into three different repositories:
[MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl),
[MLFlowClient.jl](https://github.com/JuliaAI/MLFlowClient.jl) and this one.

## Features
- [x] Provides a wrapper `Logger` for MLFlowClient.jl clients and associated
  metadata; instances of this type are valid "loggers", which can be passed to MLJ
  functions supporting the `logger` keyword argument.

- [x] Provides MLflow integration with MLJ's `evaluate!`/`evaluate` method (model
  **performance evaluation**)

- [x] Extends MLJ's `MLJ.save` method, to save trained machines as retrievable MLflow
  client artifacts

- [x] Provides MLflow integration with MLJ's `TunedModel` wrapper (to log **hyper-parameter
  tuning** workflows)

- [ ] Provides MLflow integration with MLJ's `IteratedModel` wrapper (to log **controlled
  model iteration** workflows)
Refer to the [MLflow documentation](https://www.mlflow.org/docs/latest/index.html) for
necessary background.

**Important.** For the examples that follow, we assume `MLJ`, `MLJDecisionTreeClassifier`
and `MLFlowClient` are in the user's active Julia package environment.
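
One way to add these packages to the active environment (a sketch; use your preferred Pkg workflow, e.g. the Pkg REPL mode, if you like):

```julia
using Pkg

# Add the three packages assumed by the examples below:
Pkg.add(["MLJ", "MLJDecisionTreeClassifier", "MLFlowClient"])
```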

```julia
using MLJ # Requires MLJ.jl version 0.19.3 or higher
```
```julia
logger = MLJFlow.Logger(
    "http://127.0.0.1:5000/api";
    experiment_name="test",
    artifact_location="./mlj-test"
)
```
Now we call `evaluate` as usual but provide the `logger` as a keyword argument:

```julia
evaluate(
    model,
    X,
    y,
    resampling=CV(nfolds=5),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)
```

Navigate to "http://127.0.0.1:5000" in your browser and select the "Experiment" matching
the name above ("test"). Select the single run displayed to see the logged results
of the performance evaluation.
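
The metrics can also be inspected locally: `evaluate` returns a `PerformanceEvaluation` object whose fields hold the aggregated and per-fold results. A minimal sketch, assuming the setup above and a running MLflow server at the address given:

```julia
# Capture the object returned by `evaluate` (logging happens as a side effect):
e = evaluate(
    model,
    X,
    y,
    resampling=CV(nfolds=5),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)

e.measurement  # aggregated value for each measure, in the order given above
e.per_fold     # per-fold values, for checking spread across the 5 folds
```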
### Logging outcomes of model tuning

Continuing with the previous example:

```julia
r = range(model, :max_depth, lower=1, upper=5)
tmodel = TunedModel(
    model,
    tuning=Grid(),
    range=r;
    resampling=CV(nfolds=9),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)

mach = machine(tmodel, X, y) |> fit!
```

Return to the browser page (refreshing if necessary) and you will find five more
performance evaluations logged, one for each value of `max_depth` evaluated in tuning.
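The outcome of tuning can also be inspected locally, without the MLflow UI. A sketch, assuming the `mach` just trained above, using MLJ's standard `TunedModel` accessors:

```julia
# The optimal model found by the grid search:
best = fitted_params(mach).best_model
best.max_depth

# The tuning-history entry for that model, including its evaluation:
report(mach).best_history_entry
```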
### Saving and retrieving trained machines as MLflow artifacts

Let's train the model on all data and save the trained machine as an MLflow artifact:

```julia
mach = machine(model, X, y) |> fit!
run = MLJ.save(logger, mach)
```

Notice that in this case `MLJ.save` returns a run (an instance of `MLFlowRun` from
MLFlowClient.jl).

To retrieve an artifact we need to use the MLFlowClient.jl API, and for that we need to
know the MLflow service that our `logger` wraps:
We can predict using the deserialized machine:

```julia
predict(mach2, X)
```

### Setting a global logger

Set `logger` as the global logging target by running `default_logger(logger)`. Then,
unless explicitly overridden, all loggable workflows will log to `logger`. In particular,
to *suppress* logging, you will need to specify `logger=nothing` in your calls.

So, for example, if we run the following setup

```julia
using MLJ

# using a new experiment name here:
logger = MLJFlow.Logger(
    "http://127.0.0.1:5000/api";
    experiment_name="test global logging",
    artifact_location="./mlj-test"
)

default_logger(logger)

X, y = make_moons(100) # a table and a vector with 100 rows
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
model = DecisionTreeClassifier()
```

then the following is automatically logged:

```julia
evaluate(model, X, y)
```

but the following is *not* logged:

```julia
evaluate(model, X, y; logger=nothing)
```

To save a machine when a default logger is set, one can use the following syntax:

```julia
mach = machine(model, X, y) |> fit!
MLJ.save(mach)
```

Retrieve the saved machine as described earlier.
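
To confirm the setup, the current global logger can be queried; a sketch, assuming a recent MLJ version in which `default_logger` acts as a getter when called with no arguments:

```julia
# Returns the logger set by `default_logger(logger)` above:
current = default_logger()
```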