Merge pull request #45 from JuliaAI/default-logger
Update the readme examples to include tuning and setting global logger
ablaom authored Aug 2, 2024
2 parents 76352d5 + 4f92cc2 commit 03ad805
Showing 2 changed files with 93 additions and 14 deletions.
7 changes: 5 additions & 2 deletions .github/workflows/CI.yml
@@ -60,7 +60,10 @@ jobs:
           JULIA_NUM_THREADS: '2'
           MLFLOW_TRACKING_URI: "http://localhost:5000/api"
       - uses: julia-actions/julia-processcoverage@v1
-      - uses: codecov/codecov-action@v3
+      - uses: codecov/codecov-action@v4
+        with:
+          files: lcov.info
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false
+          verbose: true


100 changes: 88 additions & 12 deletions README.md
@@ -6,8 +6,8 @@

[ci-dev]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml
[ci-dev-img]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml/badge.svg?branch=dev "Continuous Integration (CPU)"
-[codecov-dev]: https://codecov.io/github/JuliaAI/MLJFlow.jl?branch=dev
-[codecov-dev-img]: https://codecov.io/gh/JuliaAI/MLJFlow.jl/branch/dev/graphs/badge.svg?branch=dev "Code Coverage"
+[codecov-dev]: https://codecov.io/github/JuliaAI/MLJFlow.jl
+[codecov-dev-img]: https://codecov.io/github/JuliaAI/MLJFlow.jl/graph/badge.svg?token=TBCMJOK1WR "Code Coverage"

[MLJ](https://github.com/alan-turing-institute/MLJ.jl) is a Julia framework for
combining and tuning machine learning models. MLJFlow is a package that extends
@@ -22,7 +22,7 @@ metrics, log parameters, log artifacts, etc.).
This project is part of the GSoC 2023 program. The proposal description can be
found [here](https://summerofcode.withgoogle.com/programs/2023/projects/iRxuzeGJ).
The entire workload is divided into three different repositories:
[MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl),
[MLFlowClient.jl](https://github.com/JuliaAI/MLFlowClient.jl) and this one.

## Features
@@ -33,14 +33,14 @@ The entire workload is divided into three different repositories:
- [x] Provides a wrapper `Logger` for MLFlowClient.jl clients and associated
metadata; instances of this type are valid "loggers", which can be passed to MLJ
functions supporting the `logger` keyword argument.

- [x] Provides MLflow integration with MLJ's `evaluate!`/`evaluate` method (model
**performance evaluation**)

- [x] Extends MLJ's `MLJ.save` method, to save trained machines as retrievable MLflow
client artifacts

-- [ ] Provides MLflow integration with MLJ's `TunedModel` wrapper (to log **hyper-parameter
+- [x] Provides MLflow integration with MLJ's `TunedModel` wrapper (to log **hyper-parameter
tuning** workflows)

- [ ] Provides MLflow integration with MLJ's `IteratedModel` wrapper (to log **controlled
@@ -60,8 +60,8 @@ shell/console, run `mlflow server` to launch an mlflow service on a local server
Refer to the [MLflow documentation](https://www.mlflow.org/docs/latest/index.html) for
necessary background.

-We assume MLJDecisionTreeClassifier is in the user's active Julia package
-environment.
+**Important.** For the examples that follow, we assume `MLJ`, `MLJDecisionTreeClassifier`
+and `MLFlowClient` are in the user's active Julia package environment.

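If any of these are missing, one way to add them is sketched below. Note one assumption
here: the `DecisionTreeClassifier` model loaded later is provided by the registered
interface package `MLJDecisionTreeInterface` (which wraps DecisionTree.jl), so that is
the package added:

```julia
using Pkg

# Add MLJ, the MLflow client, and the DecisionTree.jl interface package:
Pkg.add(["MLJ", "MLJDecisionTreeInterface", "MLFlowClient"])
```
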
```julia
using MLJ # Requires MLJ.jl version 0.19.3 or higher
```
@@ -73,7 +73,7 @@ instance. The experiment name and artifact location are optional.
```julia
logger = MLJFlow.Logger(
    "http://127.0.0.1:5000/api";
-    experiment_name="MLJFlow test",
+    experiment_name="test",
    artifact_location="./mlj-test"
)
```
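
Since the experiment name and artifact location are optional, a minimal sketch of a
logger that accepts the server's defaults would be just:

```julia
# Minimal variant: MLflow falls back to its default experiment and artifact store
logger = MLJFlow.Logger("http://127.0.0.1:5000/api")
```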
@@ -89,25 +89,54 @@ model = DecisionTreeClassifier(max_depth=4)
Now we call `evaluate` as usual but provide the `logger` as a keyword argument:

```julia
-evaluate(model, X, y, resampling=CV(nfolds=5), measures=[LogLoss(), Accuracy()], logger=logger)
+evaluate(
+    model,
+    X,
+    y,
+    resampling=CV(nfolds=5),
+    measures=[LogLoss(), Accuracy()],
+    logger=logger,
+)
```

Navigate to "http://127.0.0.1:5000" in your browser and select the "Experiment" matching
the name above ("test"). Select the single run displayed to see the logged results
of the performance evaluation.
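
The results logged to MLflow are also available locally. As a sketch of standard MLJ
behavior (not specific to MLJFlow), capturing the return value of `evaluate` gives a
`PerformanceEvaluation` object that can be inspected directly; note that re-running
`evaluate` creates a second MLflow run:

```julia
e = evaluate(
    model,
    X,
    y,
    resampling=CV(nfolds=5),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)

e.measurement  # aggregated LogLoss and Accuracy over the five folds
e.per_fold     # the per-fold measurements behind the logged results
```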


### Logging outcomes of model tuning

Continuing with the previous example:

```julia
r = range(model, :max_depth, lower=1, upper=5)
tmodel = TunedModel(
    model,
    tuning=Grid(),
    range=r,
    resampling=CV(nfolds=9),
    measures=[LogLoss(), Accuracy()],
    logger=logger,
)

mach = machine(tmodel, X, y) |> fit!
```

Return to the browser page (refreshing if necessary) and you will find five more
performance evaluations logged, one for each value of `max_depth` evaluated in tuning.
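
The outcome of tuning can also be inspected locally. A short sketch using standard MLJ
tuning accessors (not MLJFlow-specific):

```julia
fitted_params(mach).best_model    # the DecisionTreeClassifier with the winning max_depth
report(mach).best_history_entry   # its evaluation, as also logged to MLflow
```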


### Saving and retrieving trained machines as MLflow artifacts

Let's train the model on all data and save the trained machine as an MLflow artifact:

```julia
mach = machine(model, X, y) |> fit!
-run = MLJBase.save(logger, mach)
+run = MLJ.save(logger, mach)
```

-Notice that in this case `MLJBase.save` returns a run (and instance of `MLFlowRun` from
-MLFlowClient.jl).
+Notice that in this case `MLJBase.save` returns a run (an instance of `MLFlowRun` from
+MLFlowClient.jl).
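
The returned run identifies where the artifact lives. As a sketch, assuming the usual
MLFlowClient.jl layout in which an `MLFlowRun` carries an `info` field holding the run
metadata:

```julia
run.info.run_id  # the MLflow run id under which the machine was saved
```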

To retrieve an artifact we need to use the MLFlowClient.jl API, and for that we need to
know the MLflow service that our `logger` wraps:
@@ -129,3 +129,50 @@ We can predict using the deserialized machine:
```julia
predict(mach2, X)
```

### Setting a global logger

Set `logger` as the global logging target by running `default_logger(logger)`. Then,
unless explicitly overridden, all loggable workflows will log to `logger`. In particular,
to *suppress* logging, you will need to specify `logger=nothing` in your calls.
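
A sketch of the round trip, assuming (as in MLJ) that `default_logger` with no argument
returns the current global default:

```julia
default_logger(logger)  # set `logger` as the global default
default_logger()        # query the current default; returns `logger`
```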

So, for example, if we run the following setup:

```julia
using MLJ

# using a new experiment name here:
logger = MLJFlow.Logger(
    "http://127.0.0.1:5000/api";
    experiment_name="test global logging",
    artifact_location="./mlj-test"
)

default_logger(logger)

X, y = make_moons(100) # a table and a vector with 100 rows
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
model = DecisionTreeClassifier()
```

Then the following is automatically logged:

```julia
evaluate(model, X, y)
```

But the following is *not* logged:


```julia
evaluate(model, X, y; logger=nothing)
```

To save a machine when a default logger is set, one can use the following syntax:

```julia
mach = machine(model, X, y) |> fit!
MLJ.save(mach)
```

Retrieve the saved machine as described earlier.
