MLFlow Wine quality

In this tutorial, you will learn step by step how to train, pack, and deploy a model from scratch using the Legion Platform.

Requirements

You must have the Legion Platform deployed in a cluster;
:ref:`MLFlow <mod_dev_using_mlflow-section>` and :term:`docker <Docker REST API Packaging Toolchain Integration>` toolchain integrations must be installed;
:term:`Legion CLI` is installed locally or :term:`Plugin for JupyterLab` is installed locally or in the cloud;
You should be authorized at :ref:`edi-server-description`;

Tutorial

To train, pack, and deploy a model, you need to interact with the :ref:`edi-server-description` server, which provides a REST API. You can call the API directly or through a client tool.

You can complete this tutorial with either of two such tools:

The :term:`Legion CLI` command-line tool;
The :term:`Plugin for JupyterLab`;

In this tutorial, you will learn how to:

:ref:`Create MLFlow project <tutorials_wine-create-project>`;
:ref:`Manage connections for the project <tutorials_wine-manage-connections>`;
:ref:`Train a model of the project <tutorials_wine-train>`;
:ref:`Pack the trained model <tutorials_wine-pack>`;
:ref:`Deploy the packed model <tutorials_wine-deploy>`;
:ref:`Use the deployed model <tutorials_wine-use>`;

This tutorial uses a dataset to predict the quality of the wine based on quantitative features like the wine’s "fixed acidity", "pH", "residual sugar", and so on. The dataset is from UCI’s machine learning repository.

The final code can be found on GitHub.

Create MLFlow project

Step input data	System with completed :ref:`requirements<tutorials_wine-req>`
Step output data	Folder with MLFlow project to predict wine quality

Create a new project folder:

$ mkdir wine && cd wine

Create our training script:

$ touch train.py

Paste the following code into the created file:

 import os
 import warnings
 import sys
 import argparse

 import pandas as pd
 import numpy as np
 from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
 from sklearn.model_selection import train_test_split
 from sklearn.linear_model import ElasticNet

 import mlflow
 import mlflow.sklearn


 def eval_metrics(actual, pred):
     rmse = np.sqrt(mean_squared_error(actual, pred))
     mae = mean_absolute_error(actual, pred)
     r2 = r2_score(actual, pred)
     return rmse, mae, r2



 if __name__ == "__main__":
     warnings.filterwarnings("ignore")
     np.random.seed(40)

     parser = argparse.ArgumentParser()
     parser.add_argument('--alpha')
     parser.add_argument('--l1-ratio')
     args = parser.parse_args()

     # Read the wine-quality.csv file located in the same directory as this script
     wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
     data = pd.read_csv(wine_path)

     # Split the data into training and test sets. (0.75, 0.25) split.
     train, test = train_test_split(data)

     # The predicted column is "quality" which is a scalar from [3, 9]
     train_x = train.drop(["quality"], axis=1)
     test_x = test.drop(["quality"], axis=1)
     train_y = train[["quality"]]
     test_y = test[["quality"]]

     alpha = float(args.alpha)
     l1_ratio = float(args.l1_ratio)

     with mlflow.start_run():
         lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
         lr.fit(train_x, train_y)

         predicted_qualities = lr.predict(test_x)

         (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

         print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
         print("  RMSE: %s" % rmse)
         print("  MAE: %s" % mae)
         print("  R2: %s" % r2)

         mlflow.log_param("alpha", alpha)
         mlflow.log_param("l1_ratio", l1_ratio)
         mlflow.log_metric("rmse", rmse)
         mlflow.log_metric("r2", r2)
         mlflow.log_metric("mae", mae)
         mlflow.set_tag("test", '13')

         mlflow.sklearn.log_model(lr, "model")

         # Persist samples (input and output)
         train_x.head().to_pickle('head_input.pkl')
         mlflow.log_artifact('head_input.pkl', 'model')
         train_y.head().to_pickle('head_output.pkl')
         mlflow.log_artifact('head_output.pkl', 'model')

In this file, we:

Start the MLflow run context (with mlflow.start_run():);
Train the ElasticNet model (lr.fit(train_x, train_y));
Log parameters, metrics, and a tag (mlflow.log_param, mlflow.log_metric, mlflow.set_tag);
Save (serialize) the model under the name model (mlflow.sklearn.log_model(lr, "model"));
Save input and output samples, which preserve the input and output column names (head_input.pkl and head_output.pkl);
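The three metrics returned by eval_metrics can be reproduced by hand, which helps when reading the training logs. The sketch below reimplements them with the standard library only; the quality scores are made-up values for illustration, not rows from the dataset.

```python
import math


def eval_metrics(actual, pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root of the mean squared error
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                         # coefficient of determination
    return rmse, mae, r2


# Hypothetical quality scores, made up for illustration only.
actual = [5, 6, 7, 5]
pred = [5, 5, 7, 6]
rmse, mae, r2 = eval_metrics(actual, pred)
print("RMSE: %s, MAE: %s, R2: %s" % (rmse, mae, r2))
```

A lower RMSE and MAE and an R2 closer to 1 indicate a better fit on the test split.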

Create MLproject file:

$ touch MLproject

Paste the following code into the created file:

name: wine-quality-example
conda_env: conda.yaml
entry_points:
    main:
        parameters:
            alpha: float
            l1_ratio: {type: float, default: 0.1}
        command: "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"

Note

Read more about the MLproject file structure in the official MLFlow docs.
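To see how the parameters section and the command template fit together, the following sketch fills the template the way a caller might expect it to be filled. It is a simplified illustration, not MLflow's actual implementation; render_command and the PARAMETERS dict are hypothetical names.

```python
# Command template and parameter declarations copied from the MLproject file.
COMMAND = "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"
PARAMETERS = {
    "alpha": {"type": "float"},                     # no default: must be supplied
    "l1_ratio": {"type": "float", "default": 0.1},  # optional: falls back to 0.1
}


def render_command(command, declared, supplied):
    """Fill the command template, falling back to declared defaults."""
    values = {}
    for name, spec in declared.items():
        if name in supplied:
            values[name] = supplied[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError("missing required parameter: %s" % name)
    return command.format(**values)


rendered = render_command(COMMAND, PARAMETERS, {"alpha": "1.0"})
print(rendered)
```

Supplying only alpha leaves l1_ratio at its declared default, which is the situation used later in this tutorial.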

Create conda environment file:

$ touch conda.yaml

Paste the following code into the created file:

 name: example
 channels:
 - defaults
 dependencies:
 - python=3.6
 - numpy=1.14.3
 - pandas=0.22.0
 - scikit-learn=0.19.1
 - pip:
     - mlflow==1.0.0

Note

All packages that are used in the training script must be listed in the conda.yaml file.

Read more about conda environments in the official conda docs.
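One way to check that rule is to compare the imports of the training script with the package names declared in conda.yaml. The sketch below does this with the standard library; the function names, the IMPORT_TO_PACKAGE mapping, and the hard-coded package list are assumptions made for illustration.

```python
import ast
import sys

# Import name -> package name, for packages whose two names differ.
# Only the mapping needed by train.py is listed here.
IMPORT_TO_PACKAGE = {"sklearn": "scikit-learn"}


def third_party_imports(source):
    """Return top-level module names imported by the source, minus the standard library."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    # sys.stdlib_module_names exists in Python 3.10+; the fallback covers
    # only the standard-library modules that train.py imports.
    stdlib = getattr(sys, "stdlib_module_names", {"os", "sys", "warnings", "argparse"})
    return {m for m in modules if m not in stdlib}


def missing_packages(source, declared):
    """Return the packages that the source needs but the declared list lacks."""
    needed = {IMPORT_TO_PACKAGE.get(m, m) for m in third_party_imports(source)}
    return sorted(needed - set(declared))


# A shortened copy of the imports in train.py.
SAMPLE = (
    "import os\n"
    "import pandas as pd\n"
    "import numpy as np\n"
    "from sklearn.linear_model import ElasticNet\n"
    "import mlflow.sklearn\n"
)
# Package names taken from the conda.yaml file above (versions omitted).
DECLARED = ["python", "numpy", "pandas", "scikit-learn", "mlflow"]
print(missing_packages(SAMPLE, DECLARED))
```

An empty list means every third-party import has a matching entry in the environment file.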

Download wine data set:

$ mkdir ./data
$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv -O ./data/wine-quality.csv

After this step, the project folder structure should look like this:

.
├── MLproject
├── conda.yaml
├── data
│   └── wine-quality.csv
└── train.py
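train.py reads the file with pd.read_csv and its default comma separator. Copies of the wine-quality dataset exist in both comma-delimited and semicolon-delimited form, so it may be worth checking the downloaded file before training. The sketch below is one way to do that, assuming the first line of the file is the header; detect_delimiter is a hypothetical helper and the header lines are shortened examples.

```python
import csv
import io


def detect_delimiter(header_line, candidates=(",", ";")):
    """Return the candidate delimiter that splits the header into the most fields."""
    def field_count(delim):
        reader = csv.reader(io.StringIO(header_line), delimiter=delim)
        return len(next(reader))
    return max(candidates, key=field_count)


# Two made-up header lines, shortened for illustration.
comma_header = '"fixed acidity","volatile acidity","pH","quality"'
semicolon_header = '"fixed acidity";"volatile acidity";"pH";"quality"'
print(detect_delimiter(comma_header))
print(detect_delimiter(semicolon_header))
```

If the check reports a semicolon for ./data/wine-quality.csv, one option is to pass sep=";" to pd.read_csv in train.py.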

Manage connections

Step input data	System with completed :ref:`requirements<tutorials_wine-req>`
Step output data	Created :term:`connections<Connection>`

As mentioned before, the Legion Platform uses the concept of :term:`Connections<Connection>` to manage different kinds of data and other external services.

To complete this tutorial, we need the following connections:

:term:`Connection` to the VCS repository where the MLFlow project for wine quality prediction is located
:term:`Connection` to the wine-quality.csv file in one of the supported object storages
:term:`Connection` to the docker registry where the packed model will be stored

Create :term:`Connection` to VCS repository

Because the legion-examples repository already contains the required code, we will use it. Feel free to create and use a new repository instead.

Create a directory where we will create all payloads for the Legion Platform API calls:

$ mkdir ./legion

Create payload:

$ touch ./legion/vcs_connection.legion.yaml

Paste the following code into the created file:

 kind: Connection
 id: legion-examples
 spec:
   type: git
   uri: git@github.com:legion-platform/legion-examples.git
   reference: origin/master
   keySecret: <paste your GitHub SSH key here>
   description: Git repository with legion-examples
   webUILink: https://github.com/legion-platform/legion-examples

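Create the connection using :term:`Legion CLI`:

$ legionctl conn create -f ./legion/vcs_connection.legion.yaml

Or create the connection using :term:`Plugin for JupyterLab`: select the file ./legion/vcs_connection.legion.yaml and press the submit button in the context menu.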

Create :term:`Connection` to wine-quality.csv object storage

Create payload:

$ touch ./legion/wine_connection.legion.yaml

Paste the following code into the created file:

 kind: Connection
 id: wine
 spec:
   type: gcs
   uri: gs://<paste your bucket address here>/data/wine-quality.csv
   region: <paste region here>
   keySecret: <paste key secret here>
   description: Wine dataset

Create a connection using :term:`Legion CLI` or :term:`Plugin for JupyterLab` as in the previous example.

If wine-quality.csv is not in the storage yet, you can copy it there:

$ gsutil cp ./data/wine-quality.csv gs://<bucket-name>/data/

Create :term:`Connection` to docker registry

Create payload:

$ touch ./legion/docker_connection.legion.yaml

Paste the following code into the created file:

 kind: Connection  # type of payload
 id: docker-ci
 spec:
   type: docker
   uri: <paste the uri of your registry here>  # uri to docker image registry
   username: <paste your username here>
   password: <paste your password here>
   description: Docker registry for model packaging

Create the connection using :term:`Legion CLI` or :term:`Plugin for JupyterLab` as in the previous example.

Check all created connections:

$ legionctl conn get | grep -e id: -e type: -e description

- id: docker-ci
    description: Docker repository for model packaging
    type: docker
- id: legion-examples
    description: Git repository with legion-examples
    type: git
- id: models-output
    description: Storage for trainined artifacts
    type: gcs
- id: wine
    description: Wine dataset
    type: gcs

Congrats! Now you are ready to train your model!

Train a model of the project

Step input data	Folder with MLFlow project to predict wine quality
Step output data	The trained model in :term:`GPPI<General Python Prediction Interface>` :term:`Trained Model Binary Format`

Create payload:

$ touch ./legion/training.legion.yaml

Paste the following code into the created file:

 kind: ModelTraining
 id: wine
 spec:
   model:
     name: wine
     version: 1.0
   toolchain: mlflow  # MLFlow training toolchain integration
   entrypoint: main
   workDir: mlflow/sklearn/wine  # directory where MLproject file is located
   data:
     - connName: wine
       localPath: mlflow/sklearn/wine/wine-quality.csv # where wine-quality.csv file from GCS should be fetched
   hyperParameters:
     alpha: "1.0"
   resources:
     limits:
       cpu: 4024m
       memory: 4024Mi
     requests:
       cpu: 2024m
       memory: 2024Mi
   vcsName: legion-examples

In this file:

line 7: the toolchain name is set to :ref:`mlflow <mod_dev_using_mlflow-section>`;
line 8: the training entry point maps to one of the entry_points declared in the :ref:`MLproject file`. We use main;
line 9: workDir points to the MLFlow project directory (the directory that has the :ref:`MLproject file` at its root);
line 10: the data section describes which data the Legion Platform should fetch and where it should be downloaded;
line 11: connName points to the id of the :ref:`Wine connection` that we created before;
line 12: localPath is the path where the file with wine data should be downloaded;
lines 13-14: the training hyperparameters map to MLflow run parameters. l1_ratio is not listed, so it takes its default value;
line 22: vcsName must be equal to the id of the :ref:`VCS Connection`;
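Because train.py loads wine-quality.csv from its own directory, localPath has to place the file where the script expects it. The sketch below checks that for this tutorial's layout, assuming train.py sits at the root of workDir. It is a consistency check for this example, not a rule enforced by the platform, and script_relative_paths is a hypothetical helper.

```python
import posixpath

# The relevant part of training.legion.yaml, rewritten as a Python dict.
training_spec = {
    "workDir": "mlflow/sklearn/wine",
    "data": [
        {"connName": "wine", "localPath": "mlflow/sklearn/wine/wine-quality.csv"},
    ],
    "hyperParameters": {"alpha": "1.0"},
}


def script_relative_paths(spec):
    """Map each data connection to its download path relative to workDir.

    Raises ValueError when a localPath falls outside workDir.
    """
    work_dir = posixpath.normpath(spec["workDir"])
    result = {}
    for item in spec["data"]:
        local = posixpath.normpath(item["localPath"])
        if posixpath.commonpath([work_dir, local]) != work_dir:
            raise ValueError("%s is outside %s" % (local, work_dir))
        result[item["connName"]] = posixpath.relpath(local, work_dir)
    return result


print(script_relative_paths(training_spec))
```

The relative path for the wine connection comes out as wine-quality.csv, which matches the file name that train.py joins to its own directory.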

Create :term:`Model Training` using :term:`Legion CLI`:

$ legionctl training create -f ./legion/training.legion.yaml

Check :term:`Model Training` logs:

$ legionctl training logs --id wine

After some time :term:`Model Training` will be finished.

To check status run:

$ legionctl training get --id wine

You will see YAML with an updated ModelTraining resource. Look at the status section. You can see:

state succeeded (this means that model training process was successful)
artifactName (this is the filename of :term:`Trained Model Binary`)

Or create training using :term:`Plugin for JupyterLab`:

Open JupyterLab;
Open the cloned repo, and then the folder with the project;
Select the file ./legion/training.legion.yaml and press the submit button in the context menu;

You can see the training logs using the Legion cloud mode tab (cloud icon) on the left side of JupyterLab:

Open Legion cloud mode tab;
Look for TRAINING section;
Press on the row with ID=wine;
Press button LOGS to connect to :term:`Model Training` logs;

After some time, :term:`Model Training` will finish. The training status is shown in the status column of the TRAINING section in the Legion cloud mode tab. If the training finished successfully, you will see status=succeeded.

Then open :term:`Model Training` again by pressing the appropriate row. Look at the Results section. You can see:

artifactName (this is the filename of :term:`Trained Model Binary`)

Our model is stored in :term:`GPPI<General Python Prediction Interface>` format. We can download it from the storage described by the models-output connection (this connection is created when the Legion Platform is installed, which is why we did not create it above).

Pack the trained model

Step input data	The trained model in :term:`GPPI<General Python Prediction Interface>` :term:`Trained Model Binary Format`
Step output data	The packed model as Docker image with REST API

Create payload:

$ touch ./legion/packaging.legion.yaml

Paste the following code into the created file:

 id: wine
 kind: ModelPackaging
 spec:
   artifactName: "<fill-in>"  # set artifact name from previous step;
   targets:
     - connectionName: docker-ci  # set docker repository connection where our packaged model will be saved
       name: docker-push
   integrationName: docker-rest  # set Model packaging toolchain integration as rest service

In this file:

line 4: Set artifact name from the previous step;
line 6: Set target docker registry to id from :ref:`Docker connection` file;
line 7: Set target command for the packager;
line 8: Set id of :term:`Docker REST API Packaging Toolchain Integration`;
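Filling in artifactName by hand is easy to get wrong, so the step can also be scripted. The sketch below takes a ModelTraining resource that has already been parsed into a dict and renders the packaging payload from it. The layout of the status section follows the fields named in this tutorial (state and artifactName); the artifact file name and render_packaging are hypothetical.

```python
# Packaging payload from this tutorial with a placeholder for the artifact name.
PACKAGING_TEMPLATE = """\
id: wine
kind: ModelPackaging
spec:
  artifactName: "{artifact_name}"
  targets:
    - connectionName: docker-ci
      name: docker-push
  integrationName: docker-rest
"""


def render_packaging(training):
    """Render the packaging payload for a training that finished successfully."""
    status = training.get("status", {})
    if status.get("state") != "succeeded":
        raise RuntimeError("training is not finished: state=%r" % status.get("state"))
    return PACKAGING_TEMPLATE.format(artifact_name=status["artifactName"])


# Hypothetical resource; the artifact name is invented for illustration.
training = {
    "id": "wine",
    "status": {"state": "succeeded", "artifactName": "wine-example-artifact.zip"},
}
rendered = render_packaging(training)
print(rendered)
```

The rendered text can then be saved as ./legion/packaging.legion.yaml.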

Create :term:`Model Packaging` using :term:`Legion CLI`:

$ legionctl packaging create -f ./legion/packaging.legion.yaml

Check :term:`Model Packaging` logs:

$ legionctl packaging logs --id wine

After some time :term:`Model Packaging` will be finished.

To check status run:

$ legionctl packaging get --id wine

You will see YAML with updated :term:`Model Packaging` resource. Look at the status section. You can see:

image (this is the name of the docker image in the registry that serves the trained model as a REST service);

Or create packaging using :term:`Plugin for JupyterLab`:

Open JupyterLab;
Open the cloned repo, and then the folder with the project;
Select the file ./legion/packaging.legion.yaml and press the submit button in the context menu;

You can see the packaging logs using the Legion cloud mode side tab in JupyterLab:

Open Legion cloud mode tab;
Look for PACKAGING section;
Press on the row with ID=wine;
Press button LOGS to connect to :term:`Model Packaging` logs;

After some time, :term:`Model Packaging` will finish. The packaging status is shown in the status column of the PACKAGING section in the Legion cloud mode tab. If the packaging finished successfully, you will see status=succeeded.

Then open :term:`Model Packaging` again by pressing the appropriate row. Look at the Results section. You can see:

image (this is the name of the docker image in the registry that serves the trained model as a REST service);

Deploy the packed model

Step input data	The packed model as Docker image with REST API
Step output data	The deployed model

Create payload:

$ touch ./legion/deployment.legion.yaml

Paste the following code into the created file:

 id: wine
 kind: ModelDeployment
 spec:
   image: "<fill-in>"
   minReplicas: 1
   ImagePullConnectionID: docker-ci

In this file:

line 4: Set the image that we got in the previous step;
line 6: Set the id of the :ref:`Docker connection` that is used to pull the image;

Create :term:`Model Deploying` using :term:`Legion CLI`:

$ legionctl deployment create -f ./legion/deployment.legion.yaml

After some time :term:`Model Deploying` will be finished.

To check status run:

$ legionctl deployment get --id wine

Or create the deployment using :term:`Plugin for JupyterLab`:

Open JupyterLab;
Open the cloned repo, and then the folder with the project;
Select the file ./legion/deployment.legion.yaml and press the submit button in the context menu;

You can see the deployment using the Legion cloud mode side tab in JupyterLab:

Open Legion cloud mode tab;
Look for DEPLOYMENT section;
Press on the row with ID=wine;

After some time, :term:`Model Deploying` will finish. The deployment status is shown in the status column of the DEPLOYMENT section in the Legion cloud mode tab. If the deployment finished successfully, you will see status=Ready.

Use the deployed model

Step input data	The deployed model

After the model is successfully deployed, you can check its API in Swagger.

Open edge.<your-legion-platform-host>/swagger/index.html and look at the following endpoints:

GET /model/wine/api/model/info – OpenAPI model specification;
POST /model/wine/api/model/invoke – Endpoint to do predictions;

You can also make predictions using :term:`Legion CLI`.

Create ./legion/r.json file:

$ touch ./legion/r.json

Add the payload for /model/wine/api/model/invoke according to the OpenAPI schema. The payload lists the model input columns and one or more rows of values:

{
  "columns": [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol"
  ],
  "data": [
    [
      7,
      0.27,
      0.36,
      20.7,
      0.045,
      45,
      170,
      1.001,
      3,
      0.45,
      8.8
    ]
  ]
}
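The same payload can be generated from Python, which makes it easier to send several rows and to catch rows with a wrong number of values. The sketch below is a minimal illustration; build_payload is a hypothetical helper, and the column list is copied from the payload above.

```python
import json

# Input columns, in the order used by the payload above.
COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_payload(rows, columns=COLUMNS):
    """Return the invoke payload, checking that every row matches the columns."""
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(
                "row %d has %d values, expected %d" % (index, len(row), len(columns))
            )
    return {"columns": list(columns), "data": [list(row) for row in rows]}


payload = build_payload([[7, 0.27, 0.36, 20.7, 0.045, 45, 170, 1.001, 3, 0.45, 8.8]])
text = json.dumps(payload, indent=2)
print(text)
```

Writing text to ./legion/r.json produces a file equivalent to the one shown above.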

Invoke the model to make a prediction:

$ legionctl model invoke --mr wine --json-file ./legion/r.json

{"prediction": [6.0], "columns": ["quality"]}
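The invoke endpoint can also be called over plain HTTP. The sketch below only builds the request and parses a response like the one shown above; it does not send anything. It assumes that the endpoint is served over https on the same edge host as Swagger and that the response has a single output column. Authorization is not shown, and your installation may require it. Both function names are hypothetical.

```python
import json
import urllib.request


def build_invoke_request(host, payload, model="wine"):
    """Build (but do not send) a POST request to the model's invoke endpoint."""
    # Assumption: https scheme and the same edge.<host> prefix as the Swagger page.
    url = "https://edge.%s/model/%s/api/model/invoke" % (host, model)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_prediction(response_text):
    """Return {column: predictions}, assuming a single output column."""
    response = json.loads(response_text)
    return {response["columns"][0]: response["prediction"]}


# example.com is a placeholder host; the payload is truncated for brevity.
request = build_invoke_request("example.com", {"columns": ["pH"], "data": [[3]]})
print(request.get_method(), request.full_url)
print(parse_prediction('{"prediction": [6.0], "columns": ["quality"]}'))
```

Passing the request to urllib.request.urlopen would send it; the response body can then be handed to parse_prediction.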

Congrats! You have finished the tutorial!
