Track ML Experiments using MLflow

Aji Samudra · Published in Analytics Vidhya · 4 min read · Jan 29, 2021


Photo by Robin Glauser on Unsplash

Problem

Finding a high-performing machine learning model in the context of a data science competition can get messy.

One way to gain knowledge is by applying it, and I believe Kaggle is one of the best places to apply and sharpen your modeling skills.

Typical Kaggle Competition set-up

On Kaggle, the typical supervised learning competition set-up is that participants are given:

  1. two separate datasets, i.e. a train set and a test set,
  2. a single evaluation metric, which is then used to rank model performance,
  3. two separate leaderboards, i.e. Public and Private. A portion of the test set is used to calculate the Public leaderboard, while the rest is used for the Private leaderboard. The Private leaderboard remains hidden until the end of the competition and determines the final ranking.

A LOT of Possible Experiment Ideas

In a data science competition, you might start with a simple model, then, as you gain a better understanding of the problem, build more complex models. In the process of adding complexity, you might lose track and not know which ideas are better than others. Why is that the case? Because there are a lot of components we could experiment with, e.g. different learning algorithms, the various hyperparameters of each learning algorithm, not to mention the preprocessing techniques we could apply to the data! On top of that, you might also want to experiment with ensembling and/or stacking different models to achieve the highest rank on the leaderboard!

A few possible components to experiment with

In this article, we will see how to infuse tracking tools into the ML experiment workflow.

ML Experiment

In order to have standardized performance metrics across all experiments, it is easier to frame the process as separate building blocks, as seen below. This lets you focus on each component of the pipeline separately. You might focus on pre-processing and feature engineering first, then continue by applying different algorithms to the same preprocessed data, and so on.

Aiming for a high leaderboard score means that, first, you need to understand whether your model suffers from overfitting or underfitting, and then apply different techniques to address it.

If the model suffers from overfitting, you might apply regularization to the model, reduce/remove features, implement early stopping, or build an ensemble model.

If the model suffers from underfitting, you might add more relevant features and training data (if possible), increase model complexity, or remove noise in the data.

MLflow

MLflow is "an open-source platform for the machine learning lifecycle", as quoted from its site. MLflow provides a range of functions covering experimentation, reproducibility, deployment, and a central model registry. There are 4 MLflow components, as seen below, and we will only focus on MLflow Tracking.

MLflow Tracking

There are 3 main functions:

  • log_param : for tracking which components you changed
  • log_metric : for tracking evaluation/performance metrics
  • log_artifact : for tracking images that help you evaluate the model, or even out-of-fold predictions for building ensemble/stacking models later
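
A minimal sketch of how these three calls fit together inside a single run (the run name, metric name, and artifact file are placeholders assumed for illustration, not values from the original notebook):

import mlflow

with mlflow.start_run(run_name="example-run"):
    # log_param: record which component changed in this run
    mlflow.log_param("model", "logistic_regression")
    # log_metric: record the evaluation/performance metric
    mlflow.log_metric("valid_accuracy", 0.92)
    # log_artifact: attach a file such as a confusion-matrix plot
    # or out-of-fold predictions (the file must already exist on disk)
    mlflow.log_artifact("confusion_matrix.png")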

Case: Solving MNIST dataset

The objective of this experiment is to show how to infuse experiment tracking while approaching the famous MNIST dataset with several algorithms.
For simplicity, we will use the default hyperparameters for each algorithm, so only one parameter changes between runs, i.e. "model". Since the evaluation and tracking process will be repeated for each algorithm, it is cleaner to implement it as functions.

Library
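
The import cell is not reproduced in this extract; a plausible set, assuming scikit-learn and tensorflow.keras as the modeling libraries, would look like this:

import numpy as np
import mlflow

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from tensorflow import keras
from tensorflow.keras import layers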

Evaluation Function
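
A minimal version of the evaluation function, assuming accuracy is the single metric used throughout:

def evaluate(y_true, y_pred):
    # single evaluation metric shared by every experiment
    return accuracy_score(y_true, y_pred)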

Tracking Function
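
And a tracking helper that wraps the three MLflow calls above; the function signature is an assumption for illustration, not the original gist:

def track(exp_id, model_name, metrics, artifacts=None):
    # one MLflow run per model, grouped under the same experiment id
    with mlflow.start_run(experiment_id=exp_id, run_name=model_name):
        mlflow.log_param("model", model_name)      # what changed
        for name, value in metrics.items():
            mlflow.log_metric(name, value)         # how it performed
        for path in (artifacts or []):
            mlflow.log_artifact(path)              # plots, OOF predictions, etc.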

Pre-process & Validation Strategy

Before feeding the training dataset to the different algorithms, we need to perform pre-processing, namely normalizing the pixel values from 0–255 to 0–1, which speeds up convergence for Logistic Regression, MLP, and CNN.
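
A sketch of that step, assuming the Keras copy of MNIST and a simple hold-out split as the validation strategy (the original notebook may use a different split):

# load MNIST and normalize pixel values from 0-255 to 0-1
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# hold out part of the training data for validation
x_tr, x_val, y_tr, y_val = train_test_split(
    x_train, y_train, test_size=0.2, random_state=42, stratify=y_train)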

Before Experiment

We need to define an experiment_id before experimenting:

# create experiment id for tracking using mlflow
exp_id = mlflow.create_experiment("solving-mnist1")

Fit and Predict Function
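
A sketch of this helper for the scikit-learn models; the name fit_and_predict and its arguments are mine, not the original notebook's. The Keras models follow the same fit, predict, evaluate, track loop but with their own fit and predict calls, shown under Models 3 and 4.

def fit_and_predict(model, model_name, x_tr, y_tr, x_val, y_val, exp_id):
    # sklearn estimators expect flat feature vectors, so flatten the 28x28 images
    x_tr_flat = x_tr.reshape(len(x_tr), -1)
    x_val_flat = x_val.reshape(len(x_val), -1)

    model.fit(x_tr_flat, y_tr)
    preds = model.predict(x_val_flat)

    # reuse the shared evaluation and tracking helpers defined above
    acc = evaluate(y_val, preds)
    track(exp_id, model_name, {"valid_accuracy": acc})
    return acc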

Model 1: Logistic Regression — Sklearn
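
Using the hypothetical fit_and_predict helper above with the scikit-learn defaults (the default max_iter may raise a convergence warning on MNIST):

# Model 1: Logistic Regression with sklearn default hyperparameters
fit_and_predict(LogisticRegression(), "logistic_regression",
                x_tr, y_tr, x_val, y_val, exp_id)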

Model 2: Random Forest — Sklearn
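
Same pattern for the random forest, again with the library defaults:

# Model 2: Random Forest with sklearn default hyperparameters
fit_and_predict(RandomForestClassifier(), "random_forest",
                x_tr, y_tr, x_val, y_val, exp_id)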

Model 3: Multi Layers Perceptron — Keras
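
A plausible Keras MLP for this step; the architecture and epoch count are assumptions, since the original gist is not reproduced here:

# Model 3: simple Multi Layer Perceptron in Keras
mlp = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
mlp.fit(x_tr, y_tr, epochs=5, batch_size=128,
        validation_data=(x_val, y_val), verbose=0)

# track the run with the same "model" parameter and accuracy metric
mlp_acc = evaluate(y_val, np.argmax(mlp.predict(x_val), axis=1))
track(exp_id, "mlp_keras", {"valid_accuracy": mlp_acc})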

Model 4: Convolutional Neural Network — Keras
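
And a small CNN in the same style; again, the exact architecture is an assumption for illustration:

# Model 4: small Convolutional Neural Network in Keras
cnn = keras.Sequential([
    layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
cnn.fit(x_tr, y_tr, epochs=5, batch_size=128,
        validation_data=(x_val, y_val), verbose=0)

cnn_acc = evaluate(y_val, np.argmax(cnn.predict(x_val), axis=1))
track(exp_id, "cnn_keras", {"valid_accuracy": cnn_acc})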

MLflow Dashboard

We can now compare several experiments in the MLflow UI. It is cleaner and easier than keeping track of experiments in a spreadsheet or even notes.

# run mlflow ui
!mlflow ui
Comparing accuracy across different models

Conclusion

MLflow helps us reduce clutter in ML experiment tracking, and it is easy to infuse into a typical workflow. The full notebook for this is on my GitHub.

There is another good tracking tool that you might also try: Weights & Biases.
