MLflow
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It includes features for tracking experiments, packaging code into reproducible runs, and sharing and deploying models.
Install MLflow
Before you can use MLflow, you must install it in your environment, such as a local machine, VM, or cloud provider environment.
You can use pip to install MLflow:
pip install mlflow
Track experiments with MLflow
One of MLflow's core components is MLflow Tracking, which lets you log parameters, code versions, metrics, and output files when running your data science code, and later visualize them.
MLflow Tracking provides a flexible and easy-to-use approach to log and compare parameters, metrics, and models across experiments. By adopting MLflow for experiment tracking, you can significantly enhance the reproducibility, collaboration, and monitoring of your machine learning projects.
To use MLflow Tracking:
- Initialize MLflow Tracking. To track experiments, use the mlflow.start_run() method in your code. Within this context, you can log parameters, metrics, models, and artifacts. This example uses the mlflow.start_run() method in a Python script:

import mlflow

# Start an MLflow run
with mlflow.start_run():
    # Log parameters (key-value pairs)
    mlflow.log_param("param_name", "param_value")
    # Log metrics (key-value pairs; metric values must be numeric)
    mlflow.log_metric("metric_name", 0.85)
    # Log artifacts (output files)
    # Ensure the file you want to log exists in the current directory
    mlflow.log_artifact("output_file.txt")

- Run your code. After integrating MLflow tracking into your code, run your script as you normally would. MLflow automatically logs all the parameters, metrics, and artifacts you've specified.
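Beyond logging a single value, you can call mlflow.log_metric() repeatedly with the optional step argument to record how a metric evolves during training. A minimal sketch, assuming a hypothetical three-epoch loop where the computed loss stands in for your real training step:

import mlflow

with mlflow.start_run():
    mlflow.log_param("epochs", 3)
    for epoch in range(3):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training computation
        # step orders the values on the metric's chart in the MLflow UI
        mlflow.log_metric("loss", loss, step=epoch)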
Use MLflow Tracking in Harness
You can run training in Harness itself or through native integrations with popular data science platforms, such as MLflow.
You can include MLflow Tracking in a Harness CI pipeline by using the MLflow plugin.
- Install MLflow.
- Add the MLflow plugin in a Plugin step.
- step:
    type: Plugin
    name: mlflow plugin
    identifier: mlflow_plugin
    spec:
      connectorRef: account.harnessImage ## Harness Docker connector
      image: harnesscommunity/mlflow
      settings:
        MLFLOW_TRACKING_URI: http://12.345.678.900:5000 ## URI for your MLflow remote tracking server
        MLFLOW_EXPERIMENT_NAME: someExperimentName
        MLFLOW_PROJECT_PATH: https://github.com/someAccount/mlflow-example-project
        MLFLOW_RUN_PARAMETERS: n_estimators=150
      imagePullPolicy: Always
You can use expressions for plugin settings. For example, <+stage.variables.trackingUri> references a stage variable. You can also create text secrets for sensitive information, such as passwords, and then use expressions to reference those secrets.
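For example, the settings in the step above could reference a stage variable and a text secret instead of hard-coded values. A minimal sketch, assuming a stage variable named experimentName and a text secret named mlflowTrackingUri (both hypothetical names you would create yourself):

settings:
  MLFLOW_EXPERIMENT_NAME: <+stage.variables.experimentName>
  MLFLOW_TRACKING_URI: <+secrets.getValue("mlflowTrackingUri")>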
Advanced tracking
In addition to standard MLflow Tracking, you can enable these advanced tracking options:
Log models
You can log models in a format that can later be deployed for serving. In this example, sk_model is a trained scikit-learn model, and "model" is the artifact path for this model in the MLflow run:

mlflow.sklearn.log_model(sk_model, "model")
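Once logged, the model can be loaded back by URI for inference. A minimal sketch, assuming <run_id> is the ID of the run that logged the model and X_test is input data shaped like the model's training input (both placeholders):

import mlflow.sklearn

# "runs:/<run_id>/model" points at the "model" artifact path logged above
model = mlflow.sklearn.load_model("runs:/<run_id>/model")
predictions = model.predict(X_test)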
Use MLflow Projects and Models
For more structured experimentation, consider packaging your code as an MLflow Project and using MLflow Models for model packaging and deployment.
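An MLflow Project is a directory containing your code plus an MLproject file that declares entry points and parameters. A minimal sketch of an MLproject file, assuming a hypothetical train.py script and conda.yaml environment file in the same directory:

name: mlflow-example-project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      n_estimators: {type: int, default: 100}
    command: "python train.py --n_estimators {n_estimators}"

You can then execute the project with mlflow run . -P n_estimators=150, which mirrors the MLFLOW_RUN_PARAMETERS setting shown in the Harness step above.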
Remote Tracking Server
For team environments and in Harness pipelines, consider setting up a remote tracking server that all team members can access instead of using the local file system. MLflow supports various backend stores for tracking, such as a SQL database, and artifact stores like S3, Azure Blob Storage, or Google Cloud Storage.
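For example, you could start a tracking server backed by a SQL database and an S3 artifact store. A sketch, assuming a PostgreSQL database and an S3 bucket you have already provisioned (the host, credentials, and bucket name are placeholders):

mlflow server \
  --backend-store-uri postgresql://user:password@db-host:5432/mlflow \
  --default-artifact-root s3://your-mlflow-artifacts \
  --host 0.0.0.0 \
  --port 5000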
To configure MLflow to use a remote server, set the MLFLOW_TRACKING_URI environment variable:
export MLFLOW_TRACKING_URI='http://your-tracking-server:5000'
Or set it within your Python code with the mlflow.set_tracking_uri() method:
mlflow.set_tracking_uri('http://your-tracking-server:5000')
View tracking results
You can view MLflow Tracking results in the MLflow UI. To start the MLflow UI, run mlflow ui.

By default, the UI runs at http://127.0.0.1:5000, unless you are serving results to a remote tracking server. Navigate to your UI or tracking server URL in a web browser to view your experiments, browse recorded runs, compare metrics, and visualize parameters and outputs.
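You can also query recorded runs programmatically rather than through the UI. A minimal sketch using mlflow.search_runs() (the experiment_names argument is available in recent MLflow versions), assuming the experiment name from the pipeline example above and the parameter and metric keys logged earlier:

import mlflow

# Returns a pandas DataFrame with one row per run; metric and parameter
# columns are named after whatever keys you logged
runs = mlflow.search_runs(experiment_names=["someExperimentName"])
print(runs[["run_id", "metrics.metric_name", "params.param_name"]].head())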