Reproducible Runs with MLflow Projects on Azure Databricks


This section describes MLflow features that are in Private Preview. To request access to the preview, contact your Azure Databricks sales representative. If you are not participating in the preview, see the MLflow open-source documentation for information on how to run standalone MLflow.

An MLflow Project is a format for packaging data science code in a reusable and reproducible way. The MLflow Projects component includes an API and command-line tools for running projects, which also integrate with the Tracking component to automatically record the parameters and git commit of your source code for reproducibility. This topic describes how to run an MLflow project remotely on Azure Databricks clusters using the MLflow CLI, which makes it easy to vertically scale your data science code.

To get started with MLflow projects, see the MLflow App Library, which contains a repository of ready-to-run projects aimed at making it easy to incorporate ML functionality into your code.

Run an MLflow project

To run an MLflow project on an Azure Databricks cluster in the default workspace, use the command:

mlflow run <uri> -m databricks --cluster-spec <json-cluster-spec>

where <uri> is a Git repository URI or folder containing an MLflow project and <json-cluster-spec> is a JSON document containing a cluster specification.

An example cluster specification is:

{
  "spark_version": "5.0.x-scala2.11",
  "num_workers": 1,
  "node_type_id": "Standard_DS3_v2"
}
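Putting the command and the specification together, the following sketch saves the example cluster specification to a file and shows how a run would be launched against it. The `<uri>` is a placeholder, and the final command assumes the mlflow CLI is installed and Databricks credentials are configured, so it is left commented out:

```shell
# Save the example cluster specification from above to a file.
cat > cluster-spec.json <<'EOF'
{
  "spark_version": "5.0.x-scala2.11",
  "num_workers": 1,
  "node_type_id": "Standard_DS3_v2"
}
EOF

# Launch the project remotely; <uri> is a Git repository URI or a folder
# containing an MLflow project. This step requires the mlflow CLI and
# Databricks credentials, so it is commented out in this sketch:
# mlflow run <uri> -m databricks --cluster-spec cluster-spec.json
```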


If you are using Databricks Runtime 4.3 or lower, you must specify the following spark_conf in your cluster specification:

{
  "spark_version": "5.0.x-scala2.11",
  "num_workers": 1,
  "node_type_id": "Standard_DS3_v2",
  "spark_conf": {"spark.databricks.chauffeur.shellCommandTask.enabled": "true"}
}

You can pass Git credentials using the git-username and git-password arguments or the MLFLOW_GIT_USERNAME and MLFLOW_GIT_PASSWORD environment variables.

To run against an Azure Databricks cluster in a non-default workspace, specify databricks://<profile>, where <profile> is a Databricks CLI profile, in the MLFLOW_TRACKING_URI environment variable.
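For example, both settings can be exported before invoking mlflow run. The profile name my-workspace and the credential values below are placeholders:

```shell
# Target a non-default workspace through a Databricks CLI profile
# (the profile name "my-workspace" is a placeholder):
export MLFLOW_TRACKING_URI='databricks://my-workspace'

# Pass Git credentials through the environment instead of the
# git-username and git-password arguments (placeholder values):
export MLFLOW_GIT_USERNAME='<git-username>'
export MLFLOW_GIT_PASSWORD='<git-password>'
```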


The API for running projects, mlflow.start_run(), accepts a source_name argument. This argument is used if you run a project from a file, but is ignored if you run from an Azure Databricks notebook or with the mlflow run CLI command.


This example shows how to run the MLflow tutorial project on an Azure Databricks cluster, view the job run output, and view the run in the MLflow UI.

Run the MLflow tutorial project

The following command runs the MLflow tutorial project, which trains a wine-quality model, and records the training parameters and metrics in MLflow experiment 49 in the workspace defined by the CLI profile mlflow:

export MLFLOW_TRACKING_URI=databricks://mlflow
mlflow run <uri> -P alpha=0.1 --experiment-id 49 -m databricks -c cluster-spec.json
=== Fetching project from <uri> into /var/folders/kc/l20y4txd5w3_xrdhw6cnz1080000gp/T/tmp6_rk_mme ===
=== Uploading project to DBFS path /dbfs/mlflow-experiments/49/projects-code/db7ec766f11c6d1fcdb7bf64e7429b4a355712e1a14b5039bc06717539334b1b.tar.gz ===
=== Finished uploading project to /dbfs/mlflow-experiments/49/projects-code/db7ec766f11c6d1fcdb7bf64e7429b4a355712e1a14b5039bc06717539334b1b.tar.gz ===
=== Running entry point main of project on Databricks ===
=== Launched MLflow run as Databricks job run with ID 2372743. Getting run status page URL... ===
=== Check the run's status at https://<databricks-instance>#job/11641/run/1 ===

View the Azure Databricks job run

To view the Azure Databricks job run output, open https://<databricks-instance>#job/11641/run/1.


View the experiment in the MLflow UI

To view the experiment in the MLflow UI, go to https://<databricks-instance>/mlflow/#/experiments/49, where the run from the job appears.


Display MLflow run information

To display details for the MLflow run, click the link in the Date column.


You can navigate back to the Azure Databricks job run page by clicking the Logs link in the Job Output field.