MLlib + Automated MLflow Tracking

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Databricks Runtime 5.3 and above and Databricks Runtime 5.3 ML and above support automated MLflow Tracking for Apache Spark MLlib model tuning in Python. Databricks Runtime 5.5 ML includes MLflow, so you do not need to install it separately.

When automated MLflow tracking from MLlib is enabled, and you run tuning code that uses CrossValidator or TrainValidationSplit, hyperparameters and evaluation metrics are automatically logged in MLflow. Without automated MLflow tracking, you must make explicit API calls to log to MLflow.

Automated MLflow tracking is enabled by default in Databricks Runtime 5.4 and above and Databricks Runtime 5.4 ML and above. To enable automated MLflow tracking on runtime versions earlier than 5.4, set the Spark configuration spark.databricks.mlflow.trackMLlib.enabled to true.
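As a minimal sketch of setting that configuration from a Python notebook: the helper name enable_mllib_tracking is hypothetical, and it assumes that setting the key on the active SparkSession (the spark object that Databricks notebooks provide) is sufficient. If it is not, the same key and value can be entered in the cluster's Spark configuration instead.

```python
# Configuration key named in the text above.
TRACK_MLLIB_KEY = "spark.databricks.mlflow.trackMLlib.enabled"


def enable_mllib_tracking(spark):
    """Set the tracking flag on a SparkSession and return the stored value.

    Assumption: a session-level setting is honored by the runtime.
    """
    spark.conf.set(TRACK_MLLIB_KEY, "true")
    return spark.conf.get(TRACK_MLLIB_KEY)
```

In a notebook this would be called as enable_mllib_tracking(spark) before any tuning code runs.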

Manage MLflow runs

CrossValidator and TrainValidationSplit log tuning results as nested MLflow runs:

  • Main or parent run: The info for CrossValidator or TrainValidationSplit is logged as the “main” run. If there is an active run already, it logs under this active run and does not end the run. If there is no active run, then it creates a new run, logs under it, and ends the run before returning.
  • Child runs: Each hyperparameter setting tested and its evaluation metric are logged as a child run under the main run.
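To make the parent/child layout concrete, the sketch below builds a similar structure by hand with explicit MLflow API calls, which is roughly what you would write without automated tracking. It is an illustration only, not the code Databricks runs internally; the function name log_nested_tuning and the shape of results are assumptions made for the example.

```python
def log_nested_tuning(results, metric_name, parent_name="tuning"):
    """Log one parent run plus one nested child run per hyperparameter setting.

    `results` is a list of (params_dict, metric_value) pairs (hypothetical shape).
    """
    # Imported here so that defining the helper does not itself require MLflow.
    import mlflow

    # Parent ("main") run: groups every setting that was tried.
    with mlflow.start_run(run_name=parent_name):
        for params, metric_value in results:
            # Child run: one hyperparameter setting and its evaluation metric.
            with mlflow.start_run(nested=True):
                for name, value in params.items():
                    mlflow.log_param(name, value)
                mlflow.log_metric(metric_name, metric_value)
```

For example, log_nested_tuning([({"regParam": 0.1}, 0.81), ({"regParam": 0.01}, 0.84)], "areaUnderROC") would produce one parent run with two child runs.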

When calling fit(), we recommend active MLflow run management; that is, wrap the call to fit() inside a “with mlflow.start_run():” statement. This ensures that the info is logged under its own MLflow “main” run, and it makes it easier to log extra tags, params, or metrics to that run.
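The recommendation above can be sketched as follows. The helper names build_example_cv and fit_in_own_run, the run name, and the tag values are hypothetical. The example estimator assumes a DataFrame with the default "features" and "label" columns. Treat it as one possible arrangement, not a prescribed one.

```python
def build_example_cv():
    """Build a small CrossValidator to tune (illustrative settings only)."""
    # Imported here so the helpers can be defined outside a Spark environment.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    lr = LogisticRegression(maxIter=10)
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    return CrossValidator(
        estimator=lr,
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(),
        numFolds=3,
    )


def fit_in_own_run(tuner, train_df, run_name="mllib-tuning", extra_tags=None):
    """Call tuner.fit() inside a dedicated MLflow run and return the model."""
    import mlflow

    with mlflow.start_run(run_name=run_name):
        # Anything logged here lands on the same "main" run as the tuning info.
        for key, value in (extra_tags or {}).items():
            mlflow.set_tag(key, value)
        model = tuner.fit(train_df)
    # Leaving the with block ends the run.
    return model
```

In a notebook, with train_df as your training DataFrame, this might be used as cv_model = fit_in_own_run(build_example_cv(), train_df, extra_tags={"dataset": "demo"}).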

Note

When fit() is called multiple times within the same active MLflow run, every call logs to the same “main” run. To resolve name conflicts for MLflow params and tags, a UUID is appended to each conflicting name.

Example notebook

Here is a notebook that demonstrates automated MLflow tracking in action.

After you perform the actions in the last cell in the notebook, your MLflow UI should display:

[Image: MLflow UI screenshot, mllib-mlflow-demo.png]