Distributed Hyperopt + Automated MLflow Tracking

Hyperopt is a popular open-source hyperparameter tuning library. Hyperopt offers two tuning algorithms: Random Search and the Bayesian method Tree of Parzen Estimators (TPE), both of which offer improved compute efficiency compared to a brute-force approach such as grid search.

Databricks Runtime 5.4 ML and above include Hyperopt, augmented with an implementation powered by Apache Spark. By using the SparkTrials extension of hyperopt.Trials, you can easily distribute a Hyperopt run without making other changes to your Hyperopt usage: when you call the hyperopt.fmin() function, you pass in a SparkTrials instance as the trials argument. SparkTrials can accelerate single-machine tuning by distributing trials to Spark workers.

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. Databricks Runtime 5.4 ML and above support automated MLflow tracking for hyperparameter tuning with Hyperopt and SparkTrials in Python. When automated MLflow tracking is enabled and you run fmin() with SparkTrials, hyperparameters and evaluation metrics are automatically logged in MLflow. Without automated MLflow tracking, you must make explicit API calls to log to MLflow. Automated MLflow tracking is enabled by default. To disable it, set the Spark configuration spark.databricks.mlflow.trackHyperopt.enabled to false. You can still use SparkTrials to distribute tuning even without automated MLflow tracking.


Azure Databricks does not support logging to MLflow from workers, so you cannot add custom logging code in the objective function you pass to Hyperopt.

How to Use Hyperopt with SparkTrials

This section describes how to configure the arguments you pass to Hyperopt, best practices in using Hyperopt, and troubleshooting issues that may arise when using Hyperopt.

fmin() arguments

The fmin() documentation has detailed explanations for all the arguments. We briefly mention the important ones below:

  • fn: The objective function to be called with a value generated from the hyperparameter space (space). fn can return the loss as a scalar value or in a dictionary (refer to the Hyperopt docs for details). This is usually where most of your code would be, for example, loss calculation, model training, and so on.
  • space: An expression that generates the hyperparameter space Hyperopt searches. A simple example is hp.uniform('x', -10, 10), which defines a single-dimension search space between -10 and 10. Hyperopt provides great flexibility in defining the hyperparameter space. After you are familiar with Hyperopt you can use this argument to make your tuning more efficient.
  • algo: The search algorithm Hyperopt uses to search the hyperparameter space (space). Typical values are hyperopt.random.suggest for Random Search and hyperopt.tpe.suggest for TPE.
  • max_evals: The number of hyperparameter settings to try, that is, the number of models to fit. This number should be large enough to amortize overhead.
  • max_queue_len: The number of hyperparameter settings Hyperopt should generate ahead of time. Since the Hyperopt TPE generation algorithm can take some time, it can be helpful to increase this beyond the default value of 1, but generally no larger than the SparkTrials setting parallelism.

SparkTrials arguments

  • parallelism: The maximum number of trials to evaluate concurrently. This value cannot be greater than 128 or the combined number of CPUs on the cluster's worker nodes. Higher parallelism usually shortens the wall-clock time needed to find the optimal configuration, but the total amount of compute (or DBUs) is typically higher than for serial tuning: a serial run can condition each new trial on all previous results, whereas with parallel runs the optimizer must select new hyperparameter values without knowing the outcomes of the concurrent runs still in progress.
  • timeout: The maximum number of seconds an fmin() call can take. When this limit is exceeded, all runs are terminated and fmin() exits. Information about completed runs is preserved, and the best model is selected from among them. This argument can save you time as well as help you control your cluster cost.

The complete SparkTrials API is included in the Example Notebook. To view it, run help(SparkTrials).

Best Practices

Here are a few things that help you get the most out of using Hyperopt:

  • Bayesian approaches can be much more efficient than grid search and random search. Hence, with the Hyperopt Tree of Parzen Estimators (TPE) algorithm, it is often possible to explore more hyperparameters and larger ranges. That said, using domain knowledge to restrict the search domain can speed up tuning and produce better results.
  • For models with long training times, start experimenting with small datasets and as many hyperparameters as possible. Use MLflow to introspect the best performing models, make informed decisions about how to fix as many hyperparameters as you can, and intelligently down-scope the parameter space as you prepare for tuning at scale.
  • Take advantage of Hyperopt support for conditional dimensions and hyperparameters. For example, when you evaluate multiple flavors of gradient descent, instead of limiting the hyperparameter space to just the common hyperparameters, you can have Hyperopt include conditional hyperparameters – the ones that are only appropriate for a subset of the flavors.

Troubleshooting

  • If the loss is NaN (not a number), it is usually because the objective function passed to fmin() returns NaN. A NaN loss does not affect other runs and you can safely ignore it. If you want to avoid NaN losses, you can either adjust the hyperparameter space or modify your objective function.
  • With Hyperopt search methods the loss usually does not decrease monotonically with each run. However, you can often find the best hyperparameters more quickly than using other methods.
  • Both Hyperopt and Spark incur certain overheads. For short trial runs (low tens of seconds), these overheads dominate and the speedup may be small or even zero.
  • When you use hp.choice, Hyperopt returns only the index of the choice list. Therefore the parameter logged in MLflow is also the index. You can use hyperopt.space_eval to retrieve the parameter values.

Example Notebook

Here is a notebook that shows distributed Hyperopt + automated MLflow tracking in action. Before diving into the notebook, make sure you:

  • Install mlflow from PyPI on your cluster
  • Call fmin() inside with mlflow.start_run():

After you perform the actions in the last cell in the notebook, your MLflow UI should display the logged runs.