Databricks Runtime for Machine Learning (Databricks Runtime ML) provides a ready-to-go environment for machine learning and data science. It contains multiple popular libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed training using Horovod.
Databricks Runtime ML lets you start an Azure Databricks cluster with all of the libraries required for distributed training. It ensures the compatibility of the libraries included on the cluster (between TensorFlow and CUDA / cuDNN, for example) and substantially speeds up cluster start-up.
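To see which library versions a running cluster actually provides, you can query the installed package metadata from a notebook. The following is a minimal sketch, not an official Databricks utility. It assumes Python 3.8 or later (for `importlib.metadata`), and the package names in `PACKAGES` are illustrative choices based on the libraries named above.

```python
from importlib import metadata

# Illustrative package names (an assumption); adjust to the libraries you care about.
PACKAGES = ["tensorflow", "torch", "keras", "xgboost", "horovod"]


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this output against the release notes for your runtime is one way to confirm that nothing on the cluster has overridden a bundled library.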
Databricks Runtime ML includes high-performance distributed machine learning packages that use MPI (Message Passing Interface) and other low-level communication protocols. Because these protocols do not natively support encryption over the wire, these ML packages can potentially send unencrypted sensitive data across the network.
What are the risks?
Messages sent across the network by these ML packages are typically either ML model parameters or summary statistics about training data. Sensitive data, such as protected health information, is therefore not normally expected to be sent over the wire unencrypted. However, certain configurations or uses of these packages (such as specific model designs) could result in messages that contain such information being sent across the network.
Which libraries are affected?
Library utilities are not available in Databricks Runtime 5.5 ML and below.
Databricks Runtime ML is built on Databricks Runtime. For example, Databricks Runtime 5.0 ML is built on Databricks Runtime 5.0. The libraries included in the base Databricks Runtime are listed in the Databricks Runtime Release Notes.
Databricks Runtime ML includes a variety of popular ML libraries. The libraries are updated periodically to include new features and fixes.
Azure Databricks has designated a subset of the supported libraries as top-tier libraries. For these libraries, Azure Databricks provides a faster update cadence, updating to the latest upstream package releases with each runtime release (barring dependency conflicts). Azure Databricks also provides advanced support, testing, and embedded optimizations for top-tier libraries.
For a full list of top-tier and other provided libraries, see the following topics for each available runtime:
When you create a cluster, select a Databricks Runtime ML version from the Databricks Runtime Version drop-down. Both CPU and GPU-enabled ML runtimes are available.
If you select a GPU-enabled ML runtime, you are prompted to select a compatible Driver Type and Worker Type. Incompatible instance types are grayed out in the drop-downs. GPU-enabled instance types are listed under the GPU-Accelerated label.
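If you create clusters programmatically rather than through the UI, the same choices (runtime version, Driver Type, and Worker Type) appear as fields in the cluster specification. The sketch below only builds a request body and does not call any service. The helper `ml_cluster_spec` is hypothetical, and the version-string pattern (`5.5.x-gpu-ml-scala2.11`) and the instance type `Standard_NC6` are assumptions for illustration; check the values that your workspace actually offers.

```python
import json


def ml_cluster_spec(name, runtime_version, worker_type, num_workers,
                    driver_type=None, gpu=False):
    """Build a cluster specification dict for a Databricks Runtime ML cluster.

    The spark_version pattern used here is an assumption, not a guaranteed format.
    """
    flavor = "gpu" if gpu else "cpu"
    return {
        "cluster_name": name,
        "spark_version": f"{runtime_version}.x-{flavor}-ml-scala2.11",
        "node_type_id": worker_type,                          # Worker Type
        "driver_node_type_id": driver_type or worker_type,    # Driver Type
        "num_workers": num_workers,
    }


# Illustrative values only.
spec = ml_cluster_spec("ml-training", "5.5", "Standard_NC6", 2, gpu=True)
print(json.dumps(spec, indent=2))
```

Keeping the CPU or GPU choice as a single flag makes it harder to pair a GPU-enabled runtime with an instance type chosen for a CPU cluster by accident, although the function itself does not validate instance types.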
Libraries in your workspace that automatically install into all clusters can conflict with the libraries included in Databricks Runtime ML. Before you create a cluster with Databricks Runtime ML, clear the Install automatically on all clusters checkbox for conflicting libraries.
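One way to decide which checkboxes to clear is to compare the libraries you install automatically against the versions listed in the release notes for your runtime. The following is a rough sketch of that comparison. The function `find_conflicts` and the sample data are hypothetical, and the version numbers are not taken from any real runtime release.

```python
def find_conflicts(auto_installed, runtime_libraries):
    """Return libraries that appear in both mappings with different versions.

    Both arguments map a library name to a version string. The result maps
    each conflicting name to (auto-installed version, runtime version).
    """
    conflicts = {}
    for name, version in auto_installed.items():
        bundled = runtime_libraries.get(name)
        if bundled is not None and bundled != version:
            conflicts[name] = (version, bundled)
    return conflicts


# Hypothetical example data for illustration only.
auto_installed = {"tensorflow": "1.10.0", "requests": "2.22.0"}
runtime_libraries = {"tensorflow": "1.13.1", "xgboost": "0.90"}

print(find_conflicts(auto_installed, runtime_libraries))
```

This simple check matches names exactly; a more careful version would normalize package names (case, hyphens versus underscores) before comparing.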
By using this version of Databricks Runtime, you agree to the terms and conditions outlined in the NVIDIA End User License Agreement (EULA) with respect to the CUDA, cuDNN, and Tesla libraries, and the NVIDIA End User License Agreement (with NCCL Supplement) for the NCCL library.