Databricks Runtime 4.1 ML (Beta)

Databricks Runtime 4.1 ML provides a ready-to-go environment for machine learning and data science. It contains multiple popular libraries, including TensorFlow, Keras, and XGBoost. It also supports distributed TensorFlow training using Horovod.

Note

Databricks Runtime 4.1 ML will be deprecated soon. We recommend you use Databricks Runtime 5.0 ML.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

Libraries

Databricks Runtime 4.1 ML is built on top of Databricks Runtime 4.1. For information on what’s new in Databricks Runtime 4.1, see the Databricks Runtime 4.1 release notes. In addition to the new features in Databricks Runtime 4.1, Databricks Runtime 4.1 ML includes the following libraries to support machine learning. Libraries that are also included in the base Databricks Runtime 4.1 are noted as such.

Libraries by category:

Distributed Deep Learning

Distributed training with Horovod and Spark:

  • HorovodEstimator
  • horovod 0.12.1
  • openmpi 3.0.0
  • paramiko 2.4.1
  • cloudpickle 0.5.2

Distributed TensorFlow and Keras prediction:

  • spark-deep-learning 1.0 pre-release
  • tensorframes 0.3.0

Deep Learning

Keras:

  • keras 2.1.5
  • h5py 2.7.1

TensorFlow:

  • (CPU clusters) tensorflow 1.7.1
  • (GPU clusters) tensorflow-gpu 1.7.1

GPU libraries:

  • CUDA 9.0 (also installed in base Databricks Runtime)
  • cuDNN 7.0 (also installed in base Databricks Runtime)
  • NCCL 2.0.5-3

XGBoost

Other machine learning libraries

  • numpy 1.14.2 (also installed in base Databricks Runtime; version may differ)
  • scikit-learn 0.18.1 (also installed in base Databricks Runtime)
  • scipy (also installed in base Databricks Runtime)
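
To confirm that a cluster actually carries the versions listed above, a notebook cell along the following lines may help. This is a minimal sketch, not part of the runtime: the helper names `installed_version` and `find_mismatches` are hypothetical, the importable module names (for example `sklearn` for scikit-learn) are assumptions, and the check assumes each package exposes a `__version__` attribute.

```python
import importlib

# Versions taken from the library list above, keyed by importable module name.
# The module names are assumptions (e.g. scikit-learn is imported as "sklearn").
EXPECTED = {
    "horovod": "0.12.1",
    "paramiko": "2.4.1",
    "cloudpickle": "0.5.2",
    "keras": "2.1.5",
    "h5py": "2.7.1",
    "tensorflow": "1.7.1",
    "numpy": "1.14.2",
    "sklearn": "0.18.1",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(expected):
    """Map module name -> (expected, found) for every version that differs."""
    mismatches = {}
    for name, want in expected.items():
        found = installed_version(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` returns an empty dictionary when every listed package is present at the listed version. A package that is missing shows up with `None` as its found version. Because the list notes that the numpy version may differ in the base runtime, a numpy entry in the result is not necessarily a problem.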

Maintenance Updates

Maintenance updates made to Databricks Runtime 4.1 ML since its initial release include:

  • July 31, 2018
    • Added the Azure SQL DW connector to Databricks Runtime 4.1 ML.
    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs in case from the column's name in the table schema.
    • Fixed a bug affecting the Spark SQL execution engine.
    • Fixed a bug affecting code generation.
    • Fixed a bug (java.lang.NoClassDefFoundError) affecting Databricks Delta.
    • Improved error handling in Databricks Delta.
    • Fixed a bug that caused incorrect data skipping statistics to be collected for string columns 32 characters or greater.
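
The partition column fix in the list above concerns predicates whose column name differs only in case from the schema. The toy model below illustrates the expected behavior under case-insensitive name resolution, which is Spark SQL's default. It is an illustration only and not the runtime's implementation: `resolve_column` and `prune_partitions` are hypothetical helpers, and partitions are modeled as plain dictionaries.

```python
def resolve_column(name, schema_columns, case_sensitive=False):
    """Return the schema's spelling of `name`, or None if it is not a column."""
    if name in schema_columns:
        return name
    if not case_sensitive:
        lowered = name.lower()
        for column in schema_columns:
            if column.lower() == lowered:
                return column
    return None


def prune_partitions(partitions, predicate_column, value, partition_columns):
    """Keep only partitions whose value for the predicate column equals `value`.

    `partitions` is a list of dicts keyed by the schema's column spelling.
    """
    column = resolve_column(predicate_column, partition_columns)
    if column is None:
        # Not a partition column, so no partition can be skipped safely.
        return list(partitions)
    return [p for p in partitions if p[column] == value]


partitions = [{"eventDate": "2018-07-30"}, {"eventDate": "2018-07-31"}]
# The predicate spells the column "eventdate"; the schema spells it "eventDate".
selected = prune_partitions(partitions, "eventdate", "2018-07-31", ["eventDate"])
```

In this model the predicate name is mapped to the schema's spelling before any partition is filtered, so the differently cased name selects the same partition as the exact name would.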