Conda for Python Package Management

In Databricks Runtime 5.0 ML Beta, the Conda package manager is used to install Python packages. All Python packages are installed inside a single environment: /databricks/python2 on clusters using Python 2 and /databricks/python3 on clusters using Python 3. Switching to (or activating) a different Conda environment is not supported.

Install Python packages on the driver node

You can call the conda command inside a notebook to install a Python package on the driver (master) node of a cluster running Databricks Runtime ML. For some libraries, you may need to detach and reattach your notebook before you can import a newly installed Python module.

%sh /databricks/conda/bin/conda install -p /databricks/python -c conda-forge theano -y


Python packages installed using the conda command inside notebooks are available only on the driver node and not on the worker nodes. You can install a package on all workers using a library or an init script.
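
One way to confirm that a package landed in the driver's environment is to try importing it with that environment's own interpreter. The following is a minimal sketch, not an official procedure: it assumes the interpreter is at /databricks/python/bin/python (the prefix used in the install command above), and check_import is a hypothetical helper, not a Databricks or Conda command. Run it in a %sh cell.

```shell
# Hypothetical helper: returns 0 if the interpreter given as $1 can import
# the module named in $2, and a non-zero status otherwise.
check_import() {
  "$1" -c "import $2" >/dev/null 2>&1
}

# Assumed interpreter path; theano is the example package from this document.
if check_import /databricks/python/bin/python theano; then
  echo "theano is installed in the driver environment"
else
  echo "theano was not found; check the output of the conda install command"
fi
```

If the check succeeds but the notebook still cannot import the module, detaching and reattaching the notebook, as described above, may help.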

Install Python packages on all cluster nodes

The easiest way to use Conda to install a package on all cluster nodes is to call conda inside an init script.

#!/bin/bash
# Install theano from conda-forge into the default environment on each node.
/databricks/conda/bin/conda install -p /databricks/python -c conda-forge theano -y
exit 0

Alternatively, you can activate the default environment and then install multiple packages.

#!/bin/bash
set -e
# Print the Python version of the default environment.
/databricks/python/bin/python -V
. /databricks/conda/etc/profile.d/
conda activate /databricks/python
conda install -c conda-forge theano -y
exit 0
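
Several packages can also be installed without activating the environment by passing them all to a single conda install call, as in the single-package init script shown earlier. The following is a sketch under stated assumptions: the package list is illustrative (nltk is only a placeholder second package), and the CONDA_BIN override, ENV_PREFIX variable, and install_packages function are hypothetical names introduced for this example.

```shell
#!/bin/bash
# Illustrative package list; replace with the packages you need.
PACKAGES="theano nltk"

# Conda binary and environment prefix as used elsewhere in this document.
# CONDA_BIN can be overridden from the environment (hypothetical convenience).
CONDA_BIN="${CONDA_BIN:-/databricks/conda/bin/conda}"
ENV_PREFIX="/databricks/python"

install_packages() {
  # $PACKAGES is intentionally unquoted so each name becomes its own argument.
  # A single conda call resolves all requested packages together.
  "$CONDA_BIN" install -p "$ENV_PREFIX" -c conda-forge $PACKAGES -y
}

if [ -x "$CONDA_BIN" ]; then
  install_packages
else
  echo "conda not found at $CONDA_BIN; skipping install" >&2
fi
```

Whether a missing conda binary should be skipped, as here, or treated as a fatal error depends on how strictly you want cluster startup to enforce the package set.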