Python Clusters


In anticipation of the upcoming end of life of Python 2, announced for 2020, Python 2 is not supported in Databricks Runtime 6.0 Beta and above. Databricks Runtime 5.5 and lower continue to support Python 2.

Spark jobs, Python notebook cells, and library installation all support both Python 2 (on Databricks Runtime 5.5 and below) and Python 3.

Python 3 is supported on all Databricks Runtime versions.

The default Python version for clusters created using the UI is Python 3. In Databricks Runtime 5.5 and below, the default version for clusters created using the REST API is Python 2.

Create a Python cluster

To specify the Python version when you create a cluster, select it from the Python Version drop-down.



When you select a Databricks Runtime version that doesn’t support Python 2 (such as Databricks Runtime 6.0), the cluster creation page hides the Python version selector.

You can create a cluster running a specific version of Python using the API by setting the environment variable PYSPARK_PYTHON to /databricks/python/bin/python or /databricks/python3/bin/python3. For an example, see the REST API example Create a Python 3 cluster.
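As a minimal sketch of such a request body (the cluster name, node type, runtime version, and worker count below are illustrative placeholders, not values from this document):

```python
import json

# Sketch of a Clusters API create-cluster request body that pins the
# cluster to Python 3 via the PYSPARK_PYTHON environment variable.
payload = {
    "cluster_name": "python-3-cluster",   # placeholder name
    "spark_version": "5.5.x-scala2.11",   # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder node type
    "num_workers": 2,                     # placeholder size
    "spark_env_vars": {
        # This setting selects the Python version for the cluster.
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
}

print(json.dumps(payload, indent=2))
```

Sending this payload to the cluster-creation endpoint (for example with any HTTP client) would create a cluster whose notebooks and Spark jobs run Python 3.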

To validate that the PYSPARK_PYTHON configuration took effect, run the following in a Python notebook (or %python cell):

import sys
print(sys.version)

If you specified /databricks/python3/bin/python3, it should print something like:

For Databricks Runtime 6.0 Beta and above:

3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]

For Databricks Runtime 5.5 and below:

3.5.2 (default, Sep 10 2016, 08:21:44)
[GCC 5.4.0 20160609]


When you run %sh python --version in a notebook, python refers to the Ubuntu system Python, which is Python 2. Instead, use /databricks/python/bin/python to refer to the version of Python used by Databricks notebooks and Spark: this path is automatically configured to point to the correct Python executable.
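A quick way to see which interpreter a Python notebook cell is actually using is to inspect sys.executable; this sketch runs in any Python environment:

```python
import sys

# sys.executable is the full path of the interpreter running this code.
# In a Databricks Python notebook it points into the Databricks Python
# environment, not the Ubuntu system Python that %sh python resolves to.
print(sys.executable)
print(sys.version_info.major)  # 2 or 3
```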

Frequently asked questions (FAQ)

Can I use both Python 2 and Python 3 notebooks on the same cluster?
No. The Python version is a cluster-wide setting and is not configurable on a per-notebook basis.
What libraries are pre-installed on Python clusters?
Python 2 and 3 share the same set of installed libraries and library versions with only one exception: simples3 is not available for Python 3, so it is installed only in Python 2. For details on the specific libraries that are pre-installed, see the Databricks Runtime release notes.
Will my existing PyPI libraries work with Python 3?
Yes. Databricks installs the correct version if the library supports both Python 2 and 3. If the library does not support Python 3, then library attachment fails with an error.
Will my existing .egg libraries work with Python 3?
It depends on whether your existing egg library is cross-compatible with both Python 2 and 3. If the library does not support Python 3, either library attachment fails or runtime errors occur.

For a comprehensive guide on porting code to Python 3 and writing code compatible with both Python 2 and 3, see

Can I still install Python libraries using init scripts?
A common use case for Cluster Node Initialization Scripts is to install packages. Use /databricks/python/bin/pip to ensure that Python packages are installed into the Databricks Python virtual environment rather than the system Python environment.
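A minimal sketch of such an init script body, expressed here as a Python string (the package name is a placeholder; in a notebook you would persist the script, for example with dbutils.fs.put, and attach it to the cluster):

```python
# Sketch of an init script that installs a package with the pip bound to
# the Databricks Python virtual environment. "simplejson" is a placeholder.
init_script = """#!/bin/bash
# Use the Databricks pip, not the system pip, so the package lands in the
# virtual environment used by notebooks and Spark.
/databricks/python/bin/pip install simplejson
"""

print(init_script)
```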