Python Clusters

Spark jobs, Python notebook cells, and library installation all support both Python 2 and 3.

Supported Databricks Runtimes

Python 3 is supported on all Databricks Runtime versions.

Create a Python cluster

To specify the Python version when you create a cluster, select it from the Python Version drop-down.

[Image: Python Version drop-down in the Create Cluster dialog]

When you create a cluster using the REST API, specify the Python version by setting the environment variable PYSPARK_PYTHON to /databricks/python/bin/python or /databricks/python3/bin/python3. For an example, see the REST API example Create a Python 3 cluster.
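
As a minimal sketch (not the linked example itself), the following call to the Clusters API 2.0 create endpoint passes PYSPARK_PYTHON in spark_env_vars. The workspace URL, access token, runtime version, and node type are placeholders you must substitute.

import requests

# Placeholders -- substitute your workspace URL and a personal access token.
DOMAIN = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# Cluster spec: PYSPARK_PYTHON selects the Python executable used by Spark and notebooks.
cluster_spec = {
    "cluster_name": "python-3-cluster",               # hypothetical name
    "spark_version": "<databricks-runtime-version>",  # placeholder
    "node_type_id": "<node-type-id>",                 # placeholder
    "num_workers": 2,
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
}

response = requests.post(
    "{}/api/2.0/clusters/create".format(DOMAIN),
    headers={"Authorization": "Bearer {}".format(TOKEN)},
    json=cluster_spec,
)
print(response.json())  # contains the new cluster_id on success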

To validate that the PYSPARK_PYTHON configuration took effect, run the following in a Python notebook (or a %python cell):

import sys
print(sys.version)

If you specified /databricks/python3/bin/python3, it should print something like:

3.5.2 (default, Sep 10 2016, 08:21:44)
[GCC 5.4.0 20160609]

Frequently asked questions (FAQ)

Can I use both Python 2 and Python 3 notebooks on the same cluster?
No. The Python version is a cluster-wide setting and is not configurable on a per-notebook basis.
Why does %sh python --version show Python 2?
On the default shell PATH, python resolves to the Ubuntu system Python, which is Python 2. Use /databricks/python/bin/python to refer to the Python used by Databricks notebooks and Spark; this path is automatically configured to point to the correct executable.
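
As an illustration (a sketch assuming a Linux cluster with the paths above), the following Python cell shells out to both interpreters to show the difference:

import subprocess

# Compare the system Python on the shell PATH with the Databricks Python.
# Note: Python 2 prints its version to stderr, hence the redirect.
for exe in ["python", "/databricks/python/bin/python"]:
    version = subprocess.check_output([exe, "--version"], stderr=subprocess.STDOUT)
    print(exe, version.decode().strip())
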
What libraries are pre-installed on Python clusters?
Python 2 and 3 share the same set of installed libraries and library versions with only one exception: simples3 is not available for Python 3, so it is installed only in Python 2. For details on the specific libraries that are pre-installed, see the Databricks Runtime release notes.
Will my existing PyPI libraries work with Python 3?
Yes. Databricks installs the correct version if the library supports both Python 2 and 3. If the library does not support Python 3, then library attachment fails with an error.
Will my existing .egg libraries work with Python 3?
It depends on whether your existing egg library is cross-compatible with both Python 2 and 3. If the library does not support Python 3, then either library attachment will fail or runtime errors will occur.

For a comprehensive guide on porting code to Python 3 and writing code compatible with both Python 2 and 3, see http://python3porting.com/.

Can I still install Python libraries using init scripts?
A common use case for Cluster Node Initialization Scripts is to install packages. Use /databricks/python/bin/pip to ensure that Python packages install into the Databricks Python virtual environment rather than the system Python environment.
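
For example, here is a minimal sketch that writes such an init script from a notebook (it assumes dbutils is available, as it is in Databricks notebooks; the DBFS path and package name are hypothetical):

# Write an init script that installs a package with the Databricks pip rather than
# the system pip, so it lands in the Python environment used by notebooks and Spark.
# The destination path and package name below are illustrative only.
dbutils.fs.put(
    "dbfs:/databricks/init-scripts/install-example-package.sh",
    """#!/bin/bash
/databricks/python/bin/pip install example-package
""",
    True,  # overwrite an existing script with the same name
)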