Cluster Nodes

A cluster consists of one driver node and worker nodes. You can pick separate cloud provider instance types for the driver and worker nodes, although by default the driver node uses the same instance type as the worker node. Different families of instance types fit different use cases, such as memory-intensive or compute-intensive workloads.
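For example, a cluster spec can pair a larger driver instance type with smaller workers. The sketch below assumes the Databricks Clusters REST API (`/api/2.0/clusters/create`) called from Python; the workspace URL, token, runtime version, and the Standard_DS3_v2 / Standard_DS5_v2 instance types are placeholders chosen for illustration, not recommendations.

```python
import requests

# Placeholders: substitute your own workspace URL and personal access token.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# Sketch of a cluster spec with a larger driver node than the workers.
cluster_spec = {
    "cluster_name": "example-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",         # worker instance type
    "driver_node_type_id": "Standard_DS5_v2",  # larger driver for collect-heavy work
    "num_workers": 4,
}

response = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
response.raise_for_status()
print(response.json())  # returns the new cluster_id on success
```

If `driver_node_type_id` is omitted from the spec, the driver uses the worker instance type, matching the default described above.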

Azure Databricks maps cluster node instance types to compute units known as DBUs. See the instance types pricing page for a list of the supported instance types and their corresponding DBUs. For cloud provider information, see Azure instance type specifications and pricing.

Azure Databricks will always provide one year’s deprecation notice before ceasing support for an Azure instance type.

Driver node

The driver node maintains state information for all notebooks attached to the cluster. It also maintains the SparkContext, interprets all the commands you run from a notebook or a library on the cluster, and runs the Apache Spark master that coordinates with the Spark executors.
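As a minimal illustration, plain Python in a notebook cell executes on the driver node, and the SparkContext that coordinates the executors is reachable through the `spark` session object that Databricks notebooks predefine (assumed here):

```python
# Runs on the driver node: ordinary Python, no executors involved.
import socket
print("Driver hostname:", socket.gethostname())

# The SparkContext that coordinates executors lives on the driver.
sc = spark.sparkContext            # `spark` is predefined in Databricks notebooks
print("Spark application:", sc.applicationId)
print("Default parallelism:", sc.defaultParallelism)
```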

By default, the driver node uses the same instance type as the worker nodes. You can choose a larger driver node type with more memory if you plan to collect() a lot of data from Spark workers and analyze it in the notebook.
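A minimal sketch of that pattern, using an illustrative DataFrame: collect() returns every row to the driver as a Python list, so the full result must fit in the driver node's memory.

```python
# Build a small distributed DataFrame on the workers (illustrative data).
df = spark.range(0, 1_000_000).withColumnRenamed("id", "value")

# collect() brings every row back to the driver as a Python list of Rows,
# so the whole result must fit in the driver node's memory.
rows = df.filter(df.value % 2 == 0).collect()

# From here on, the analysis is plain single-node Python on the driver.
total = sum(r.value for r in rows)
print(f"Collected {len(rows)} rows; sum of even values = {total}")
```

If the result is too large to collect comfortably, aggregating or sampling on the cluster first keeps the memory pressure on the workers instead of the driver.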

Tip

Because the driver node maintains all of the state information for the notebooks attached to it, make sure to detach unused notebooks from the driver node.

Worker node

Azure Databricks worker nodes run the Spark executors and other services required for proper functioning of the cluster. When you distribute your workload with Spark, all of the distributed processing happens on worker nodes.

Tip

To run a Spark job, you need at least one worker node. If a cluster has zero workers, you can run non-Spark commands on the driver node, but Spark commands will fail, as the sketch below illustrates.
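A rough sketch of the difference, assuming a notebook attached to a cluster with zero workers and the predefined `spark` session:

```python
# Non-Spark command: plain Python executes on the driver node and succeeds
# even when the cluster has zero workers.
squares = [n * n for n in range(10)]
print(squares)

# Spark command: count() launches a Spark job that needs executors on
# worker nodes, so it fails on a cluster with zero workers.
df = spark.range(100)
print(df.count())
```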