When you create a cluster, you can either provide a fixed number of workers for the cluster or provide a minimum and maximum number of workers for the cluster.
When you provide a fixed size cluster, Azure Databricks ensures that your cluster has the specified number of workers. When you provide a range for the number of workers, Databricks chooses the appropriate number of workers required to run your job. This is referred to as autoscaling.
With autoscaling, Azure Databricks dynamically reallocates workers to account for the characteristics of your job. Certain parts of your pipeline may be more computationally demanding than others, and Databricks automatically adds additional workers during these phases of your job (and removes them when they’re no longer needed).
Autoscaling makes it easier to achieve high cluster utilization, because you don’t need to provision the cluster to match a workload. This applies especially to workloads whose requirements change over time (like exploring a dataset during the course of a day), but it can also apply to a one-time shorter workload whose provisioning requirements are unknown. Autoscaling thus offers two advantages:
- Workloads can run faster compared to a constant-sized under-provisioned cluster.
- Autoscaling clusters can reduce overall costs compared to a statically-sized cluster.
Depending on the constant size of the cluster and the workload, autoscaling gives you one or both of these benefits at the same time. The cluster size can go below the minimum number of workers selected when the cloud provider terminates instances. In this case, Azure Databricks continuously retries to re-provision instances in order to maintain the minimum number of workers.
- Autoscaling works best with Databricks Runtime 3.4 and above.
- Autoscaling is not available for spark-submit jobs.
Azure Databricks offers two types of cluster node autoscaling: standard and optimized. For a discussion of the benefits of optimized autoscaling, see the blog post on Optimized Autoscaling.
Automated clusters always use optimized autoscaling. The type of autoscaling performed on interactive clusters depends on the workspace configuration.
Standard autoscaling is used by interactive clusters in workspaces in the Standard pricing tier. Optimized autoscaling is used by interactive clusters in the Premium pricing tier.
Autoscaling behaves differently depending on whether it is optimized or standard and whether applied to an interactive or an automated cluster.
- Scales up from min to max in 2 steps.
- Can scale down even if the cluster is not idle by looking at shuffle file state.
- Scales down based on a percentage of current nodes.
- On automated clusters, scales down if the cluster is underutilized over the last 40 seconds.
- On interactive clusters, scales down if the cluster is underutilized over the last 150 seconds.
To allow Azure Databricks to resize your cluster automatically, you enable autoscaling for the cluster and provide the min and max range of workers.
Interactive cluster - On the Create Cluster page, select the Enable autoscaling checkbox in the Autopilot Options box:
Automated cluster - On the Configure Cluster page, select the Enable autoscaling checkbox in the Autopilot Options box:
Configure the min and max workers.
If you are using an instance pool:
- Make sure the cluster size requested is less than or equal to the minimum number of idle instances in the pool. If it is larger, cluster startup time will be equivalent to a cluster that doesn’t use a pool.
- Make sure the maximum cluster size is less than or equal to the maximum capacity of the pool. If it is larger, the cluster creation will fail.
If you reconfigure a static cluster to be an autoscaling cluster, Azure Databricks immediately resizes the cluster within the minimum and maximum bounds and then starts autoscaling. As an example, the table below demonstrates what happens to clusters with a certain initial size if you reconfigure a cluster to autoscale between 5 and 10 nodes.
|Initial size||Size after reconfiguration|