It can often be difficult to estimate how much disk space a particular job will take. To save you from having to estimate how many gigabytes of managed disk to attach to your cluster at creation time, Azure Databricks automatically enables autoscaling local storage on all Azure Databricks clusters.
With autoscaling local storage, Azure Databricks monitors the amount of free disk space available on your cluster’s Spark workers. If a worker begins to run too low on disk, Databricks automatically attaches a new managed disk to the worker before it runs out of disk space. Disks are attached up to a limit of 5 TB of total disk space per virtual machine (including the virtual machine’s initial local storage).
The managed disks attached to a virtual machine are detached only when the virtual machine is returned to Azure. That is, managed disks are never detached from a virtual machine as long as it is part of a running cluster. To scale down managed disk usage, Azure Databricks recommends using this feature in a cluster configured with Cluster Size and Autoscaling or Automatic termination.