Serverless Pools

Note

  • This feature is in Beta.
  • Serverless pools work only for SQL, Python, and R.

A serverless pool is a self-managed pool of cloud resources that is automatically configured for interactive Spark workloads. You specify the minimum and maximum number of workers and the worker type, and Azure Databricks provisions compute and local storage based on your usage.
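
The three inputs described above can be modeled as a small record. This is a minimal sketch, assuming a hypothetical `PoolSpec` class that is not part of any Databricks API. Its `clamp` method only illustrates that usage-driven scaling stays within the minimum and maximum you provide. It does not describe how Azure Databricks chooses the actual number of workers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSpec:
    """Hypothetical model of the inputs a user supplies for a serverless pool."""

    min_workers: int
    max_workers: int
    worker_type: str  # an Azure VM size name; the value used below is illustrative only

    def __post_init__(self) -> None:
        # The pool scales between these bounds, so they must form a valid range.
        if self.min_workers < 0:
            raise ValueError("min_workers must be non-negative")
        if self.max_workers < self.min_workers:
            raise ValueError("max_workers must be >= min_workers")

    def clamp(self, requested_workers: int) -> int:
        """Bound a requested worker count to [min_workers, max_workers]."""
        return max(self.min_workers, min(self.max_workers, requested_workers))


spec = PoolSpec(min_workers=2, max_workers=8, worker_type="Standard_DS3_v2")
print(spec.clamp(20))  # a demand spike is capped at max_workers
print(spec.clamp(0))   # an idle pool never drops below min_workers
```

Making the record frozen and validating it in `__post_init__` means an invalid range is rejected when the spec is created, not later when scaling happens.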

The key benefits of serverless pools are:

  • Auto-configuration: Optimizes the Spark configuration for the best performance of SQL and machine learning workloads in a shared environment, and chooses cluster parameters that reduce infrastructure cost.

  • Elasticity: Automatically scales compute resources and local storage independently based on usage. See Autoscaling and Autoscaling Local Storage.

  • Fine-grained sharing: Provides Spark-native fine-grained sharing to maximize resource utilization and minimize query latency.

    • Preemption: Proactively preempts Spark tasks from over-committed users so that every user gets a fair share of cluster time and their jobs finish in a timely manner, even when contending with dozens of other users. This uses Spark Task Preemption for High Concurrency.
    • Fault isolation: Creates a separate environment for each notebook, isolating notebooks from one another.
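
The preemption idea can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the Spark Task Preemption algorithm: the hypothetical `preemption_plan` function assumes every user with running or waiting tasks is entitled to an equal share of task slots. It also assumes that tasks above that share become preemption candidates only while some other user is below its share and still has tasks waiting.

```python
def preemption_plan(
    running: dict[str, int], waiting: dict[str, int], total_slots: int
) -> dict[str, int]:
    """Return how many running tasks each over-committed user would give up.

    Simplified equal-share model (an assumption, not Databricks' scheduler).
    """
    # Users that currently run tasks or have tasks queued.
    users = {u for u, n in running.items() if n > 0} | {
        u for u, n in waiting.items() if n > 0
    }
    if not users:
        return {}
    fair_share = total_slots // len(users)

    # Preempt only if someone is below their share and still has work waiting.
    starved = any(
        running.get(u, 0) < fair_share and waiting.get(u, 0) > 0 for u in users
    )
    if not starved:
        return {}

    # Excess tasks above the fair share are the preemption candidates.
    return {
        u: running.get(u, 0) - fair_share
        for u in sorted(users)
        if running.get(u, 0) > fair_share
    }


# alice holds 14 of 16 slots while bob waits: fair share is 8 each,
# so 6 of alice's tasks would be preempted.
print(preemption_plan({"alice": 14, "bob": 2}, {"bob": 10}, total_slots=16))
```

A real scheduler would also weigh task age and the cost of rerunning preempted work. The sketch shows only the fair-share arithmetic.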

To create a serverless pool, select the Serverless Pool cluster type when you create a cluster.


Serverless pools versus standard clusters

To choose between serverless pools and standard clusters, consider which of the following describes your environment and workload requirements.

Serverless pools
  • Use SQL, Python, or R.
  • Want Azure Databricks to manage worker selection.
Standard clusters
  • Use Scala.
  • Require a specific Spark version or want to configure Spark.
  • Want to control advanced parameters such as SSH public keys and logging.
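
The checklist above can be encoded as a small decision helper. This is a minimal sketch: `Requirements`, `SERVERLESS_LANGUAGES`, and `recommend_cluster_type` are hypothetical names, not a Databricks tool. The helper mirrors only the criteria listed here, so any requirement from the standard-cluster list leads to a standard-cluster recommendation.

```python
from dataclasses import dataclass

# Languages that serverless pools support, per the note at the top of this page.
SERVERLESS_LANGUAGES = {"sql", "python", "r"}


@dataclass
class Requirements:
    """Hypothetical summary of a workload's needs."""

    languages: set[str]
    needs_specific_spark_version: bool = False
    needs_custom_spark_config: bool = False
    needs_advanced_params: bool = False  # e.g. SSH public keys, logging


def recommend_cluster_type(req: Requirements) -> str:
    """Apply the checklist: any standard-cluster requirement wins."""
    languages = {lang.lower() for lang in req.languages}
    if not languages <= SERVERLESS_LANGUAGES:
        return "standard cluster"  # e.g. the workload uses Scala
    if (
        req.needs_specific_spark_version
        or req.needs_custom_spark_config
        or req.needs_advanced_params
    ):
        return "standard cluster"
    # Otherwise let Azure Databricks manage worker selection.
    return "serverless pool"


print(recommend_cluster_type(Requirements({"SQL", "Python"})))
print(recommend_cluster_type(Requirements({"Scala"})))
```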