Deploying Azure Databricks in your Azure Virtual Network (Preview)

The default deployment of Azure Databricks is a fully managed service on Azure: all data plane resources, including a virtual network that all clusters will be associated with, are deployed to a locked resource group. If you require network customization, however, you can deploy Azure Databricks in your own Azure Virtual Network (VNet), enabling you to:

  • Connect Azure Databricks to other Azure services (such as Azure Storage) in a secure manner using service endpoints.
  • Connect to on-premises data sources for use with Azure Databricks, taking advantage of user-defined routes.
  • Connect Azure Databricks to a network virtual appliance to inspect all outbound traffic and take actions according to allow and deny rules.
  • Configure Azure Databricks to use custom DNS.
  • Configure network security group (NSG) rules to specify egress traffic restrictions.
  • Deploy Azure Databricks clusters in your existing Azure Virtual Network.

Deploying Azure Databricks to your own VNet also lets you take advantage of flexible CIDR ranges: anywhere between /16 and /24 for the virtual network, and between /18 and /26 for the subnets.
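As a minimal sketch of how these limits might be checked before requesting a deployment, the following Python snippet validates a proposed address plan using the standard-library ipaddress module. The function name check_cidr_plan and the example address ranges are hypothetical and chosen only for illustration; the prefix limits are the ones stated above.

```python
import ipaddress

# Prefix-length limits quoted in the text above.
VNET_PREFIX_RANGE = (16, 24)
SUBNET_PREFIX_RANGE = (18, 26)


def check_cidr_plan(vnet_cidr, subnet_cidrs):
    """Return a list of problems found; an empty list means the plan fits.

    check_cidr_plan is a hypothetical helper, not part of any Azure tooling.
    """
    problems = []
    vnet = ipaddress.ip_network(vnet_cidr)
    if not VNET_PREFIX_RANGE[0] <= vnet.prefixlen <= VNET_PREFIX_RANGE[1]:
        problems.append(f"VNet {vnet} is not between /16 and /24")

    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    for subnet in subnets:
        if not SUBNET_PREFIX_RANGE[0] <= subnet.prefixlen <= SUBNET_PREFIX_RANGE[1]:
            problems.append(f"Subnet {subnet} is not between /18 and /26")
        if not subnet.subnet_of(vnet):
            problems.append(f"Subnet {subnet} is outside VNet {vnet}")

    # Subnets must not share any addresses with each other.
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first.overlaps(second):
                problems.append(f"Subnets {first} and {second} overlap")
    return problems


# Example address plan (illustrative values only).
print(check_cidr_plan("10.0.0.0/24", ["10.0.0.0/26", "10.0.0.64/26"]))
```

An empty list indicates that the plan satisfies the stated prefix limits; it does not confirm that the addresses are free in your environment.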

Note

The ability to deploy Azure Databricks to your own VNet is a preview feature and requires enrollment. To enroll, the Azure Databricks team must whitelist your subscription. To get your subscription whitelisted, reach out to your assigned Microsoft Cloud Solution Architect (CSA) or Data Solution Architect (DSA) and provide them with the following details:

  • Subscription ID
  • Region
  • Description of why you want to deploy Azure Databricks to your own VNet and how you plan to use the feature

Once the subscription is whitelisted, you will be provided with detailed instructions and assistance.

Warning

A workspace with a smaller virtual network (that is, a narrower CIDR range with a longer prefix) can run out of IP addresses (network space) more quickly than one with a larger virtual network. For example, a workspace with a /24 VNet and /26 subnets can have a maximum of 64 nodes active at a time, whereas a workspace with a /20 VNet and /22 subnets can house a maximum of 1024 nodes.
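The node limits in this example follow from the size of the subnet's address block: a subnet with prefix length p contains 2^(32 - p) IPv4 addresses. The short Python sketch below reproduces that arithmetic; the function name subnet_address_count is hypothetical. It counts raw addresses only, and because Azure also reserves a small number of addresses in each subnet, treat the result as an upper bound.

```python
def subnet_address_count(prefix_length):
    """Number of IPv4 addresses in a block with the given prefix length.

    Hypothetical helper for illustration; it counts raw addresses only.
    """
    if not 0 <= prefix_length <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return 2 ** (32 - prefix_length)


for prefix in (26, 22):
    print(f"/{prefix} subnet: {subnet_address_count(prefix)} addresses")
# /26 subnet: 64 addresses
# /22 subnet: 1024 addresses
```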

You cannot replace the VNet for an existing workspace. If your current workspace cannot accommodate the required number of active cluster nodes, we recommend that you create another workspace in a larger VNet. Follow these detailed migration steps to copy resources (notebooks, cluster configurations, jobs) from the old workspace to the new one.

Workflow

To deploy Azure Databricks in your own Azure Virtual Network, you do the following:

  1. Prepare an Azure Network Security Group (NSG) with the necessary ingress and egress rules for communication with the Azure Databricks control plane and relevant data services.

  2. Prepare or update a virtual network to which you would like to deploy an Azure Databricks workspace.

    Two new subnets, dedicated to Azure Databricks, are required in the virtual network:

    • A private subnet with a configured network security group that allows cluster-internal communication.
    • A public subnet with a configured network security group that allows communication with the Azure Databricks control plane.

    A CIDR block between /16 and /24 is required for the virtual network, and a CIDR block between /18 and /26 is required for each of the private and public subnets.

  3. Create an Azure Databricks workspace in the configured virtual network.

You will be provided with ARM templates for configuring your Azure NSG and Azure Virtual Network, and for creating your Azure Databricks workspace.
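When planning the two dedicated subnets described in step 2, it can help to work out candidate address blocks before filling in the templates. The following Python sketch, which assumes you want two equally sized subnets taken from the start of the VNet range, uses the standard-library ipaddress module to carve them out. The function name carve_subnets, the choice of which block is private and which is public, and the example range are all hypothetical; the ARM templates you receive remain the source of truth for the actual configuration.

```python
import ipaddress


def carve_subnets(vnet_cidr, subnet_prefix):
    """Pick the first two blocks of the given size inside the VNet.

    Hypothetical planning helper; it does not create any Azure resources.
    """
    vnet = ipaddress.ip_network(vnet_cidr)
    if subnet_prefix <= vnet.prefixlen:
        raise ValueError("subnet prefix must be longer than the VNet prefix "
                         "to leave room for two subnets")
    candidates = vnet.subnets(new_prefix=subnet_prefix)
    # Assumption: the first block is used as the private subnet and the
    # second as the public subnet; the assignment is arbitrary.
    private_subnet = next(candidates)
    public_subnet = next(candidates)
    return {"private": str(private_subnet), "public": str(public_subnet)}


# Illustrative values only: a /24 VNet split into /26 subnets.
print(carve_subnets("10.0.0.0/24", 26))
# {'private': '10.0.0.0/26', 'public': '10.0.0.64/26'}
```

Taking consecutive blocks from the start of the range leaves the rest of the VNet free for other subnets, which may matter if the virtual network is shared with other workloads.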