Enabling Azure Data Lake Storage Credential Passthrough

Azure Databricks supports a cluster configuration, called Azure Data Lake Storage credential passthrough, that allows users to authenticate to Azure Data Lake Storage from Azure Databricks clusters using the same Azure Active Directory identity that they use to log into Azure Databricks. When a cluster is enabled for Azure Data Lake Storage credential passthrough, commands that run on that cluster can read and write data in Azure Data Lake Storage without requiring users to configure service principal credentials for access to the storage. The credentials are set automatically, based on the identity of the user who initiates the action.

This topic includes the tasks that an Azure Active Directory administrator must complete to enable Azure Data Lake Storage credential passthrough.

For information about enabling clusters for Azure Data Lake Storage credential passthrough and for reading and writing data in Azure Data Lake Storage, see Authenticate to Azure Data Lake Storage with your Azure Active Directory Credentials.

Configure the lifetime of Azure Active Directory tokens

By default, the Azure Active Directory token that is passed to a cluster lasts for at most one hour. If a user runs a cell that takes longer than one hour, the token expires partway through the cell run, and any remaining reads from or writes to Azure Data Lake Storage will fail.

To avoid this issue, increase the value of AccessTokenLifetime in your tenant’s token policy. Access token lifetimes can be as short as 10 minutes and as long as one day. We recommend that you use a lifetime of at least 4 hours. If you know the runtime of the longest-running cells, you should configure the token lifetime to be longer than that.
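
Before you create a new policy, you may want to check whether your tenant already has a token lifetime policy. The following is a minimal sketch that lists existing policies; it assumes you have already connected with Connect-AzureAD as described in the steps below.

    # List any existing token lifetime policies and their definitions.
    Get-AzureADPolicy | Where-Object { $_.Type -eq "TokenLifetimePolicy" } | Format-List DisplayName, Definition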

To set the access token lifetime using PowerShell:

  1. Verify that you have an Azure Active Directory service principal for your Azure Databricks account.

    If you do not already have a service principal, you can follow the instructions in Create service principal with portal. If you do not know the directory ID (also referred to as the tenant ID in Azure Active Directory), you can follow the instructions in Get tenant ID.

  2. Download the latest Azure Active Directory PowerShell release.

  3. Launch PowerShell.

  4. Sign into your Azure Active Directory admin account:

    Connect-AzureAD -Confirm
    
  5. Deploy a new token lifetime policy with an access token lifetime of 4 hours:

    New-AzureADPolicy -Definition @('{"TokenLifetimePolicy":{"Version":1, "AccessTokenLifetime":"00.04:00:00"}}') -DisplayName "DatabricksPassthroughPolicyScenario" -IsOrganizationDefault $false -Type "TokenLifetimePolicy"
    
  6. Find the ID of the Azure Active Directory service principal for your Azure Databricks account:

    Get-AzureRmADServicePrincipal -SearchString "Databricks"
    
  7. Apply the token lifetime policy to your Azure Databricks service principal:

    Add-AzureADServicePrincipalPolicy -Id <Azure Databricks service principal ID> -RefObjectId <Token Lifetime Policy ID>
    

For more information about Azure Active Directory token lifetimes, see the Azure documentation.
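
If you prefer to run steps 5 through 7 as a single script, the following sketch captures the policy ID and service principal ID in variables instead of copying them by hand. It assumes the Azure Active Directory PowerShell module from step 2 is installed, and it uses Get-AzureADServicePrincipal from that same module in place of Get-AzureRmADServicePrincipal so that everything runs in one session; the display name and search string are the illustrative values from the steps above.

    Connect-AzureAD -Confirm

    # Create the token lifetime policy and keep the returned object so that its ID can be reused.
    $policy = New-AzureADPolicy -Definition @('{"TokenLifetimePolicy":{"Version":1, "AccessTokenLifetime":"00.04:00:00"}}') -DisplayName "DatabricksPassthroughPolicyScenario" -IsOrganizationDefault $false -Type "TokenLifetimePolicy"

    # Look up the Azure Databricks service principal; select the correct one if more than one matches.
    $sp = Get-AzureADServicePrincipal -SearchString "Databricks" | Select-Object -First 1

    # Apply the token lifetime policy to the service principal.
    Add-AzureADServicePrincipalPolicy -Id $sp.ObjectId -RefObjectId $policy.Id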

Set permissions for data

When users run notebooks that access Azure Data Lake Storage using Azure Data Lake Storage credential passthrough, all of the data they access must be stored in Azure Data Lake Storage; credential passthrough does not support any other filesystem.

Make sure that permissions on the data are set correctly for each user. The Azure Active Directory user who logs into Azure Databricks must be able to read (and, if necessary, write) the Azure Data Lake Storage data they work with.
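
For example, if the data lives in an Azure Data Lake Storage Gen2 account, you can grant a user read and execute access to a directory with the Az.Storage cmdlets. This is a minimal sketch rather than a complete procedure; the storage account name, filesystem (container), path, and user object ID are placeholders, and it assumes you are signed in with an identity that is allowed to manage the account's ACLs.

    # Placeholders: replace with your storage account, filesystem (container), path, and the user's object ID.
    $ctx = New-AzStorageContext -StorageAccountName "<storage account name>" -UseConnectedAccount

    # Read the existing ACL so that the new entry is appended rather than replacing it.
    $existingAcl = (Get-AzDataLakeGen2Item -Context $ctx -FileSystem "<filesystem name>" -Path "<path/to/data>").ACL

    # Add an entry granting the user read and execute permissions.
    $acl = Set-AzDataLakeGen2ItemAclObject -AccessControlType user -EntityId "<user object ID>" -Permission r-x -InputObject $existingAcl

    # Apply the updated access control list to the directory.
    Update-AzDataLakeGen2Item -Context $ctx -FileSystem "<filesystem name>" -Path "<path/to/data>" -Acl $acl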

Security

It is safe to share Azure Data Lake Storage credential passthrough clusters with other users. Users are isolated from one another and are not able to read or use each other’s credentials.