Enabling Azure AD Credential Passthrough to Azure Data Lake Storage Gen1 (Preview)

You can authenticate automatically to Azure Data Lake Storage Gen1 (ADLS) from Azure Databricks clusters using the same Azure Active Directory (Azure AD) identity that you use to log in to Azure Databricks.

When you enable your cluster for Azure AD credential passthrough, commands that you run on that cluster can read and write your data in Azure Data Lake Storage Gen1 without requiring you to configure service principal credentials for access to storage.

Note

Azure Data Lake Storage credential passthrough:

This topic includes the tasks that an Azure administrator must complete to enable Azure Databricks users to take advantage of Azure Data Lake Storage credential passthrough.

For information about enabling clusters for Azure Data Lake Storage credential passthrough and reading and writing data in Azure Data Lake Storage Gen1 when credential passthrough is enabled, see Access Azure Data Lake Storage Gen1 automatically with your Azure Active Directory credentials (Preview).

Configure the lifetime of your Azure Active Directory tokens

By default, the Azure Active Directory token that is passed to your cluster is valid for at most one hour. If you run a cell that takes longer than one hour, the token expires partway through the cell run, and any remaining reads or writes to Azure Data Lake Storage Gen1 fail.

To avoid this issue, increase the value of AccessTokenLifetime in your tenant’s token lifetime policy. You must be an Azure Active Directory admin to make this change. Access token lifetimes can range from 10 minutes to one day. We recommend a lifetime of at least 4 hours. If you know the runtime of your longest-running cell, configure a token lifetime that is longer than that runtime.
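As a minimal sketch of the sizing rule above, the following helper turns an estimated cell runtime into a policy definition string. The function name, its parameters, and the 30-minute margin are hypothetical and not part of any Azure module. The lifetime is written in the default .NET TimeSpan text form (for example "04:00:00"), on the assumption that the policy accepts it in the same way as the "00.04:00:00" form used in the procedure in this topic.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function New-TokenLifetimeDefinition {
    param(
        # Estimated runtime of your longest-running cell, in minutes (a value you supply).
        [int]$LongestCellMinutes,
        # Extra headroom added on top of that estimate, in minutes (an assumed default).
        [int]$MarginMinutes = 30
    )

    $recommended = New-TimeSpan -Hours 4   # recommended minimum from this topic
    $maximum     = New-TimeSpan -Days 1    # longest access token lifetime from this topic

    # Start from the estimate plus margin, then clamp to the 4 hour to 1 day range.
    $lifetime = New-TimeSpan -Minutes ($LongestCellMinutes + $MarginMinutes)
    if ($lifetime -lt $recommended) { $lifetime = $recommended }
    if ($lifetime -gt $maximum)     { $lifetime = $maximum }

    # Ordered dictionaries keep the keys in the same order as the policy JSON in this topic.
    $policy = [ordered]@{
        TokenLifetimePolicy = [ordered]@{
            Version             = 1
            AccessTokenLifetime = $lifetime.ToString()
        }
    }
    return ($policy | ConvertTo-Json -Compress)
}

# Example: the longest cell is estimated at 5 hours.
New-TokenLifetimeDefinition -LongestCellMinutes 300
```

The returned string could be passed as the -Definition value when you create the policy. Clamping at the upper bound keeps the request inside the range that this topic states; a cell that runs longer than one day would still outlive its token.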

To set the access token lifetime using PowerShell:

  1. Verify that you have an Azure Active Directory Service Principal for your Azure Databricks account.

    If you do not already have a service principal, you can follow the instructions in Create service principal with portal. If you do not know your-directory-id (also referred to as tenant ID in Azure Active Directory), you can follow the instructions in Get tenant ID.

  2. Download the latest Azure Active Directory PowerShell release.

  3. Launch PowerShell.

  4. Sign in to your Azure Active Directory admin account:

    Connect-AzureAD -Confirm
    
  5. Deploy a new token lifetime policy with an access token lifetime of 4 hours:

    New-AzureADPolicy -Definition @('{"TokenLifetimePolicy":{"Version":1, "AccessTokenLifetime":"00.04:00:00"}}') -DisplayName "DatabricksPassthroughPolicyScenario" -IsOrganizationDefault $false -Type "TokenLifetimePolicy"
    
  6. Find the ID of the Azure Active Directory Service Principal for your Azure Databricks account:

    Get-AzureADServicePrincipal -SearchString "Databricks"
    
  7. Apply the token lifetime policy to your Azure Databricks Service Principal:

    Add-AzureADServicePrincipalPolicy -Id <Databricks Service Principal ID> -RefObjectId <Token Lifetime Policy ID>
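
Steps 5 through 7 can also be chained so that the IDs do not have to be copied by hand. The following is a sketch under stated assumptions, not a definitive script: both function names are hypothetical, the wrapper must run only after Connect-AzureAD has succeeded, and it looks up the service principal with Get-AzureADServicePrincipal from the same AzureAD module as Connect-AzureAD. Because a search for "Databricks" could match more than one service principal, the sketch stops unless exactly one is found.

```powershell
# Hypothetical helper: returns the only element of a result set, or fails loudly.
function Get-SingleMatch {
    param(
        [object[]]$Candidates,
        [string]$Description
    )
    if ($Candidates.Count -ne 1) {
        throw "Expected exactly one $Description, found $($Candidates.Count)."
    }
    return $Candidates[0]
}

# Hypothetical wrapper around steps 5 to 7. Nothing here runs until you call it.
function Set-DatabricksTokenLifetime {
    param(
        # Defaults to the 4-hour definition used in step 5.
        [string]$Definition = '{"TokenLifetimePolicy":{"Version":1, "AccessTokenLifetime":"00.04:00:00"}}',
        [string]$SearchString = 'Databricks'
    )

    # Step 6 first: find the service principal before creating anything,
    # so that an ambiguous search does not leave an unused policy behind.
    $found     = @(Get-AzureADServicePrincipal -SearchString $SearchString)
    $principal = Get-SingleMatch -Candidates $found -Description 'service principal'

    # Step 5: create the token lifetime policy and keep the returned object.
    $policyArgs = @{
        Definition            = @($Definition)
        DisplayName           = 'DatabricksPassthroughPolicyScenario'
        IsOrganizationDefault = $false
        Type                  = 'TokenLifetimePolicy'
    }
    $policy = New-AzureADPolicy @policyArgs

    # Step 7: attach the policy to the service principal.
    Add-AzureADServicePrincipalPolicy -Id $principal.ObjectId -RefObjectId $policy.Id

    # List the policies now attached to the service principal as a check.
    Get-AzureADServicePrincipalPolicy -Id $principal.ObjectId
}
```

Looking up the service principal before creating the policy is a design choice of this sketch; the numbered steps create the policy first, and either order produces the same result when the search returns a single match.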
    

For more information about Azure Active Directory token lifetimes, see the Azure documentation.

Set permissions for your data

When users run notebooks that access Azure Data Lake Storage Gen1 using credential passthrough, all of the data they access must be stored in Azure Data Lake Storage Gen1. Credential passthrough does not support any other filesystem.

Make sure that each user’s permissions are set correctly for their data. The Azure Active Directory user who logs in to Azure Databricks must be able to read (and, if necessary, write) the Azure Data Lake Storage Gen1 data that they work with.
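
When you check read access, it can help to list what a user needs along the whole path. Azure Data Lake Storage Gen1 uses POSIX-style access control lists, in which reading a file requires Execute permission on the root and on every parent folder, plus Read permission on the file itself. The following sketch only computes that checklist; it does not call Azure, and the function name and the example path are hypothetical.

```powershell
# Hypothetical helper: lists the ACL permissions a user needs to read one file.
function Get-RequiredReadAcl {
    param(
        # Absolute path inside the store, for example "/data/sales/2019.csv".
        [string]$FilePath
    )

    $segments = $FilePath.Trim('/').Split('/')
    $required = @()

    # Execute (x) on the root is needed to start traversing the path.
    $current = '/'
    $required += [pscustomobject]@{ Path = $current; Permission = '--x' }

    # Execute (x) is also needed on every parent folder of the file.
    for ($i = 0; $i -lt $segments.Count - 1; $i++) {
        $current = $current.TrimEnd('/') + '/' + $segments[$i]
        $required += [pscustomobject]@{ Path = $current; Permission = '--x' }
    }

    # Read (r) is needed on the file itself.
    $required += [pscustomobject]@{ Path = '/' + ($segments -join '/'); Permission = 'r--' }
    return $required
}

# Example with a hypothetical path.
Get-RequiredReadAcl -FilePath '/data/sales/2019.csv' | Format-Table -AutoSize
```

You can compare this checklist with the access control entries shown for each folder and file in your storage account. Write access has different requirements and is not covered by this sketch.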

Security

It is safe to share Azure Data Lake Storage credential passthrough clusters with other users. Users will be isolated from each other and will not be able to read or use each other’s credentials.