How to Correctly Update a Maven Library in Azure Databricks

Problem

You make a minor update to a library in a Maven repository and, because it is only a small change for testing purposes, you do not change the version number. When you attach the library to your cluster again, your code changes are not included in the library.

Cause

One strength of Azure Databricks is the ability to install third-party or custom libraries, such as from a Maven repository. However, when a library is updated in the repository, there is no automated way to update the corresponding library in the cluster.

When you request Azure Databricks to download a library in order to attach it to a cluster, the following process occurs:

  1. In Azure Databricks, you request a library from a Maven repository.
  2. Azure Databricks checks the local cache for the library, and if it is not present, downloads the library from the Maven repository to a local cache.
  3. Azure Databricks then copies the library to DBFS (/FileStore/jars/maven/).
  4. Upon subsequent requests for the library, Azure Databricks uses the file that has already been copied to DBFS, and does not download a new copy.
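
The effect of step 4 can be illustrated with a toy model in Python. This is only a sketch of the behavior described above, not how Azure Databricks is implemented: the class ToyLibraryResolver, the coordinate com.example:libA, and the dictionaries standing in for the Maven repository and DBFS are all hypothetical, and the local cache and the DBFS copy are collapsed into a single store for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLibraryResolver:
    """Toy model of the lookup described above; not Databricks internals."""

    repository: dict  # coordinate -> artifact bytes in the (hypothetical) Maven repo
    dbfs: dict = field(default_factory=dict)  # stands in for /FileStore/jars/maven/

    def resolve(self, coordinate: str) -> bytes:
        # Step 4: a copy already in DBFS is reused; the repository is not consulted.
        if coordinate in self.dbfs:
            return self.dbfs[coordinate]
        # Steps 2-3: otherwise download the artifact and copy it to DBFS.
        artifact = self.repository[coordinate]
        self.dbfs[coordinate] = artifact
        return artifact


repo = {"com.example:libA:1.0.0-SNAPSHOT": b"old build"}
resolver = ToyLibraryResolver(repository=repo)
first = resolver.resolve("com.example:libA:1.0.0-SNAPSHOT")

# Republish under the same version: the resolver still returns the stored copy.
repo["com.example:libA:1.0.0-SNAPSHOT"] = b"new build"
stale = resolver.resolve("com.example:libA:1.0.0-SNAPSHOT")

# Publish under a new version: nothing is stored for it yet, so it is fetched.
repo["com.example:libA:1.0.1-SNAPSHOT"] = b"new build"
fresh = resolver.resolve("com.example:libA:1.0.1-SNAPSHOT")
```

In this model the lookup is keyed only on the coordinate string, so republishing under an unchanged version can never replace the stored copy.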

Solution

To ensure that an updated version of a library (or a library that you have customized) is downloaded to a cluster, increment the build number or version number of the artifact in some way before you publish it. For example, you can change libA_v1.0.0-SNAPSHOT to libA_v1.0.1-SNAPSHOT. Because no copy with the new version exists in DBFS yet, Azure Databricks downloads the new library, and you can then attach it to your cluster.
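
If you script your test builds, the version increment can be automated. The following Python helper is a minimal sketch, assuming the version string has the form MAJOR.MINOR.PATCH followed by an optional qualifier such as -SNAPSHOT; the function name bump_patch is hypothetical and is not part of Maven or Azure Databricks.

```python
import re


def bump_patch(version: str) -> str:
    """Increment the third numeric component, keeping any qualifier such as -SNAPSHOT."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(.*)", version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH[qualifier], got {version!r}")
    major, minor, patch, qualifier = match.groups()
    return f"{major}.{minor}.{int(patch) + 1}{qualifier}"


new_version = bump_patch("1.0.0-SNAPSHOT")
print(new_version)  # 1.0.1-SNAPSHOT
```

You would then write the returned value into the artifact's version before publishing, so that each test build gets a version that has not been requested before.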