Libraries

To make third-party or locally-built code available to execution environments running on your clusters, you create a library. Libraries can be written in Python, Java, Scala, and R.

To allow a library to be shared by all users in a Workspace, create the library in the Shared folder. To make it available to a single user, create the library in the user folder.

You can create and manage libraries using the UI, the CLI, or the Libraries API. This topic focuses on performing library tasks using the UI. For the other methods, see Databricks CLI and Libraries API.

Some libraries require lower-level configuration and cannot be uploaded using the methods described in this topic. To install these libraries, you can write a custom UNIX script that runs at cluster creation time, following the instructions in Cluster Node Initialization Scripts.

Library lifecycle

Libraries can be created, attached to a cluster, detached from a cluster, and deleted.

When you create a library, you either upload or install the library package. Packages that you upload or install using Maven are stored in the FileStore in FileStore/jars. Azure Databricks installs Python packages in the Spark container using pip install.

To use a library, you first attach it to a cluster. To use the library in a notebook that was attached to the cluster before the library was attached, you must reattach the cluster to the notebook.

There are two steps to permanently delete a library:

  1. Move the library to the Trash folder.
  2. Either permanently delete the library in the Trash folder or empty the Trash folder.

When you move a library to the Trash folder, the library is not marked for deletion, which means that it remains available on any clusters that it is attached to. When you permanently delete a library, the cluster to which the library is attached identifies the library as marked for deletion. The following screenshot illustrates the detach and delete indications:

../_images/library-delete-detach.png

As indicated in the screenshot, when you detach a library from a cluster or permanently delete a library previously attached to the cluster, you must restart the cluster.

Create a library

You can create Java, Scala, and Python libraries to run on Spark clusters, or point to external packages in PyPI, Maven, and CRAN. To create a library:

  1. Right-click the folder where you want to store the library.

  2. Select Create > Library.

Upload a Java JAR or Scala JAR

  1. In the Source drop-down list, select Upload Java/Scala JAR.

  2. Enter a library name.

  3. Click and drag your JAR to the JAR File text box.

  4. Click Create Library. The library detail screen displays.

  5. In the Attach column, select clusters to attach the library to.

  6. Optionally, select the Attach automatically to all clusters checkbox and click Confirm.

Upload a Python PyPI package or Python Egg

  1. In the Source drop-down list, select Upload Python Egg or PyPI, and then do one of the following:

    • PyPI package - Enter a PyPI package name and click Install Library. The library detail screen displays.

      Tip

      PyPI has a specific format for installing specific versions of libraries. For example, to install a specific version of pandas use this format for the library: pandas==0.17.1.

    • Python egg:

      1. Enter a library name.
      2. Click and drag the egg and, optionally, the documentation egg to the Egg File text box.
      3. Click Create Library. The library detail screen displays.
  2. In the Attach column, select clusters to attach the library to.

  3. Optionally, select the Attach automatically to all clusters checkbox and click Confirm.
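
The version-pinning format described in the tip above can be checked before you enter it in the UI. The following is a minimal sketch and is not part of Azure Databricks: `split_pinned_requirement` is a hypothetical helper name, and it handles only the exact-pin `name==version` form shown in this topic.

```python
def split_pinned_requirement(requirement):
    """Split 'name==version' into (name, version).

    Returns (name, None) when no version is pinned.
    Only the exact-pin '==' operator is handled in this sketch.
    """
    name, sep, version = requirement.strip().partition("==")
    name = name.strip()
    version = version.strip()
    if not name:
        raise ValueError("package name is empty")
    if sep and not version:
        raise ValueError("'==' must be followed by a version")
    return name, (version if sep else None)


print(split_pinned_requirement("pandas==0.17.1"))  # ('pandas', '0.17.1')
print(split_pinned_requirement("pandas"))          # ('pandas', None)
```

A check like this catches a missing version after `==` before the library is created.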

Upload a Maven package or Spark package

  1. In the Source field, select Maven Coordinate.

    ../_images/maven-library.png
    • In the Coordinate field, enter the Maven coordinate of the library to install. Maven coordinates are in the form groupId:artifactId:version; for example, com.databricks:spark-avro_2.10:1.0.0.
    • If you don’t know the exact coordinate, enter the library name and click Search Spark Packages and Maven Central. A list of matching packages displays. To display details about a package, click its name. You can sort packages by name, organization, and rating. You can also filter the results by writing a query in the search bar. The results refresh automatically.
      1. Select Maven Central or Spark Packages in the drop-down list at the top right.

        ../_images/spark-packages.png
        ../_images/maven-central.png
      2. Optionally select the package version in the Releases column.

      3. Click + Select next to a package. The Coordinate field is filled in with the selected package and version.

  2. Optionally click Advanced Options to set up a custom Maven URL and to exclude certain dependencies.

    • Enter the Repository URL if your coordinate is in a different Maven repository; for example, https://oss.sonatype.org/content/repositories.
    • In the Excludes box, provide the groupId and the artifactId of the dependencies that you want to exclude; for example, log4j:log4j.
  3. Click Create Library. The library detail screen displays.

  4. In the Attach column, select clusters to attach the library to. The dependencies resolve and the library installs in a couple of minutes.

  5. Optionally, select the Attach automatically to all clusters checkbox and click Confirm.
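
The coordinate and exclusion formats used in the steps above can be illustrated with a short sketch. It is not Azure Databricks code: `parse_maven_coordinate` and `parse_exclusion` are hypothetical helper names, and the sketch accepts only the three-part `groupId:artifactId:version` form and the two-part `groupId:artifactId` exclusion form described in this topic.

```python
def parse_maven_coordinate(coordinate):
    """Split 'groupId:artifactId:version' into its three parts."""
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            "expected groupId:artifactId:version, got %r" % coordinate
        )
    return {"groupId": parts[0], "artifactId": parts[1], "version": parts[2]}


def parse_exclusion(exclusion):
    """Split an exclusion of the form 'groupId:artifactId'."""
    parts = exclusion.split(":")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected groupId:artifactId, got %r" % exclusion)
    return {"groupId": parts[0], "artifactId": parts[1]}


print(parse_maven_coordinate("com.databricks:spark-avro_2.10:1.0.0"))
print(parse_exclusion("log4j:log4j"))
```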

Upload a CRAN library

Note

You can use CRAN libraries on clusters running Azure Databricks Runtime 3.2 and above.

  1. In the Source drop-down list, select R Library.

    ../_images/cran-library.png
  2. In the Install from drop-down list, CRAN-like Repository is the only option and is selected by default. This option covers CRAN and Bioconductor repositories.

  3. In the Repository field, enter the CRAN repository URL.

  4. In the Package field, enter the name of the package.

  5. Click Create Library. The library detail screen displays.

  6. In the Attach column, select clusters to attach the library to. When the library is attached to a cluster, the dependencies resolve and the library installs.

  7. Optionally, select the Attach automatically to all clusters checkbox and click Confirm.

View library details

  1. Go to the folder containing the library.
  2. Click the library name.

The library details page shows the running clusters and whether the library is attached to the clusters. If the library is installed, the page contains a link to the package host. If the library is uploaded, the page displays a link to the uploaded package file.

Attach a library to a cluster

  1. Go to the folder containing the library.

  2. Click the library name.

  3. In the Attach column, select the cluster to attach the library to.

    ../_images/library-attach.png
  4. Optionally, to attach the library to all clusters, select the Attach automatically to all clusters checkbox and click Confirm.
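
For readers who script this task instead of using the UI, the sketch below builds a request body of the kind the Libraries API accepts for installing libraries on a cluster. It rests on assumptions to verify against the Libraries API reference: the `POST /api/2.0/libraries/install` endpoint and the `jar`, `pypi`, and `maven` field names. The cluster ID and JAR file name are placeholders, and `build_install_request` is a hypothetical helper. The sketch only builds and prints the body; it sends nothing.

```python
import json


def build_install_request(cluster_id, libraries):
    """Return a request body in the assumed Libraries API install shape."""
    if not cluster_id:
        raise ValueError("cluster_id is required")
    if not libraries:
        raise ValueError("at least one library specification is required")
    return {"cluster_id": cluster_id, "libraries": list(libraries)}


# Placeholder cluster ID and JAR name, for illustration only.
body = build_install_request(
    "<cluster-id>",
    [
        {"jar": "dbfs:/FileStore/jars/<uploaded-jar-name>.jar"},
        {"pypi": {"package": "pandas==0.17.1"}},
        {
            "maven": {
                "coordinates": "com.databricks:spark-avro_2.10:1.0.0",
                "exclusions": ["log4j:log4j"],
            }
        },
    ],
)
print(json.dumps(body, indent=2))
```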

Detach a library from a cluster

  1. Go to the folder containing the library.

  2. Click the library name.

  3. In the Attach column, deselect the cluster the library is attached to.

    ../_images/library-attach.png

View the libraries attached to a cluster

  1. Click the Clusters icon in the sidebar.
  2. Click the cluster name.
  3. Click the Libraries tab. For each library, the tab displays the library name and version, whether the library has been deleted, and the library location.
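
Similar information is available programmatically. The sketch below groups the entries of a cluster library status response by status. It assumes the response shape of the Libraries API `GET /api/2.0/libraries/cluster-status` call (a `library_statuses` list whose entries carry `library` and `status` fields) and status strings such as `INSTALLED` and `UNINSTALL_ON_RESTART`; confirm these against the API reference. The sample response is made up for illustration, and `summarize_library_statuses` is a hypothetical helper.

```python
def summarize_library_statuses(cluster_status):
    """Group library specifications by their reported status string."""
    grouped = {}
    for entry in cluster_status.get("library_statuses", []):
        status = entry.get("status", "UNKNOWN")
        grouped.setdefault(status, []).append(entry["library"])
    return grouped


# Made-up sample response, for illustration only.
sample_status = {
    "cluster_id": "<cluster-id>",
    "library_statuses": [
        {
            "library": {"pypi": {"package": "pandas==0.17.1"}},
            "status": "INSTALLED",
        },
        {
            "library": {
                "maven": {"coordinates": "com.databricks:spark-avro_2.10:1.0.0"}
            },
            "status": "UNINSTALL_ON_RESTART",
        },
    ],
}

summary = summarize_library_statuses(sample_status)
for status, libraries in sorted(summary.items()):
    print(status, libraries)
```

Under these assumptions, a library reported as `UNINSTALL_ON_RESTART` is one that was detached or deleted and stays loaded until the cluster restarts.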

Move a library

  1. Go to the library location in the Workspace.
  2. Click the drop-down arrow to the right of the library name and select Move. A folder browser displays.
  3. Click the destination folder.
  4. Click Select.
  5. Click Confirm and Move.

Delete a library

To delete a library, move it to the Trash folder and then permanently delete it. For details, see Delete an object.

You can also move a library to the Trash folder by clicking Move to Trash on the library details page.

Note

When you move a library to the Trash folder, the library is not marked for deletion, which means that it remains available on any clusters that it is attached to. To make the library unavailable, you must permanently delete it from the Trash folder or empty the Trash folder.

Update a library

To update a library, delete the old version of the library and create a new version. Using the new version combines the requirements for deleting a library and for uploading one: you must restart the cluster and then reattach to the cluster any notebooks that use the library.
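
When scripted against a single cluster, the update sequence can be laid out as an ordered plan of requests. The sketch below is an API-based analog of the UI procedure, not a documented update procedure. It assumes the `POST /api/2.0/libraries/uninstall`, `POST /api/2.0/libraries/install`, and `POST /api/2.0/clusters/restart` endpoints; `plan_library_update` is a hypothetical helper and the cluster ID is a placeholder. It only returns the plan and sends nothing, and reattaching notebooks remains a separate step.

```python
def plan_library_update(cluster_id, old_library, new_library):
    """Return (method, path, body) tuples in the order they would be sent."""
    return [
        # Remove the old version; it stays loaded until the restart.
        ("POST", "/api/2.0/libraries/uninstall",
         {"cluster_id": cluster_id, "libraries": [old_library]}),
        # Install the new version.
        ("POST", "/api/2.0/libraries/install",
         {"cluster_id": cluster_id, "libraries": [new_library]}),
        # Restart so that the removal takes effect.
        ("POST", "/api/2.0/clusters/restart", {"cluster_id": cluster_id}),
    ]


# Placeholder cluster ID and example versions, for illustration only.
plan = plan_library_update(
    "<cluster-id>",
    {"pypi": {"package": "pandas==0.17.1"}},
    {"pypi": {"package": "pandas==0.18.0"}},
)
for method, path, body in plan:
    print(method, path, body)
```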