Libraries

To make third-party or locally built code available to execution environments running on your clusters, create a library. Libraries can be written in Python, Java, Scala, and R.

To allow a library to be shared by all users in a Workspace, create the library in the Shared folder. To make it available to a single user, create the library in the user folder.

You can create and manage libraries using the UI, the CLI, or the Libraries API. This topic focuses on performing library tasks using the UI. For the other methods, see Databricks CLI and Libraries API.

Important

  • Libraries are immutable. They can only be created and deleted.
  • To completely delete a library from a cluster, you must restart the cluster.
  • Azure Databricks stores libraries that you upload in the FileStore.
  • After you attach a library to a cluster, you must reattach any notebooks that use the cluster before they can use the library.

Some libraries require lower-level configuration and cannot be uploaded using the methods described in this topic. To install these libraries, you can write a custom UNIX script that runs at cluster creation time, following the instructions in Cluster Node Initialization Scripts.

Create a library

You can create Java, Scala, and Python libraries to run on Spark clusters, or point to external packages in PyPI, Maven, and CRAN. To create a library:

  1. Right-click the folder where you want to store the library.

  2. Select Create > Library.


Upload a Java JAR or Scala JAR

  1. In the Source drop-down list, select Upload Java/Scala JAR.

  2. Enter a library name.

  3. Click and drag your JAR to the JAR File text box.


  4. Click Create Library. The library detail screen displays.

  5. In the Attach column, select clusters to attach the library to.

  6. Optionally select the Attach automatically to all clusters checkbox and click Confirm.

Upload a Python PyPI package or Python Egg

  1. In the Source drop-down list, select Upload Python Egg or PyPI.

    • PyPI package: Enter a PyPI package name and click Install Library. The library detail screen displays.

      Tip

      To install a specific version of a PyPI package, append == and the version number to the package name. For example, to install version 0.17.1 of pandas, enter pandas==0.17.1.

    • Python egg:

      1. Enter a library name.
      2. Click and drag the egg and optionally the documentation egg to the Egg File text box.
      3. Click Create Library. The library detail screen displays.
  2. In the Attach column, select clusters to attach the library to.

  3. Optionally select the Attach automatically to all clusters checkbox and click Confirm.
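
The pinned-version format from the tip above can be checked before you type it into the UI. The following is a minimal sketch, not part of the product: `split_pinned_requirement` is a hypothetical helper name, and it only handles the plain `name==version` form shown in this topic, not other version specifiers.

```python
from typing import Optional, Tuple


def split_pinned_requirement(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name==version' into (name, version).

    Hypothetical helper for illustration. Returns version=None when the
    spec has no '==' pin.
    """
    name, sep, version = spec.strip().partition("==")
    name, version = name.strip(), version.strip()
    if not name:
        raise ValueError(f"missing package name in {spec!r}")
    if sep and not version:
        raise ValueError(f"missing version after '==' in {spec!r}")
    return name, (version if sep else None)


print(split_pinned_requirement("pandas==0.17.1"))  # ('pandas', '0.17.1')
print(split_pinned_requirement("pandas"))          # ('pandas', None)
```

A spec such as `pandas==` is rejected, which catches a common typo before the library is created.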

Upload a Maven package or Spark package

  1. In the Source field, select Maven Coordinate.

    • In the Coordinate field, enter the Maven coordinate of the library to install. Maven coordinates are in the form groupId:artifactId:version; for example, com.databricks:spark-avro_2.10:1.0.0.
    • If you don’t know the exact coordinate, enter the library name and click Search Spark Packages and Maven Central. A list of matching packages displays. To display details about a package, click its name. You can sort packages by name, organization, and rating. You can also filter the results by writing a query in the search bar. The results refresh automatically.

      1. Select Maven Central or Spark Packages in the drop-down list at the top right.
      2. Optionally select the package version in the Releases column.
      3. Click + Select next to a package. The Coordinate field is filled in with the selected package and version.

  2. Optionally click Advanced Options to set up a custom Maven URL and to exclude certain dependencies.

    • Enter the Repository URL if your coordinate is in a different Maven repository; for example, https://oss.sonatype.org/content/repositories.
    • In the Excludes box, provide the groupId and the artifactId of the dependencies that you want to exclude; for example, log4j:log4j.
  3. Click Create Library. The library detail screen displays.

  4. In the Attach column, select clusters to attach the library to. The dependencies resolve and the library installs in a couple of minutes.

  5. Optionally select the Attach automatically to all clusters checkbox and click Confirm.
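
The coordinate and exclusion formats used in the steps above can be validated with a few lines of Python. This is a minimal sketch under stated assumptions: `MavenCoordinate`, `parse_coordinate`, and `parse_exclusion` are hypothetical names, and the parser accepts only the three-part groupId:artifactId:version form and the two-part groupId:artifactId exclusion form described in this topic.

```python
from typing import NamedTuple, Tuple


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_coordinate(text: str) -> MavenCoordinate:
    """Parse 'groupId:artifactId:version' (hypothetical helper)."""
    parts = text.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {text!r}")
    return MavenCoordinate(*parts)


def parse_exclusion(text: str) -> Tuple[str, str]:
    """Parse an exclusion written as 'groupId:artifactId' (hypothetical helper)."""
    parts = text.strip().split(":")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected groupId:artifactId, got {text!r}")
    return parts[0], parts[1]


coord = parse_coordinate("com.databricks:spark-avro_2.10:1.0.0")
print(coord.group_id, coord.artifact_id, coord.version)
print(parse_exclusion("log4j:log4j"))  # ('log4j', 'log4j')
```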

Upload a CRAN library

Note

You can use CRAN libraries on clusters running Azure Databricks Runtime 3.2 and above.

  1. In the Source drop-down list, select R Library.

  2. In the Install from drop-down list, CRAN-like Repository is the only option and is selected by default. This option covers CRAN and Bioconductor repositories.

  3. In the Repository field, enter the CRAN repository URL.

  4. In the Package field, enter the name of the package.

  5. Click Create Library. The library detail screen displays.

  6. In the Attach column, select clusters to attach the library to. When the library is attached to a cluster, the dependencies resolve and the library installs.

  7. Optionally select the Attach automatically to all clusters checkbox and click Confirm.

Attach a library to a cluster

  1. Go to the folder containing the library.

  2. Click the library name.

  3. In the Attach column, select the cluster to attach the library to.

  4. Optionally, to attach the library to all clusters automatically, select the Attach automatically to all clusters checkbox and click Confirm.
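
If you script this step instead of using the UI, the Libraries API takes a request body that names a cluster and the libraries to install on it. The sketch below only builds that body and sends nothing. The field names (`cluster_id`, `libraries`, `pypi`, `maven`, `cran`, `package`, `coordinates`) reflect an assumed shape of the install request and should be verified against the Libraries API reference; `build_install_request` and the cluster ID are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def build_install_request(
    cluster_id: str,
    pypi_packages: Iterable[str] = (),
    maven_coordinates: Iterable[str] = (),
    cran_packages: Iterable[str] = (),
) -> Dict[str, object]:
    """Build a JSON-serializable install request body (assumed field names)."""
    libraries: List[Dict[str, Dict[str, str]]] = []
    libraries += [{"pypi": {"package": p}} for p in pypi_packages]
    libraries += [{"maven": {"coordinates": c}} for c in maven_coordinates]
    libraries += [{"cran": {"package": p}} for p in cran_packages]
    return {"cluster_id": cluster_id, "libraries": libraries}


body = build_install_request(
    "1234-567890-abc123",  # hypothetical cluster ID
    pypi_packages=["pandas==0.17.1"],
    maven_coordinates=["com.databricks:spark-avro_2.10:1.0.0"],
)
print(json.dumps(body, indent=2))
```

Keeping the body construction separate from the HTTP call makes it easy to inspect what would be sent before attaching anything to a running cluster.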

View the libraries attached to a cluster

  1. Click the Clusters icon in the sidebar.
  2. Click the cluster name.
  3. Click the Libraries tab. For each library, the tab displays the library name and version, whether the library has been deleted, and the library location.

Move a library

  1. Go to the library location in the Workspace.
  2. Click the drop-down arrow to the right of the library name and select Move. A folder browser displays.
  3. Click the destination folder.
  4. Click Select.
  5. Click Confirm and Move.

Delete a library

  1. Go to the library location in the Workspace.
  2. Click the drop-down arrow to the right of the library name and select Delete.
  3. Click Confirm and Delete.

Update a library

To update a library, delete the old version and create a new one. Using the new version requires everything that deleting a library and creating a library each require: you must restart the cluster and reattach any notebooks that use the library to the cluster.
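
The "union of requirements" can be read literally as a set union. The following is a small illustrative sketch, with the requirement strings paraphrased from this topic and the names chosen only for the example:

```python
# Requirements stated in this topic, modeled as sets of plain-English steps.
DELETE_REQUIREMENTS = {"restart the cluster"}
CREATE_REQUIREMENTS = {"reattach notebooks that use the library"}


def update_requirements() -> set:
    """An update is a delete followed by a create, so it needs both sets."""
    return DELETE_REQUIREMENTS | CREATE_REQUIREMENTS


for step in sorted(update_requirements()):
    print(step)
```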