API Examples

This topic contains a range of examples that demonstrate how to use the Azure Databricks API.

Requirements

Before you begin with these examples, review the Authentication topic. The cURL examples assume that you store your Azure Databricks API credentials in a .netrc file or use Bearer authentication. In the following examples, replace <your-token> with your Databricks personal access token.

In the following examples, replace <databricks-instance> with the <REGION>.azuredatabricks.net domain name of your Azure Databricks deployment.
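
If you use cURL's -n option, as the examples below do, you can store the credentials in a .netrc file in your home directory. A minimal entry might look like the following, with the placeholders replaced by your own values:

machine <databricks-instance>
login token
password <your-token>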

Use jq to parse API output

Sometimes it can be useful to parse out parts of the JSON output. In these cases, we recommend the jq utility. For more information, see the jq Manual. You can install jq on macOS using Homebrew with brew install jq.
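
For example, to print only the cluster IDs returned by the Clusters API, you can pipe the list call through a jq filter. This is just a sketch; it assumes the 2.0 list response wraps the results in a top-level clusters array:

curl -n https://<databricks-instance>/api/2.0/clusters/list | jq '.clusters[].cluster_id'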

Invoke a GET

While most API calls require that you specify a JSON body, for GET calls you can simply specify a query string. For example, to get the details for a cluster, run:

curl -n "https://<databricks-instance>/api/2.0/clusters/get?cluster_id=<cluster-id>"

Upload a big file into DBFS

The amount of data that can be uploaded by a single API call is limited to 1 MB. To upload a file larger than 1 MB to DBFS, use the streaming API, which is a combination of create, add-block, and close.

Here is an example of how to perform this action using Python.

import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)

def dbfs_rpc(action, body):
    """A helper function that makes a DBFS API request; the request/response body is encoded/decoded as JSON."""
    # Basic auth with the literal username "token" and the personal access token as the password
    auth = base64.standard_b64encode(('token:' + TOKEN).encode()).decode()
    response = requests.post(
        BASE_URL + action,
        headers={"Authorization": "Basic " + auth},
        json=body
    )
    return response.json()

# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": True})['handle']
with open('/a/local/file', 'rb') as f:
    while True:
        # A block can be at most 1 MB
        block = f.read(1 << 20)
        if not block:
            break
        data = base64.standard_b64encode(block).decode()
        dbfs_rpc("add-block", {"handle": handle, "data": data})
# Close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})

Create a Python 3 cluster

The following example shows how to launch a Python 3 cluster using the Databricks REST API and the popular requests Python HTTP library:

import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

response = requests.post(
  'https://%s/api/2.0/clusters/create' % (DOMAIN),
  headers={'Authorization': 'Basic ' + base64.standard_b64encode(('token:' + TOKEN).encode()).decode()},
  json={
    "spark_version": "4.0.x-scala2.11",
    "node_type_id": "Standard_D3_v2",
    "num_workers": 1,
    "spark_env_vars": {
      "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    }
  }
)

if response.status_code == 200:
  print(response.json()['cluster_id'])
else:
  print("Error launching cluster: %s: %s" % (response.json()["error_code"], response.json()["message"]))

Jobs API examples

This section shows how to create Python, spark-submit, and JAR jobs, and how to run the JAR job and view its output.

Create a Python job

This example shows how to create a Python job. It uses the Apache Spark Python Pi estimation example.

  1. Download the Python file containing the example and upload it to your Azure Databricks instance using Databricks File System (DBFS).

    dbfs cp pi.py dbfs:/docs/pi.py
    
  2. Create the job.

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/jobs/create <<JSON
{
  "name": "SparkPi Python job",
  "new_cluster": {
    "spark_version": "4.0.x-scala2.11",
    "node_type_id": "Standard_D3_v2",
    "num_workers": 2
  },
  "spark_python_task": {
    "python_file": "dbfs:/docs/pi.py",
    "parameters": [
      "10"
    ]
  }
}
JSON

Create a spark-submit job

This example shows how to create a spark-submit job. It uses the Apache Spark SparkPi example.

  1. Download the JAR containing the example and upload it to your Azure Databricks instance using Databricks File System (DBFS).

    dbfs cp SparkPi-assembly-0.1.jar dbfs:/docs/sparkpi.jar
    
  2. Create the job.

curl -n \
-X POST -H 'Content-Type: application/json' \
-d '{
      "name": "SparkPi spark-submit job",
      "new_cluster": {
        "spark_version": "4.0.x-scala2.11",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
        },
     "spark_submit_task": {
        "parameters": [
          "--class",
          "org.apache.spark.examples.SparkPi",
          "dbfs:/docs/sparkpi.jar",
          "10"
       ]
     }
}' https://<databricks-instance>/api/2.0/jobs/create

Create and run a JAR job

This example shows how to create and run a JAR job. It uses the Apache Spark SparkPi example.

  1. Download the JAR containing the example.

  2. Upload the JAR to your Azure Databricks instance using the API:

    curl -n \
    -F filedata=@"SparkPi-assembly-0.1.jar" \
    -F path="/docs/sparkpi.jar" \
    -F overwrite=true \
    https://<databricks-instance>/api/2.0/dbfs/put
    

    A successful call returns {}. Otherwise you will see an error message.

  3. Get a list of all Spark versions prior to creating your job.

    curl -n https://<databricks-instance>/api/2.0/clusters/spark-versions
    

    This example uses version 4.0.x-scala2.11. See Databricks Runtime Versions for more information about Spark cluster versions.
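
    If you only want the version keys, you can optionally filter the output with jq; this assumes the response contains a versions array with a key field per entry, as in API 2.0:

    curl -n https://<databricks-instance>/api/2.0/clusters/spark-versions | jq '.versions[].key'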

  4. Create the job. The JAR is specified as a library and the main class name is referenced in the Spark JAR task.

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{
          "name": "SparkPi JAR job",
          "new_cluster": {
            "spark_version": "4.0.x-scala2.11",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2
            },
         "libraries": [{"jar": "dbfs:/docs/sparkpi.jar"}],
         "spark_jar_task": {
            "main_class_name":"org.apache.spark.examples.SparkPi",
            "parameters": "10"
            }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

    This returns a job-id that you can then use to run the job.

  5. Run the job using run now:

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{ "job_id": <job-id> }' https://<databricks-instance>/api/2.0/jobs/run-now
    
  6. Navigate to https://<databricks-instance>/#job/<job-id> and you’ll be able to see your job running.

  7. You can also check the run status from the API, using the run_id returned by the previous request.

    curl -n https://<databricks-instance>/api/2.0/jobs/runs/get?run_id=<run-id> | jq
    

    This should return something like:

    {
      "job_id": 35,
      "run_id": 30,
      "number_in_job": 1,
      "original_attempt_run_id": 30,
      "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": ""
      },
      "task": {
        "spark_jar_task": {
          "jar_uri": "",
          "main_class_name": "org.apache.spark.examples.SparkPi",
          "parameters": [
            "10"
          ],
          "run_as_repl": true
        }
      },
      "cluster_spec": {
        "new_cluster": {
          "spark_version": "4.0.x-scala2.11",
          "node_type_id": "<node-type>",
          "enable_elastic_disk": false,
          "num_workers": 1
        },
        "libraries": [
          {
            "jar": "dbfs:/docs/sparkpi.jar"
          }
        ]
      },
      "cluster_instance": {
        "cluster_id": "0412-165350-type465",
        "spark_context_id": "5998195893958609953"
      },
      "start_time": 1523552029282,
      "setup_duration": 211000,
      "execution_duration": 33000,
      "cleanup_duration": 2000,
      "trigger": "ONE_TIME",
      "creator_user_name": "...",
      "run_name": "SparkPi JAR job",
      "run_page_url": "<databricks-instance>/?o=3901135158661429#job/35/run/1",
      "run_type": "JOB_RUN"
    }
    
  8. To view the job output, visit the job run details page.

    Executing command, time = 1523552263909.
    Pi is roughly 3.13973913973914
    

Cluster log delivery examples

While you can view the Spark driver and executor logs in the Spark UI, Databricks can also deliver the logs to a DBFS destination. We provide several examples below.

Create a cluster with logs delivered to a DBFS location

The following cURL command creates a cluster named “cluster_log_dbfs” and requests Azure Databricks to send its logs to dbfs:/logs with the cluster ID as the path prefix.

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/clusters/create <<JSON
{
  "cluster_name": "cluster_log_dbfs",
  "spark_version": "4.0.x-scala2.11",
  "node_type_id": "Standard_D3_v2",
  "num_workers": 1,
  "cluster_log_conf": {
    "dbfs": {
      "destination": "dbfs:/logs"
    }
  }
}
JSON

The response should contain the cluster ID:

{"cluster_id":"1111-223344-abc55"}

After cluster creation, Databricks syncs log files to the destination every 5 minutes. It uploads driver logs to dbfs:/logs/1111-223344-abc55/driver and executor logs to dbfs:/logs/1111-223344-abc55/executor.
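
To confirm that logs are being delivered, you can list the destination with the DBFS API. The path below assumes the cluster ID returned above; pipe the output through jq for readability:

curl -n "https://<databricks-instance>/api/2.0/dbfs/list?path=dbfs:/logs/1111-223344-abc55/driver" | jq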

Check log delivery status

You can retrieve cluster information, including the log delivery status, via the API:

curl -n -H "Content-Type: application/json" -d @- https://<databricks-instance>/api/2.0/clusters/get <<JSON
{
  "cluster_id": "1111-223344-abc55"
}
JSON

If the latest batch of logs was uploaded successfully, the response contains only the timestamp of the last attempt:

{
  "cluster_log_status": {
    "last_attempted": 1479338561
  }
}

In case of errors, the error message would appear in the response:

{
  "cluster_log_status": {
    "last_attempted": 1479338561,
    "last_exception": "Exception: Access Denied ..."
  }
}

Workspace API examples

Here are some examples of using the Workspace API to list, import, export, and delete notebooks.

List a notebook or a folder

The following cURL command lists a path in the workspace.

curl -n -H "Content-Type: application/json" -X Get -d @- https://<databricks-instance>/api/2.0/workspace/list <<JSON
{
  "path": "/Users/user@example.com/"
}
JSON

The response should contain a list of statuses:

{
  "objects": [
    {
      "object_type": "DIRECTORY",
      "path": "/Users/user@example.com/folder"
    },
    {
      "object_type": "NOTEBOOK",
      "language": "PYTHON",
      "path": "/Users/user@example.com/notebook1"
    },
    {
      "object_type": "NOTEBOOK",
      "language": "SCALA",
      "path": "/Users/user@example.com/notebook2"
    }
  ]
}

If the path is a notebook, the response contains an array containing the status of the input notebook.
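
For example, listing a single notebook path might return something like the following (the shape mirrors the NOTEBOOK entries above):

{
  "objects": [
    {
      "object_type": "NOTEBOOK",
      "language": "PYTHON",
      "path": "/Users/user@example.com/notebook1"
    }
  ]
}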

Get information about a notebook or a folder

The following cURL command gets the status of a path in the workspace.

curl -n -H "Content-Type: application/json" -X Get -d @- https://<databricks-instance>/api/2.0/workspace/get-status <<JSON
{
  "path": "/Users/user@example.com/"
}
JSON

The response should contain the status of the input path:

{
  "object_type": "DIRECTORY",
  "path": "/Users/user@example.com"
}

Create a folder

The following cURL command creates a folder in the workspace. It creates the folder recursively, like mkdir -p. If the folder already exists, the command does nothing and succeeds.

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/workspace/mkdirs <<JSON
{
  "path": "/Users/user@example.com/new/folder"
}
JSON

If the request succeeds, an empty JSON string will be returned.

Delete a notebook or folder

The following cURL command deletes a notebook or folder in the workspace. Set recursive to true to recursively delete a non-empty folder.

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/workspace/delete <<JSON
{
  "path": "/Users/user@example.com/new/folder",
  "recursive": "false"
}
JSON

If the request succeeds, an empty JSON string will be returned.

Export a notebook or folder

The following cURL command exports a notebook from the workspace. A notebook can be exported in several formats: SOURCE, HTML, JUPYTER, and DBC. Note that a folder can be exported only as DBC.

curl -n -H "Content-Type: application/json" -X GET -d @- https://<databricks-instance>/api/2.0/workspace/export <<JSON
{
  "path": "/Users/user@example.com/notebook",
  "format": "SOURCE"
}
JSON

The response contains the base64-encoded notebook content:

{
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg=="
}
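
To recover the notebook source, extract the content field and base64-decode it. For example, using the query-string form of the export call together with jq and a base64 decoder (GNU coreutils accepts --decode; some macOS versions use -D):

curl -n "https://<databricks-instance>/api/2.0/workspace/export?format=SOURCE&path=/Users/user@example.com/notebook" | jq -r .content | base64 --decode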

Alternatively, you can download the exported notebook directly.

curl -n -X GET "https://<databricks-instance>/api/2.0/workspace/export?format=SOURCE&direct_download=true&path=/Users/user@example.com/notebook"

The response will be the exported notebook content.

Import a notebook or directory

The following cURL command imports a notebook into the workspace. Multiple formats (SOURCE, HTML, JUPYTER, DBC) are supported. If the format is SOURCE, you must also provide language. The content parameter contains the base64-encoded notebook content. Set overwrite to true to overwrite an existing notebook.

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/workspace/import <<JSON
{
  "path": "/Users/user@example.com/new-notebook",
  "format": "SOURCE",
  "language": "SCALA",
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg==",
  "overwrite": "false"
}
JSON

If the request succeeds, an empty JSON string will be returned.

Alternatively, you can import a notebook via a multipart form POST.

curl -n -X POST https://<databricks-instance>/api/2.0/workspace/import \
     -F path="/Users/user@example.com/new-notebook" -F format=SOURCE -F language=SCALA -F overwrite=true -F content=@notebook.scala