This quickstart gets you going with Azure Databricks: you create a cluster and a notebook, create a table from a dataset, query the table, and display the query results.
From the sidebar at the left and the Common Tasks list on the home page, you access fundamental Azure Databricks entities: Workspace, clusters, tables, notebooks, jobs, and libraries. The Workspace is the special root folder that stores your Azure Databricks assets, such as notebooks and libraries, and the data that you import.
To get help, click the question icon at the top right-hand corner.
A cluster is a collection of Azure Databricks computation resources. To create a cluster:
In the sidebar, click the Clusters button.
On the Clusters page, click Create Cluster.
On the New Cluster page, specify the cluster name Quickstart and select 4.2 (includes Apache Spark 2.3.1, Scala 2.11) in the Databricks Runtime Version drop-down.
Click Create Cluster.
A notebook is a collection of cells that run computations on a Spark cluster. To create a notebook in the Workspace:
In the sidebar, click the Workspace button.
In the Workspace folder, select Create > Notebook.
In the Create Notebook dialog, enter a name and select SQL in the Language drop-down.
Click Create. The notebook opens with an empty cell at the top.
Run a SQL statement to create a table using data from a sample CSV data file available in Azure Databricks Datasets.
Copy and paste this code snippet into the notebook cell.
DROP TABLE IF EXISTS diamonds;

CREATE TABLE diamonds
USING csv
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true")
Press SHIFT + ENTER. The notebook automatically attaches to the cluster you created in Step 2, creates the table, and loads the data.
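To get a feel for what the header "true" option does, here is a minimal sketch in plain Python that runs outside Azure Databricks. It uses the standard csv module rather than Spark, and the sample rows and column subset are made up for illustration; they are not taken from diamonds.csv. The helper name read_with_header is hypothetical.

```python
import csv
import io

# Hypothetical CSV excerpt: column subset and values are invented for
# illustration and are not the real contents of diamonds.csv.
SAMPLE_CSV = """carat,cut,color,price
0.30,Ideal,E,300
0.40,Premium,E,500
0.50,Good,D,1000
"""


def read_with_header(text):
    """Treat the first line as column names, similar in spirit to header "true"."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


columns, rows = read_with_header(SAMPLE_CSV)
print(columns)   # column names come from the first line
print(rows[0])   # every value is read as a string
```

Note that every field arrives as a string here. To the best of my knowledge, Spark's CSV source behaves similarly unless you supply a schema or turn on schema inference, so numeric columns such as price are converted when a query aggregates them.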
Run a SQL statement to query the table for the average diamond price by color.
To add a cell to the notebook, hover over the bottom of the cell and click the icon that appears.
Copy this snippet and paste it in the cell.
SELECT color, avg(price) AS price FROM diamonds GROUP BY color ORDER BY color
Press SHIFT + ENTER. The notebook displays a table of diamond color and average price.
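If you want to check what this aggregation computes without a cluster, the following sketch runs the same query shape against an in-memory SQLite table using Python's built-in sqlite3 module. This is an assumption-laden stand-in, not how Azure Databricks executes the query: the table holds only two columns, the rows are invented, and the function name average_price_by_color is hypothetical.

```python
import sqlite3

# Hypothetical (color, price) rows; not real values from diamonds.csv.
SAMPLE_ROWS = [
    ("E", 300),
    ("E", 500),
    ("D", 1000),
    ("J", 200),
    ("J", 400),
    ("J", 600),
]


def average_price_by_color(rows):
    """Run the quickstart's GROUP BY query on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE diamonds (color TEXT, price REAL)")
        conn.executemany("INSERT INTO diamonds VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT color, avg(price) AS price "
            "FROM diamonds GROUP BY color ORDER BY color"
        )
        return cursor.fetchall()
    finally:
        conn.close()


result = average_price_by_color(SAMPLE_ROWS)
for color, price in result:
    print(color, price)
```

With these sample rows the query yields one row per color, sorted alphabetically: D at 1000.0, E at 400.0, and J at 400.0.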
Display a chart of the average diamond price by color.
Click the Bar chart icon.
Click Plot Options.
Drag color into the Keys box.
Drag price into the Values box.
In the Aggregation drop-down, select AVG.
Click Apply to display the bar chart.
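The Plot Options above amount to grouping rows by the Keys column and averaging the Values column for each group. The sketch below illustrates that idea in plain Python. It is not the chart's actual implementation, and the rows and the helper name aggregate_avg are hypothetical.

```python
from collections import defaultdict

# Hypothetical query output: one row per color, as the GROUP BY query returns.
QUERY_RESULT = [
    {"color": "D", "price": 1000.0},
    {"color": "E", "price": 400.0},
    {"color": "J", "price": 400.0},
]


def aggregate_avg(rows, key, value):
    """Group rows by `key` and average `value` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


bars = aggregate_avg(QUERY_RESULT, key="color", value="price")
for color, height in bars.items():
    # Crude text rendering: one '#' per 100 units of average price.
    print(f"{color} {'#' * int(height // 100)} {height}")
```

Because the query already returns a single row per color, averaging within each key leaves the values unchanged; the aggregation would only change the result if several rows shared a color.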
We’ve now covered the basics of Azure Databricks, including creating a cluster and a notebook, running SQL commands in the notebook, and displaying results.
To dive into various Apache Spark topics, see Apache Spark Getting Started.
To read more about the primary tools you use and tasks you can perform with the Azure Databricks workspace, see:
To see some interesting applications of the Azure Databricks workspace, watch these videos: