You can create and manage notebooks using the UI, the CLI, and by invoking the Workspace API. This topic focuses on performing notebook tasks using the UI. For the other methods, see Databricks CLI and Workspace API.
Notebooks are one interface for interacting with Azure Databricks. If you have enabled the Premium SKU, you can use Workspace access control to control sharing of notebooks and folders in the Workspace.
In this topic:
Click the Workspace button or the Home button in the sidebar. Do one of the following:
Next to any folder, click the menu icon on the right side of the text and select Create > Notebook.
In the Workspace or a user folder, select Create > Notebook.
In the Create Notebook dialog, enter a name and select the notebook’s primary language. Notebooks support Python, Scala, SQL, and R as their primary language.
You can import a notebook from a URL or a file.
Before you can do any work in a notebook, you must first attach the notebook to a cluster. In the notebook toolbar, click Detached under the notebook’s name at the top left. From the dropdown, select a running cluster.
- Add a cell
- Predefined variables
- Run a cell
- Run all cells
- Mix languages
- Include documentation
- Include HTML
- Show line and command numbers
- Python and Scala error highlighting
- Find and replace text
- Download results
- Run a notebook from another notebook
- Export a notebook
- Notebook isolation
To add a cell, mouse over the top or bottom of a cell and click the add icon, or open the notebook cell menu at the far right and click Add Cell Above or Add Cell Below.
Notebooks have some Apache Spark variables already defined.
Do not create a SQLContext. Doing so will lead to inconsistent behavior.
To run code, type the code in a cell and either select Run Cell from the cell menu or press Shift+Enter. For example, try executing these Python code snippets.
```python
# A SparkSession is already created for you.
# Do not create another or unspecified behavior may occur.
spark
```

```python
# A SQLContext is also already created for you.
# Do not create another or unspecified behavior may occur.
# As you can see below, the sqlContext provided is a HiveContext.
sqlContext
```

```python
# A SparkContext is already created for you.
# Do not create another or unspecified behavior may occur.
sc
```
Now that you’ve seen the pre-defined variables, run some real code!
```python
1+1 # => 2
```
To run all the cells in a notebook, select Run All in the notebook toolbar.
Do not do a Run All if steps for mount and unmount are in the same notebook. It could lead to a race condition and possibly corrupt the mount points.
While a notebook has a primary language, you can mix languages by specifying the language magic command %<language> at the beginning of a cell. %<language> allows you to execute <language> code even if the notebook's primary language is different. The supported magic commands are:
- %sh: Allows you to execute shell code in your notebook. Add the -e option to fail the cell (and subsequently a job or a Run All command) if the shell command does not succeed. By default, %sh alone will not fail a job even if the %sh command does not completely succeed. Only %sh -e will fail if the shell command has a non-zero exit status.
- %fs: Allows you to use Databricks Utilities filesystem commands. Read more on the Databricks File System - DBFS pages.
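The exit-status behavior of the -e option can be sketched outside Databricks with plain sh, which applies the same exit-on-error rule to a script. The cell bodies below are stand-ins for %sh cells, not actual notebook magics:

```shell
# Without -e (like plain %sh): a failing command does not stop the
# script, so later commands still run and the last exit status wins.
sh -c 'false; echo "still ran"'        # prints "still ran", exits 0

# With -e (like %sh -e): the first non-zero exit status aborts the
# script, so echo never runs and the cell would be marked as failed.
sh -e -c 'false; echo "still ran"' || echo "cell failed"
```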
To include documentation in a notebook you can use the %md magic command to identify Markdown markup. The included Markdown markup is rendered into HTML. For example, this Markdown snippet:

```
%md # Hello This is a Title
```
is rendered as an HTML title:
You can link to other notebooks or folders in Markdown cells using relative paths. Specify the href attribute of an anchor tag as the relative path, starting with a $ and then following the same pattern as in Unix file systems:

```
%md
<a href="$./myNotebook">Link to notebook in same folder as current notebook</a>
<a href="$../myFolder">Link to folder in parent folder of current notebook</a>
<a href="$./myFolder2/myNotebook2">Link to nested notebook</a>
```
You can include HTML in a notebook by using the displayHTML function. See HTML, D3, and SVG in Notebooks for an example of how to do this.
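A minimal sketch, assuming you are inside a Databricks notebook where displayHTML is predefined; outside a notebook the function does not exist, so the rendering call is left as a comment and only the markup is built:

```python
# Build an HTML fragment to render in the cell output.
html = "<h1>Hello</h1><p>This paragraph would be rendered as rich HTML.</p>"

# Inside a Databricks notebook you would render it with:
# displayHTML(html)   # displayHTML is predefined in notebooks only
print(html)
```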
To show line numbers or command numbers, click View > Show line numbers or View > Show command numbers. Once they’re displayed, you can hide them again from the same menu. You can also enable line numbers with the keyboard shortcut Control+L.
If you enable line or command numbers, Databricks saves your preference and will show them in all of your other notebooks for that browser.
Command numbers above cells link to that specific command. If you click on the command number for a cell, it updates your URL to be anchored to that command. If you want to link to a specific command in your notebook, right-click the command number and choose copy link address.
Python and Scala notebooks support error highlighting. That is, the line of code that is throwing the error will be highlighted in the cell. Additionally, if the error output is a stacktrace, the cell in which the error is thrown is displayed in the stacktrace as a link to the cell. You can click this link to jump to the offending code.
To find and replace text within a notebook, select File > Find and Replace.
The current match is highlighted in orange and all other matches are highlighted in yellow.
You can replace matches on an individual basis by clicking Replace.
You can switch between matches by clicking the Prev and Next buttons or pressing shift+enter and enter to go to the previous and next matches, respectively.
Close the find and replace tool by clicking the x button or pressing esc.
Once you’ve run your code, you may want to download the results to your local machine. To do so, click the download button at the bottom of a cell that contains tabular output. You’ll see an option to download a preview of the results or the full results.
You can try this out by running:

```
%sql SELECT 1
```

and downloading the results.
You can run a notebook from another notebook by using the %run magic command. This is roughly equivalent to a :load command in a Scala REPL on your local machine or an import statement in Python. All variables defined in the other notebook become available in your current notebook.
For example, suppose notebookA contains a cell with the following Python code:

```python
x = 5
```
Running this code snippet in notebookB works even though x was never explicitly created there:

```python
%run /Users/path/to/notebookA
print(x) # => 5
```
To specify a relative path, preface it with ./ or ../. For example, if notebookA and notebookB are in the same directory, you can alternatively run notebookA from a relative path:

```python
%run ./notebookA
print(x) # => 5
```

```python
%run ../someDirectory/notebookA # up a directory and into another
print(x) # => 5
```
%run must be in a cell by itself as it runs the entire notebook inline.
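The variable sharing that %run provides can be sketched in plain Python: unlike a module import, it behaves like executing the other notebook's cells directly in the current namespace. The notebookA source below is a stand-in for illustration only:

```python
# Stand-in for the single cell of notebookA:
notebook_a_source = "x = 5"

# %run ./notebookA executes notebookA's cells in the current namespace,
# so x becomes available here even though this "notebook" never defined it.
exec(notebook_a_source)
print(x)  # => 5
```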
Notifications alert you to certain events, such as which command is currently running during Run all cells and which commands are in error state. When your notebook is showing multiple error notifications, the first one will have a link that allows you to clear all notifications.
Notebook notifications are enabled by default. You can disable them under User Settings > Notebook Settings.
Databricks supports two types of isolation: variable and class isolation, and Spark session isolation.
Since all notebooks attached to the same cluster execute on the same cluster VMs, even with Spark session isolation enabled there is no guaranteed user isolation within a cluster.
Variable and class isolation¶
Variables and classes are available only in the current notebook. For example, two notebooks attached to the same cluster can define variables and classes with the same name but these objects are distinct.
To define a class that is visible to all notebooks attached to the same cluster, define the class in a package cell. Then, you can access the class by using its fully qualified name, which is the same as accessing a class in an attached Scala or Java library.
Spark session isolation¶
For a cluster running Apache Spark 2.0.0 and above, every notebook has a pre-defined variable called spark that represents a SparkSession. SparkSession is the entry point for using different APIs in Spark as well as setting different runtime configurations.
For Spark 2.0.0 and Spark 2.0.1-db1, notebooks attached to a cluster share the same SparkSession. From Spark 2.0.2-db1 onward, you can enable Spark session isolation so that every notebook uses its own SparkSession. When Spark session isolation is enabled:
- Runtime configurations set using spark.conf.set or using SQL’s set command affect only the current notebook. Configurations for a metastore connection are not runtime configurations; all notebooks attached to a cluster share these configurations.
- Setting the current database affects only the current notebook.
- Temporary views created by dataset.createOrReplaceTempView and SQL’s CREATE TEMPORARY VIEW command are visible only in the current notebook.
To enable Spark session isolation, set spark.databricks.session.share to false in the Spark Config field.
In Spark 2.0.2-db<x>, session isolation is disabled by default.
Spark 2.1 and above have session isolation enabled by default. In addition, from Spark 2.1, you can use global temporary views to share temporary views across notebooks.
Cells that trigger commands in other languages (that is, cells using %sql) and cells that include other notebooks (that is, cells using %run) are part of the current notebook, so these cells are in the same session as other notebook cells. In contrast, Notebook Workflows run a notebook with an isolated SparkSession, which means temporary views defined in such a notebook are not visible in other notebooks.
Databricks supports two types of autocomplete in your notebook: local and server.
Local autocomplete completes words that exist in the notebook. Server autocomplete is more powerful because it accesses the cluster for defined types, classes, and objects and SQL database and table names. To activate server autocomplete you must attach your notebook to a running cluster and run all cells that define completable objects.
- Server autocomplete in Scala, Python, and R notebooks is blocked during command execution.
- Server autocomplete is not available for serverless clusters.
You trigger autocomplete by pressing Tab after entering a completable object. For example, after you define and run the cells containing the definition of instance, the methods of instance are completable, and a list of valid completions displays when you press Tab.
Type completion and SQL database and table name completion work in the same way.
Since notebooks are contained inside the Workspace (and in folders in the Workspace), they follow the same rules as folders. See the Access the Workspace menu for information about how to access the Workspace menu and delete notebooks or other items in the Workspace.
Databricks has basic version control for notebooks. To access version control, click the Revision History menu on the top right of every notebook. You can specify revisions with comments and those will be permanently saved. Databricks also integrates with these third-party version control tools:
- Notebook Workflows
- Package Cells