Using Notebooks

A notebook is a collection of runnable cells (commands). When you use a notebook, you are primarily developing and running cells.

All notebook tasks are supported by UI actions, but you can also perform many tasks using keyboard shortcuts. Toggle the shortcut display by clicking the keyboard icon or by selecting it from the ? menu.

../../_images/short-cuts.png

Develop notebooks

This section describes how to develop notebook cells and navigate around a notebook.

About notebooks

A notebook has a toolbar that lets you manage the notebook and perform actions within the notebook:

../../_images/toolbar.png

and one or more cells (or commands) that you can run:

../../_images/cmd.png

At the far right of a cell, the cell actions menu contains three menus (Run, Dashboard, and Edit) and two actions: Hide and Delete.

Add a cell

To add a cell, hover over the top or bottom of a cell and click the add cell icon. Alternatively, open the notebook cell menu at the far right, click the down caret, and select Add Cell Above or Add Cell Below.

Delete a cell

Go to the cell actions menu at the far right and click Delete.

Mix languages

The primary language for each cell is shown in parentheses next to the notebook name:

../../_images/toolbar.png

You can override the primary language by specifying the language magic command %<language> at the beginning of a cell. The supported magic commands are: %python, %r, %scala, and %sql. Additionally:

%sh
Allows you to execute shell code in your notebook. Add the -e option in order to fail the cell (and subsequently a job or a run all command) if the shell command has a non-zero exit status.
%fs
Allows you to use dbutils filesystem commands. For more information, see Access DBFS with dbutils.
%md
Allows you to include various types of documentation, including text, images, and mathematical formulas and equations.
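
The effect of the -e option on %sh can be pictured as the difference between ignoring and checking a command's exit status. The following Python sketch is only an analogy that runs outside a notebook. It does not show how %sh is implemented, and the child command is a hypothetical stand-in for a failing shell command.

```python
import subprocess
import sys

# Hypothetical failing command: a child Python process that exits with status 3.
failing_command = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Without checking, the non-zero status is recorded but nothing fails
# (comparable to %sh without -e).
unchecked = subprocess.run(failing_command)
print("unchecked exit status:", unchecked.returncode)

# With check=True, a non-zero status raises an error
# (comparable to %sh -e failing the cell).
try:
    subprocess.run(failing_command, check=True)
    failed = False
except subprocess.CalledProcessError as err:
    failed = True
    print("checked run failed with exit status:", err.returncode)
```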

Include documentation

To include documentation in a notebook you can use the %md magic command to identify Markdown markup. The included Markdown markup is rendered into HTML. For example, this Markdown snippet contains markup for a level-one heading:

%md # Hello This is a Title

It is rendered as an HTML heading:

../../_images/title.png

Collapsible headings

Cells that appear after cells containing Markdown headings can be collapsed into the heading cell. The following image shows a level-one heading called Heading 1 with the following two cells collapsed into it.

../../_images/headings.png

To expand and collapse headings, click the + and -.

Also see Hide and show cell content.

Display images

To display images stored in the FileStore, use the syntax:

%md
![test](files/image.png)

For example, suppose you have the Databricks logo image file in FileStore:

dbfs ls dbfs:/FileStore/
databricks-logo-mobile.png

When you include the following code in a Markdown cell:

../../_images/image-code.png

the image is rendered in the cell:

../../_images/image-render.png

Display mathematical equations

Notebooks support KaTeX for displaying mathematical formulas and equations. For example,

%md

\\(c = \\pm\\sqrt{a^2 + b^2} \\)

\\(A{_i}{_j}=B{_i}{_j}\\)

$$c = \\pm\\sqrt{a^2 + b^2}$$

\\[A{_i}{_j}=B{_i}{_j}\\]

renders as:

../../_images/equations.png

and

%md

\\( f(\beta)= -Y_t^T X_t \beta + \sum log( 1+{e}^{X_t\bullet\beta}) + \frac{1}{2}\delta^t S_t^{-1}\delta\\)

where \\(\delta=(\beta - \mu_{t-1})\\)

renders as:

../../_images/equations2.png

Include HTML

You can include HTML in a notebook by using the function displayHTML. See HTML, D3, and SVG in Notebooks for an example of how to do this.

Note

The displayHTML iframe is served from the domain databricksusercontent.com, and the iframe sandbox includes the allow-same-origin attribute. databricksusercontent.com must be accessible from your browser. If your corporate network currently blocks it, ask your IT department to add it to the allow list.

Command comments

You can have discussions with collaborators using command comments.

To toggle the Comments sidebar, click the Comments button at the top right of a notebook.

../../_images/comments.png

To add a comment to a command:

  1. Highlight the command text and click the comment bubble:

    ../../_images/add-comment.png
  2. Add your comment and click Comment.

    ../../_images/save-comment.png

To edit, delete, or reply to a comment, click the comment and choose an action.

../../_images/edit-comment.png

Show line and command numbers

To show line numbers or command numbers, click View > Show line numbers or View > Show command numbers. Once they are displayed, you can hide them again from the same menu. You can also enable line numbers with the keyboard shortcut Control+L.

Show line or command numbers via the view menu
Line and command numbers enabled in notebook

If you enable line or command numbers, Databricks saves your preference and shows them in all of your other notebooks for that browser.

Command numbers above cells link to that specific command. If you click the command number for a cell, your URL is updated to be anchored to that command. To link to a specific command in your notebook, right-click the command number and choose Copy Link Address.

Find and replace text

To find and replace text within a notebook, select File > Find and Replace.

../../_images/find-replace-in-dropdown.png

The current match is highlighted in orange and all other matches are highlighted in yellow.

../../_images/find-replace-example.png

You can replace matches on an individual basis by clicking Replace.

You can switch between matches by clicking the Prev and Next buttons, or by pressing Shift+Enter and Enter to go to the previous and next matches, respectively.

Close the find and replace tool by clicking the x button or by pressing Esc.

Autocomplete

You can use Azure Databricks autocomplete features to automatically complete code segments as you enter them in cells. This reduces what you have to remember and minimizes the amount of typing you have to do. Azure Databricks supports two types of autocomplete in your notebook: local and server.

Local autocomplete completes words that exist in the notebook. Server autocomplete is more powerful because it accesses the cluster for defined types, classes, and objects, as well as SQL database and table names. To activate server autocomplete, you must attach the notebook to a cluster and run all cells that define completable objects.

Important

  • Server autocomplete in Scala, Python, and R notebooks is blocked during command execution.
  • Server autocomplete is not available for high concurrency clusters.

You trigger autocomplete by pressing Tab after entering a completable object. For example, after you define and run the cells containing the definitions of MyClass and instance, the methods of instance are completable, and a list of valid completions displays when you press Tab.

../../_images/notebook-autocomplete-object.png
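
For instance, cells like the following would define the MyClass and instance names used above. The method names are hypothetical, and the final lines use Python's built-in dir() only to approximate which names a completion list could offer. The notebook's own completion mechanism is not shown here.

```python
class MyClass:
    """Hypothetical class standing in for the MyClass mentioned above."""

    def __init__(self, values):
        self.values = list(values)

    def total(self):
        # Sum of the stored values.
        return sum(self.values)

    def largest(self):
        # Largest stored value.
        return max(self.values)


instance = MyClass([1, 2, 3])

# Public attributes and methods of instance, roughly what a completion
# list could contain after typing "instance." and pressing Tab.
candidates = [name for name in dir(instance) if not name.startswith("_")]
print(candidates)
```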

Type completion and SQL database and table name completion work in the same way.


Run notebooks

This section describes how to run one or more notebook cells.

Requirements

The notebook must be attached to a cluster. If the cluster is not running, the cluster is started when you run one or more cells.

Run a cell

In the cell actions menu at the far right, click the run icon and select Run Cell, or press Shift+Enter.

Important

The maximum size for a notebook cell, including both contents and output, is 16 MB.

Note

By default, when you run a cell, the notebook automatically attaches to a running cluster without prompting. To change this setting, click the account icon and select User Settings > Notebook Settings.

For example, try executing these Python code snippets that reference the predefined variables.

spark
sqlContext
sc

Now that you have seen the predefined variables, run some real code:

1+1 # => 2

Run all above or below

To run all cells above or below a cell, go to the cell actions menu at the far right, click the Run menu, and select Run All Above or Run All Below.

Run All Below includes the cell you are in. Run All Above does not.
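
The inclusive and exclusive boundaries can be pictured with list slicing. This is only a sketch of the rule stated above; the cells list and the current index are hypothetical.

```python
# Hypothetical notebook with five cells, identified by position.
cells = ["cell 1", "cell 2", "cell 3", "cell 4", "cell 5"]
current = 2  # zero-based index of the cell you are in ("cell 3")

# Run All Above: every cell before the current one, excluding it.
run_all_above = cells[:current]

# Run All Below: the current cell and every cell after it.
run_all_below = cells[current:]

print(run_all_above)  # ['cell 1', 'cell 2']
print(run_all_below)  # ['cell 3', 'cell 4', 'cell 5']
```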

Run all cells

To run all the cells in a notebook, select Run All in the notebook toolbar.

Important

Do not use Run All if the notebook contains both mount and unmount steps. Doing so could lead to a race condition and possibly corrupt the mount points.

Python and Scala error highlighting

Python and Scala notebooks support error highlighting: the line of code that throws the error is highlighted in the cell. Additionally, if the error output is a stack trace, the cell in which the error is thrown is displayed in the stack trace as a link to the cell. You can click this link to jump to the offending code.

../../_images/notebook-python-error-highlighting.png
../../_images/notebook-scala-error-highlighting.png

Notifications

Notifications alert you to certain events, such as which command is currently running during Run All and which commands are in an error state. When your notebook shows multiple error notifications, the first one has a link that lets you clear all notifications.

../../_images/notification.png

Notebook notifications are enabled by default. To disable them, click the account icon and select User Settings > Notebook Settings.

Run a notebook from another notebook

You can run a notebook from another notebook by using the %run <notebook> magic command. This is roughly equivalent to a :load command in a Scala REPL on your local machine or an import statement in Python. All variables defined in <notebook> become available in your current notebook.

%run must be in a cell by itself, because it runs the entire notebook inline.
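
One way to picture running the entire notebook inline is executing the other notebook's code directly in the current namespace. The sketch below is a simplified model in plain Python, not the actual %run implementation. notebook_source is a hypothetical stand-in for the contents of the other notebook.

```python
# Hypothetical contents of the notebook being run, as plain source text.
notebook_source = """
greeting = "hello"

def shout(text):
    return text.upper() + "!"
"""

# Model of the calling notebook's namespace.
current_namespace = {}

# Executing the source inline defines its names in the caller's namespace.
exec(notebook_source, current_namespace)

result = current_namespace["shout"](current_namespace["greeting"])
print(result)  # HELLO!
```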

Note

You cannot use %run to run a Python file and import the entities defined in that file into a notebook. To import from a Python file you must package the file into a Python library, create an Azure Databricks library from that Python library, and attach the library to the cluster you use to run your notebook.

Example

Suppose you have notebookA and notebookB. notebookA contains a cell that has the following Python code:

x = 5

Even though you did not define x in notebookB, you can access x in notebookB after you run %run notebookA.

%run /Users/path/to/notebookA

print(x) # => 5

To specify a relative path, preface it with ./ or ../. For example, if notebookA and notebookB are in the same directory, you can alternatively run notebookA from notebookB using a relative path.

%run ./notebookA

print(x) # => 5

%run ../someDirectory/notebookA # up a directory and into another

print(x) # => 5
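
How ./ and ../ resolve against the current notebook's folder can be sketched with Python's posixpath module. This illustrates ordinary path arithmetic under the assumption that notebook paths behave like POSIX paths. The folder /Users/path/to is taken from the example above, and resolve is a hypothetical helper, not a Databricks function.

```python
import posixpath


def resolve(current_notebook, relative_path):
    """Resolve relative_path against the folder containing current_notebook."""
    folder = posixpath.dirname(current_notebook)
    return posixpath.normpath(posixpath.join(folder, relative_path))


current = "/Users/path/to/notebookB"

print(resolve(current, "./notebookA"))
# /Users/path/to/notebookA
print(resolve(current, "../someDirectory/notebookA"))
# /Users/path/someDirectory/notebookA
```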

For more complex interactions between notebooks, see Notebook Workflows.

Manage notebook state and results

After you attach a notebook to a cluster and run one or more cells, your notebook has state and displays results. This section describes how to manage notebook state and results.

Clear notebook state and results

To clear the notebook state and results, click Clear in the notebook toolbar and select the action:

../../_images/clear-notebook.png

Download a result

You can download a cell result that contains tabular output to your local machine. Click the Download Result button at the bottom of a cell.

../../_images/download-result.png

A CSV file named export.csv is downloaded to your default download directory.
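
Once downloaded, the file can be read with any CSV tool. As a minimal sketch using Python's csv module, the in-memory text below is hypothetical sample data standing in for the contents of export.csv, and it assumes the file starts with a header row. To read the real file, open it from your download directory instead.

```python
import csv
import io

# Hypothetical stand-in for the contents of a downloaded export.csv.
sample_export = "id,name\n1,alpha\n2,beta\n"

# DictReader uses the first row as column names (assumes a header row).
rows = list(csv.DictReader(io.StringIO(sample_export)))

for row in rows:
    print(row["id"], row["name"])
```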

Hide and show cell content

Cell content consists of cell code and the result of running the cell. You can hide and show the cell code and result using the cell actions menu at the top right of the cell.

To hide cell code:

  • Click the down caret and select Hide Code

To hide and show the cell result, do any of the following:

  • Click the down caret and select Hide Result
  • Click the minimize icon
  • Type Esc > Shift + o

To show hidden cell code or results, click the Show links:

../../_images/notebook-cell-show.png

See also Collapsible headings.

Notebook isolation

Notebook isolation refers to the visibility of variables and classes between notebooks. Azure Databricks supports two types of isolation:

  • Variable and class isolation
  • Spark session isolation

Note

Since all notebooks attached to the same cluster execute on the same cluster VMs, even with Spark session isolation enabled there is no guaranteed user isolation within a cluster.

Variable and class isolation

Variables and classes are available only in the current notebook. For example, two notebooks attached to the same cluster can define variables and classes with the same name, but these objects are distinct.
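
As a loose illustration in plain Python, two separate namespaces can hold objects with identical names without interfering with each other. The dictionaries below are hypothetical stand-ins for two notebooks. This models the behavior described above and does not show how Azure Databricks implements isolation.

```python
# Two separate namespaces standing in for two notebooks on one cluster.
notebook_a = {}
notebook_b = {}

# Each "notebook" defines a variable and a class with the same names.
exec("value = 1\nclass Config:\n    mode = 'a'", notebook_a)
exec("value = 2\nclass Config:\n    mode = 'b'", notebook_b)

# The variables are independent.
print(notebook_a["value"], notebook_b["value"])  # 1 2

# The classes share a name but are distinct objects.
print(notebook_a["Config"] is notebook_b["Config"])  # False
```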

To define a class that is visible to all notebooks attached to the same cluster, define the class in a package cell. Then you can access the class by using its fully qualified name, which is the same as accessing a class in an attached Scala or Java library.

Spark session isolation

Every notebook attached to a cluster running Apache Spark 2.0.0 and above has a pre-defined variable called spark that represents a SparkSession. SparkSession is the entry point for using Spark APIs as well as setting runtime configurations.

Spark session isolation is enabled by default. To share temporary views across notebooks while isolation is enabled, use global temporary views. See Create View. To disable Spark session isolation, set spark.databricks.session.share to true in the Spark configuration.

Cells that trigger commands in other languages (that is, cells using %scala, %python, %r, and %sql) and cells that include other notebooks (that is, cells using %run) are part of the current notebook. Thus, these cells are in the same session as other notebook cells. By contrast, a notebook workflow runs a notebook with an isolated SparkSession, which means temporary views defined in such a notebook are not visible in other notebooks.

Version control

Azure Databricks has basic version control for notebooks. You can perform the following actions on revisions:

  • Add a comment
  • Restore a revision
  • Delete a revision
  • Clear a revision history

To access notebook revisions, click Revision History at the top right of the notebook toolbar.

Azure Databricks also integrates with third-party version control tools.

Add a comment

To add a comment to the latest revision:

  1. Click the revision.

  2. Click the Save now link.

    ../../_images/revision-comment.png
  3. In the Save Notebook Revision dialog, enter a comment.

  4. Click Save. The notebook revision is saved with the entered comment.

Restore a revision

To restore a revision:

  1. Click the revision.

  2. Click Restore this revision.

    ../../_images/restore-revision.png
  3. Click Confirm. The selected revision becomes the latest revision of the notebook.

Delete a revision

To delete a notebook’s revision entry:

  1. Click the revision.

  2. Click the trash icon.

    ../../_images/delete-revision.png
  3. Click Yes, erase. The selected revision is deleted from the notebook’s revision history.

Clear a revision history

To clear a notebook’s revision history:

  1. Select File > Clear Revision History.

  2. Click Yes, clear. The notebook revision history is cleared.

    Warning

    Once cleared, the revision history is not recoverable.