RStudio on Azure Databricks

Azure Databricks integrates with RStudio Server, the popular integrated development environment (IDE) for R.

You can use either the Open Source or the Pro edition of RStudio Server on Azure Databricks. To use RStudio Server Pro, you must transfer your existing RStudio Pro license to Azure Databricks (see Get started with RStudio Server Pro).

Note

RStudio integration requires the Premium SKU.

RStudio integration architecture

When you use RStudio Server on Azure Databricks, the RStudio Server daemon runs on the driver (or master) node of an Azure Databricks serverless pool. The RStudio web UI is proxied through the Azure Databricks web app, so you do not need to change your cluster network configuration. The following diagram shows the RStudio integration component architecture.

Architecture of RStudio on Azure Databricks

Warning

Azure Databricks proxies the RStudio web service from port 8787 on the cluster’s Spark driver. This web proxy is intended for use only with RStudio. If you launch other web services on port 8787, you might expose your users to potential security exploits. Neither Databricks nor Microsoft is responsible for any issues that result from the installation of unsupported software on a cluster.

Requirements

  • The Premium SKU.
  • Databricks Runtime 4.1 or above.
  • A serverless pool that does not have Table ACLs enabled.
  • If you want to use the Pro edition, an RStudio Server floating Pro license.

Get started with RStudio Server Open Source

To get started with RStudio Server Open Source on Azure Databricks, you must install RStudio on a serverless pool. You need to perform this installation only once. Installation is usually performed by an administrator.

Install RStudio Server Open Source

To set up RStudio Server Open Source on an Azure Databricks serverless pool, you must create an init script that installs the RStudio Server Open Source binary package. See Cluster Node Initialization Scripts for more details.

Here is an example notebook cell that installs an init script for a pool named rstudio. Run the following code in a notebook to install the script, and then restart the serverless pool.

%python

script = """
  sudo apt-get install -y gdebi-core alien
  cd /tmp
  sudo wget https://download2.rstudio.org/rstudio-server-1.1.453-amd64.deb
  sudo gdebi -n rstudio-server-1.1.453-amd64.deb
  sudo rstudio-server restart
"""

dbutils.fs.put("/databricks/init/rstudio/rstudio-install.sh", script, True)
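
If you need a release other than the one hard-coded above, the script text can be generated from a version string. The following is a minimal sketch, not part of the official instructions: the helper name build_rstudio_init_script is hypothetical, and it assumes that other releases follow the same download URL pattern as 1.1.453, which you should verify before use.

```python
def build_rstudio_init_script(version: str) -> str:
    """Return init-script text that installs the given RStudio Server Open Source version.

    Assumption: the package for `version` is published under the same URL
    pattern as the 1.1.453 package used in the example above.
    """
    package = f"rstudio-server-{version}-amd64.deb"
    lines = [
        "sudo apt-get install -y gdebi-core alien",
        "cd /tmp",
        f"sudo wget https://download2.rstudio.org/{package}",
        f"sudo gdebi -n {package}",
        "sudo rstudio-server restart",
    ]
    return "\n".join(lines) + "\n"


script = build_rstudio_init_script("1.1.453")
print(script)

# In a Databricks notebook, where dbutils is predefined, the generated text
# could then be written to the same location as in the cell above:
# dbutils.fs.put("/databricks/init/rstudio/rstudio-install.sh", script, True)
```

Printing the script before writing it lets you review the exact commands that will run on the pool at startup.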

Use RStudio Server Open Source

  1. To use a serverless pool that is set up with RStudio Server Open Source, you must have Can Attach To permission for that pool. The cluster admin can grant you this permission. See Cluster Access Control.

  2. Display the cluster details of the serverless pool and click the Apps tab:

    Cluster Apps Tab
  3. In the Apps tab, click the Set up RStudio button. This generates a one-time password for you. Click the show link to display it and copy the password.

    RStudio One-time Password
  4. Click the Open RStudio UI link to open the UI in a new tab. Enter your username and password in the login form and sign in.

    RStudio Login Form
  5. From the RStudio UI, you can attach the SparkR package and set up a SparkR session to launch Spark jobs on your serverless pool.

    library(SparkR)
    sparkR.session()
    
    RStudio Session
  6. You can also attach the sparklyr package and set up a Spark connection.

    library(sparklyr)
    SparkR::sparkR.session()
    sc <- spark_connect(method = "databricks")
    
    RStudio Session with sparklyr

Get started with RStudio Server Pro

Set up RStudio license server

To use RStudio Server Pro on Azure Databricks, you need to convert your Pro license to a floating license. For assistance, contact support@rstudio.com. After your license is converted, you must set up a license server for RStudio Server Pro.

To set up a license server:

  1. Launch a small instance on your cloud provider network; the license server daemon is very lightweight.
  2. Download and install the corresponding version of RStudio License Server on your instance, and start the service. For detailed instructions, see RStudio Server Pro documentation.
  3. Make sure that the license server port is open to Azure Databricks instances.
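
One way to check the last step is to attempt a TCP connection to the license server from a notebook attached to the pool. The following is a minimal sketch using only the Python standard library; the helper name can_reach and the host and port placeholders are hypothetical, and a notebook cell runs on the driver, so this tests reachability from the driver node only.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical placeholders: substitute your license server's address and port.
LICENSE_SERVER_HOST = "<license_server_host>"
LICENSE_SERVER_PORT = 0  # replace with the port your license server listens on

if LICENSE_SERVER_PORT:
    print(can_reach(LICENSE_SERVER_HOST, LICENSE_SERVER_PORT))
```

A result of False suggests a firewall or network rule is blocking the port, or that the license server service is not running.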

Install RStudio Server Pro

To set up RStudio Server Pro on an Azure Databricks serverless pool, you must create an init script that installs the RStudio Server Pro binary package and configures it to lease licenses from your license server. See Cluster Node Initialization Scripts for more details.

The following is an example notebook cell that installs an init script for a pool named rstudio. Replace <domain> with your Azure Databricks URL and <license_server_url> with the URL of your floating license server. This script also places additional authentication configurations that make integration with Azure Databricks smoother. Run the code in a notebook to install the script, then restart the serverless pool.

%python

script = """
  sudo apt-get install -y gdebi-core alien

  ## Installing RStudio Server Pro
  cd /tmp
  sudo wget https://download2.rstudio.org/rstudio-server-pro-1.1.453-amd64.deb
  sudo gdebi -n rstudio-server-pro-1.1.453-amd64.deb

  ## Configuring authentication
  sudo echo 'auth-proxy=1' >> /etc/rstudio/rserver.conf
  sudo echo 'auth-proxy-user-header-rewrite=^(.*)$ $1' >> /etc/rstudio/rserver.conf
  sudo echo 'auth-proxy-sign-in-url=<domain>/login.html' >> /etc/rstudio/rserver.conf
  sudo echo 'admin-enabled=1' >> /etc/rstudio/rserver.conf
  sudo echo 'export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' >> /etc/rstudio/rsession-profile

  # Enabling floating license
  sudo echo 'server-license-type=remote' >> /etc/rstudio/rserver.conf

  # Session configurations
  sudo echo 'session-rprofile-on-resume-default=1' >> /etc/rstudio/rsession.conf
  sudo echo 'allow-terminal-websockets=0' >> /etc/rstudio/rsession.conf

  sudo rstudio-server license-manager license-server <license_server_url>
  sudo rstudio-server restart
"""

dbutils.fs.put("/databricks/init/rstudio/rstudio-install.sh", script, True)
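
Because the script contains the <domain> and <license_server_url> placeholders, it can help to substitute them programmatically and fail early if a value is missing. The following is a minimal sketch: the helper name fill_placeholders and the example values are hypothetical, and the template shows only the two lines of the script that contain placeholders.

```python
def fill_placeholders(template: str, values: dict) -> str:
    """Replace each <name> placeholder with its value; reject empty values."""
    result = template
    for name, value in values.items():
        if not value:
            raise ValueError(f"No value supplied for <{name}>")
        result = result.replace(f"<{name}>", value)
    return result


# Only the two placeholder-bearing lines from the init script above.
template = (
    "sudo echo 'auth-proxy-sign-in-url=<domain>/login.html' >> /etc/rstudio/rserver.conf\n"
    "sudo rstudio-server license-manager license-server <license_server_url>\n"
)

script = fill_placeholders(
    template,
    {
        "domain": "https://example.invalid",  # hypothetical Azure Databricks URL
        "license_server_url": "license.example.invalid",  # hypothetical license server
    },
)
print(script)
```

Keeping the values in one dictionary makes it harder to write an init script that still contains a literal placeholder.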

Use RStudio Server Pro

  1. To use a serverless pool that is set up with RStudio Server Pro, you must have Can Attach To permission for that pool. The cluster admin can grant you this permission. See Cluster Access Control.

  2. Display the cluster details of the serverless pool and click the Apps tab:

    Cluster Apps Tab
  3. In the Apps tab, click the Set up RStudio button.

    RStudio One-time Password
  4. You do not need the one-time password. Click the Open RStudio UI link to open an authenticated RStudio Pro session.

  5. From the RStudio UI, you can attach the SparkR package and set up a SparkR session to launch Spark jobs on your serverless pool.

    library(SparkR)
    sparkR.session()
    
    RStudio Session
  6. You can also attach the sparklyr package and set up a Spark connection.

    SparkR::sparkR.session()
    library(sparklyr)
    sc <- spark_connect(method = "databricks")
    
    RStudio Session with sparklyr

Frequently asked questions (FAQ)

What is the difference between RStudio Server Open Source and RStudio Server Pro?

RStudio Server Pro supports a wide range of enterprise features that are not available in the Open Source edition. You can see a feature comparison on the RStudio Inc. website.

In addition, RStudio Server Open Source is distributed under the GNU Affero General Public License (AGPL), while the Pro version comes with a commercial license for organizations that are not able to use AGPL software.

Finally, RStudio Server Pro comes with professional and enterprise support from RStudio Inc., while RStudio Server Open Source comes with no support.

Can I use my RStudio Server Pro license on Azure Databricks?

Yes, if you already have a Pro or Enterprise license for RStudio Server, you can use that license on Azure Databricks. See Get started with RStudio Server Pro to learn how to set up RStudio Server Pro on Azure Databricks.

Where does RStudio Server run? Do I need to manage any additional services/servers?

As shown in the diagram in RStudio integration architecture, the RStudio Server daemon runs on the driver (master) node of your Azure Databricks serverless pool. With RStudio Server Open Source, you do not need to run any additional servers/services. However, for RStudio Server Pro, you need to manage a separate instance that runs RStudio License Server.

Can I use RStudio Server on a standard cluster?

No, standard Azure Databricks clusters do not support RStudio Server integration.

How should I persist my work on RStudio?

We strongly recommend that you persist your work using a version control system from RStudio. RStudio has great support for various version control systems and allows you to check in and manage your projects.

You can also save your files (code or data) on the Databricks File System (DBFS). For example, a file that you save under /dbfs/ is not deleted when your serverless pool is terminated or restarted.

Important

If you do not persist your code through version control or DBFS, you risk losing your work if an admin restarts or terminates the serverless pool.

How does RStudio integrate with Azure Databricks R notebooks?

You can move your work between notebooks and RStudio through version control.

What is the working directory?

When you start a project in RStudio, you choose a working directory. By default this is the home directory on the driver (master) container where RStudio Server is running. You can change this directory if you want.

Can I launch Shiny Apps from RStudio running on Azure Databricks?

Unfortunately, Shiny apps and RStudio Connect integration are not yet supported on Azure Databricks.

I can’t use terminal/git inside RStudio on Azure Databricks. How can I fix that?

Make sure that you have disabled websockets. In RStudio Server Open Source, you can do this from the UI.

RStudio Session

In RStudio Server Pro, you can add allow-terminal-websockets=0 to /etc/rstudio/rsession.conf to disable websockets for all users.
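
Appending a setting with >> each time a script runs can leave duplicate or conflicting lines in the file. The following is a minimal sketch of setting a key exactly once in configuration text; the helper name ensure_setting is hypothetical, it operates on a string only, and reading or writing /etc/rstudio/rsession.conf is left to the caller.

```python
def ensure_setting(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with exactly one `key=value` line, replacing any existing key."""
    kept = [
        line
        for line in conf_text.splitlines()
        if line.split("=", 1)[0].strip() != key
    ]
    kept.append(f"{key}={value}")
    return "\n".join(kept) + "\n"


conf = "session-rprofile-on-resume-default=1\nallow-terminal-websockets=1\n"
updated = ensure_setting(conf, "allow-terminal-websockets", "0")
print(updated)
```

Applying the function again to its own output returns the same text, so it is safe to run repeatedly.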

I don’t see the Apps tab under cluster details.

This feature is not available to all customers. You must be on the Premium SKU. In addition, the Apps tab appears only on serverless pools running Databricks Runtime 4.1 or above. Standard clusters do not support Apps and RStudio integration.