RStudio on Azure Databricks

Azure Databricks integrates with RStudio Server, the popular integrated development environment (IDE) for R.

You can use either the Open Source or the Pro edition of RStudio Server on Azure Databricks. To use RStudio Server Pro, you must transfer your existing RStudio Server Pro license to Azure Databricks (see Get started with RStudio Server Pro).

Note

RStudio integration requires the Azure Databricks Premium Plan.

RStudio integration architecture

When you use RStudio Server on Azure Databricks, the RStudio Server daemon runs on the driver (or master) node of an Azure Databricks high concurrency cluster. The RStudio web UI is proxied through the Azure Databricks web app, so you do not need to change your cluster network configuration. The following diagram shows the RStudio integration component architecture.

Architecture of RStudio on Azure Databricks

Warning

Azure Databricks proxies the RStudio web service from port 8787 on the cluster’s Spark driver. This web proxy is intended for use only with RStudio. If you launch other web services on port 8787, you might expose your users to potential security exploits. Neither Databricks nor Microsoft is responsible for any issues that result from the installation of unsupported software on a cluster.

Requirements

Get started with RStudio Server Open Source

To get started with RStudio Server Open Source on Azure Databricks, you must install RStudio on a high concurrency cluster. You need to perform this installation only once. Installation is usually performed by an administrator.

Install RStudio Server Open Source

To set up RStudio Server Open Source on an Azure Databricks cluster, you must create an init script that installs the RStudio Server Open Source binary package. See Cluster-scoped init scripts for more details. The following example notebook cell writes an init script to a location on DBFS.

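%python
# Init script text. It runs on every node at cluster startup,
# but installs RStudio Server only on the driver node.
script = """
  if [[ $DB_IS_DRIVER = "TRUE" ]]; then
    # Install the package tools (gdebi installs the .deb and its dependencies)
    sudo apt-get update
    sudo apt-get install -y gdebi-core alien
    # Download and install RStudio Server Open Source, then restart the service
    cd /tmp
    sudo wget https://download2.rstudio.org/rstudio-server-1.1.453-amd64.deb
    sudo gdebi -n rstudio-server-1.1.453-amd64.deb
    sudo rstudio-server restart
    exit 0
  else
    exit 0
  fi
"""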

dbutils.fs.mkdirs("/databricks/rstudio")
dbutils.fs.put("/databricks/rstudio/rstudio-install.sh", script, True)

  1. Run the code in a notebook to install the script at dbfs:/databricks/rstudio/rstudio-install.sh.
  2. Before launching a cluster, add dbfs:/databricks/rstudio/rstudio-install.sh as an init script. See Cluster-scoped init scripts for details.
  3. Launch the cluster.
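
If you need a release other than the one pinned in the example, the script text can be generated instead of edited by hand. The following is a minimal sketch, not part of any Azure Databricks API: build_init_script and its version parameter are hypothetical names, and the sketch assumes that other releases follow the same download URL pattern as 1.1.453, which you should verify before use.

```python
# Same init script as in the example above, with the version factored out.
# Assumption: other releases use the same URL pattern as 1.1.453.
INIT_SCRIPT_TEMPLATE = """
  if [[ $DB_IS_DRIVER = "TRUE" ]]; then
    sudo apt-get update
    sudo apt-get install -y gdebi-core alien
    cd /tmp
    sudo wget https://download2.rstudio.org/rstudio-server-{version}-amd64.deb
    sudo gdebi -n rstudio-server-{version}-amd64.deb
    sudo rstudio-server restart
    exit 0
  else
    exit 0
  fi
"""


def build_init_script(version: str) -> str:
    """Return the init script text for the given RStudio Server version (hypothetical helper)."""
    # Reject empty strings and values that would break the URL or file name.
    if not version or any(ch.isspace() or ch == "/" for ch in version):
        raise ValueError(f"unexpected version string: {version!r}")
    return INIT_SCRIPT_TEMPLATE.format(version=version)


script = build_init_script("1.1.453")

# In a notebook, write the result with the same call as in the example above:
# dbutils.fs.put("/databricks/rstudio/rstudio-install.sh", script, True)
```

Keeping the version in one place avoids the case where the wget URL and the gdebi file name drift apart.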

Use RStudio Server Open Source

  1. Display the details of the cluster on which you installed RStudio and click the Apps tab:

    Cluster Apps Tab
  2. In the Apps tab, click the Set up RStudio button. This generates a one-time password for you. Click the show link to display it and copy the password.

    RStudio One-time Password
  3. Click the Open RStudio UI link to open the UI in a new tab. Enter your username and password in the login form and sign in.

    RStudio Login Form
  4. From the RStudio UI, you can import the SparkR package and set up a SparkR session to launch Spark jobs on your high concurrency cluster.

    library(SparkR)
    sparkR.session()
    
    RStudio Session
  5. You can also attach the sparklyr package and set up a Spark connection.

    SparkR::sparkR.session()
    library(sparklyr)
    sc <- spark_connect(method = "databricks")
    
    RStudio Session with sparklyr

Get started with RStudio Server Pro

Set up RStudio license server

To use RStudio Server Pro on Azure Databricks, you need to convert your Pro license to a floating license. For assistance, contact support@rstudio.com. After your license is converted, you must set up a license server for RStudio Server Pro.

To set up a license server:

  1. Launch a small instance on your cloud provider network; the license server daemon is very lightweight.
  2. Download and install the corresponding version of RStudio License Server on your instance, and start the service. For detailed instructions, see RStudio Server Pro documentation.
  3. Make sure that the license server port is open to Azure Databricks instances.
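
One way to check step 3 is to attempt a TCP connection to the license server from a notebook attached to a cluster. The following is a minimal sketch: can_reach is a hypothetical helper, and the host name and port are placeholders that you must replace with the values for your own license server.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Placeholder values (assumptions): use your license server's address and
# the port that it actually listens on.
LICENSE_SERVER_HOST = "license-server.example.internal"
LICENSE_SERVER_PORT = 8989

# Example call, to be run from a notebook on the cluster:
# print(can_reach(LICENSE_SERVER_HOST, LICENSE_SERVER_PORT))
```

A result of False only shows that the port is not reachable from where the code ran. It does not distinguish between a firewall rule, a stopped service, and a wrong address.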

Install RStudio Server Pro

To set up RStudio Server Pro on an Azure Databricks cluster, you must create an init script that installs the RStudio Server Pro binary package and configures it to lease its license from your license server. See Cluster-scoped init scripts for more details. The following example notebook cell writes an init script to DBFS. The script also applies additional authentication configuration that makes integration with Azure Databricks smoother.

%python

script = """
  if [[ $DB_IS_DRIVER = "TRUE" ]]; then
    sudo apt-get update
    sudo apt-get install -y gdebi-core alien

    ## Installing RStudio Server Pro
    cd /tmp
    sudo wget https://download2.rstudio.org/rstudio-server-pro-1.1.453-amd64.deb
    sudo gdebi -n rstudio-server-pro-1.1.453-amd64.deb

    ## Configuring authentication
    sudo echo 'auth-proxy=1' >> /etc/rstudio/rserver.conf
    sudo echo 'auth-proxy-user-header-rewrite=^(.*)$ $1' >> /etc/rstudio/rserver.conf
    sudo echo 'auth-proxy-sign-in-url=<domain>/login.html' >> /etc/rstudio/rserver.conf
    sudo echo 'admin-enabled=1' >> /etc/rstudio/rserver.conf
    sudo echo 'export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' >> /etc/rstudio/rsession-profile

    # Enabling floating license
    sudo echo 'server-license-type=remote' >> /etc/rstudio/rserver.conf

    # Session configurations
    sudo echo 'session-rprofile-on-resume-default=1' >> /etc/rstudio/rsession.conf
    sudo echo 'allow-terminal-websockets=0' >> /etc/rstudio/rsession.conf

    sudo rstudio-server license-manager license-server <license-server-url>
    sudo rstudio-server restart
    exit 0
  else
    exit 0
  fi
"""
dbutils.fs.mkdirs("/databricks/rstudio")
dbutils.fs.put("/databricks/rstudio/rstudio-install.sh", script, True)

  1. Replace <domain> with your Azure Databricks URL and <license-server-url> with the URL of your floating license server.
  2. Run the code in a notebook to install the script at dbfs:/databricks/rstudio/rstudio-install.sh.
  3. Before launching a cluster, add dbfs:/databricks/rstudio/rstudio-install.sh as an init script. See Cluster-scoped init scripts for details.
  4. Launch the cluster.
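
The replacement in step 1 is easy to forget, and a script that still contains <domain> or <license-server-url> fails only at cluster startup. The following is a minimal sketch of doing the substitution in the notebook before writing the script. fill_placeholders is a hypothetical helper, and the URL values in the usage lines are placeholders, not real endpoints.

```python
def fill_placeholders(script: str, domain: str, license_server_url: str) -> str:
    """Substitute the two placeholders used in the init script above (hypothetical helper)."""
    for name, value in (("domain", domain), ("license_server_url", license_server_url)):
        # Reject empty values and values that are themselves unreplaced placeholders.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"{name} is empty or still a placeholder: {value!r}")
    # The script appends /login.html to <domain>, so drop any trailing slash.
    filled = script.replace("<domain>", domain.rstrip("/"))
    filled = filled.replace("<license-server-url>", license_server_url)
    return filled


# Usage with a shortened stand-in for the script text and placeholder values:
template = (
    "auth-proxy-sign-in-url=<domain>/login.html\n"
    "rstudio-server license-manager license-server <license-server-url>\n"
)
filled = fill_placeholders(
    template,
    "https://example.azuredatabricks.net/",
    "license-server.example.internal",
)
```

In practice you would pass the full script string from the cell above and write the returned text with dbutils.fs.put.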

Use RStudio Server Pro

  1. Display the details of the cluster on which you installed RStudio and click the Apps tab:

    Cluster Apps Tab
  2. In the Apps tab, click the Set up RStudio button.

    RStudio One-time Password
  3. You do not need the one-time password. Click the Open RStudio UI link and it will open an authenticated RStudio Pro session for you.

  4. From the RStudio UI, you can attach the SparkR package and set up a SparkR session to launch Spark jobs on your cluster.

    library(SparkR)
    sparkR.session()
    
    RStudio Session
  5. You can also attach the sparklyr package and set up a Spark connection.

    SparkR::sparkR.session()
    library(sparklyr)
    sc <- spark_connect(method = "databricks")
    
    RStudio Session with sparklyr

Frequently asked questions (FAQ)

What is the difference between RStudio Server Open Source and RStudio Server Pro?

RStudio Server Pro supports a wide range of enterprise features that are not available in the Open Source edition. You can see a feature comparison on the RStudio Inc. website.

In addition, RStudio Server Open Source is distributed under the GNU Affero General Public License (AGPL), while the Pro version comes with a commercial license for organizations that are not able to use AGPL software.

Finally, RStudio Server Pro comes with professional and enterprise support from RStudio Inc., while RStudio Server Open Source comes with no support.

Can I use my RStudio Server Pro license on Azure Databricks?

Yes. If you already have a Pro or Enterprise license for RStudio Server, you can use that license on Azure Databricks. See Get started with RStudio Server Pro to learn how to set up RStudio Server Pro on Azure Databricks.

Where does RStudio Server run? Do I need to manage any additional services/servers?

As shown in the diagram in RStudio integration architecture, the RStudio Server daemon runs on the driver (master) node of your Azure Databricks high concurrency cluster. With RStudio Server Open Source, you do not need to run any additional servers or services. For RStudio Server Pro, however, you need to manage a separate instance that runs RStudio License Server.

Can I use RStudio Server on a standard cluster?

No. Standard Azure Databricks clusters do not support RStudio Server integration. You must use a high concurrency cluster.

How should I persist my work on RStudio?

We strongly recommend that you persist your work using a version control system from RStudio. RStudio has great support for various version control systems and allows you to check in and manage your projects.

You can also save your files (code or data) on the Databricks File System (DBFS). For example, files that you save under /dbfs/ are not deleted when your cluster is terminated or restarted.

Important

If you do not persist your code through version control or DBFS, you risk losing your work if an admin restarts or terminates the cluster.

How does RStudio integrate with Azure Databricks R notebooks?

You can move your work between notebooks and RStudio through version control.

What is the working directory?

When you start a project in RStudio, you choose a working directory. By default, this is the home directory on the driver (master) container where RStudio Server is running. You can change this directory if you want.

Can I launch Shiny Apps from RStudio running on Azure Databricks?

Unfortunately, Shiny apps and RStudio Connect integration are not yet supported on Azure Databricks.

I can’t use terminal/git inside RStudio on Azure Databricks. How can I fix that?

Make sure that you have disabled websockets. In RStudio Server Open Source, you can do this from the UI.

RStudio Session

In RStudio Server Pro, you can add allow-terminal-websockets=0 to /etc/rstudio/rsession.conf to disable websockets for all users.
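
If you manage this setting from a script rather than by hand, it helps to make the change idempotent so that repeated runs do not append duplicate lines. The following is a minimal sketch: ensure_setting is a hypothetical helper that treats the file as plain key=value lines, which is an assumption about the file's format, and writing to /etc/rstudio/rsession.conf requires root privileges on the driver.

```python
from pathlib import Path


def ensure_setting(conf_path: str, key: str, value: str) -> None:
    """Set key=value in a key=value config file, replacing any existing line for key."""
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Keep every line except earlier assignments of the same key.
    kept = [line for line in lines if line.split("=", 1)[0].strip() != key]
    kept.append(f"{key}={value}")
    path.write_text("\n".join(kept) + "\n")


# Example call on the driver (requires root):
# ensure_setting("/etc/rstudio/rsession.conf", "allow-terminal-websockets", "0")
```

Unlike appending with >>, calling the helper twice leaves a single allow-terminal-websockets line in the file.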

I don’t see the Apps tab under cluster details.

This feature is not available to all customers. You must be on the Azure Databricks Premium Plan. In addition, the Apps tab appears only on high concurrency clusters running Databricks Runtime 4.1 and above. Standard clusters do not support Apps and RStudio integration.