Connecting BI Tools

You can connect business intelligence (BI) tools to Azure Databricks clusters to query data in tables. Every Azure Databricks cluster runs a JDBC/ODBC server on the driver node. This topic covers general installation and configuration instructions for most BI tools. For tool-specific connection instructions, see Business Intelligence Tools.

Requirements

To access a cluster via JDBC/ODBC, you must have the Can Attach To permission.

Note

If you connect to a terminated cluster using JDBC/ODBC and have the Can Restart permission, the cluster is restarted.

Step 1: Download and install a JDBC/ODBC driver

For most BI tools, you need a JDBC or ODBC driver, depending on the tool's requirements, to connect to Azure Databricks clusters.

  1. Go to the Databricks JDBC / ODBC Driver Download page.
  2. Fill out the form and submit it. You will receive an email that includes multiple download options.
  3. In the email, select the driver that you want and download it.
  4. Install the driver. The JDBC driver is a standalone JAR that requires no installation. The ODBC driver ships as an installation package for your chosen platform, which you must install on your system.
  5. Configure your BI tool to use the installed library. Depending on the tool, point it to the JAR or installed library.

Step 2: Configure JDBC/ODBC connection

Here are some of the parameters a JDBC/ODBC driver might require:

Parameter                              Value
Username/password                      See Username and password.
Host                                   YOUR_WORKSPACE_URL
Port                                   443
HTTP Path                              See Construct the JDBC URL.

The following are usually specified in the "httpPath" for JDBC and in the DSN configuration for ODBC:

Spark Server Type                      Spark Thrift Server
Schema/Database                        default
Authentication Mechanism (AuthMech)    Username and password authentication
Thrift Transport                       http
SSL                                    true

The following is for performance; ask your vendor to change the parameter if you can't access it yourself:

(Batch) Fetch Size                     100000
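
If your BI tool doesn't expose one of these settings, you can usually supply it in a DSN-less ODBC connection string instead. The following Python sketch is a hedged illustration using the third-party pyodbc package; the Driver name must match the entry in your odbcinst.ini (see the DSN section later in this topic), and all angle-bracket values are placeholders:

import pyodbc  # third-party package: pip install pyodbc

# DSN-less connection string built from the parameters above; the key
# names mirror the Simba ODBC driver keys shown in the DSN example below.
conn_str = (
    "Driver=Simba;"               # must match the driver name in odbcinst.ini
    "Host=<server-hostname>;"
    "Port=443;"
    "HTTPPath=<http-path>;"
    "SparkServerType=3;"          # Spark Thrift Server
    "Schema=default;"
    "ThriftTransport=2;"          # HTTP
    "SSL=1;"
    "AuthMech=3;"                 # username/password authentication
    "UID=token;"
    "PWD=<personal-access-token>;"
)
conn = pyodbc.connect(conn_str, autocommit=True)
print(conn.cursor().execute("SELECT 1").fetchone())
conn.close()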

Note

  • To turn off SSL, set spark.hadoop.hive.server2.use.SSL false in the cluster's Spark configuration.
  • To use binary transport, set spark.hadoop.hive.server2.transport.mode binary in the cluster's Spark configuration.
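
For example, applying both settings means adding the following key-value lines to the cluster's Spark config; a JDBC URL for such a cluster would then use transportMode=binary and ssl=0:

spark.hadoop.hive.server2.use.SSL false
spark.hadoop.hive.server2.transport.mode binary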

Username and password

To establish the connection, authenticate to the cluster gateway with a personal access token:

  • Username: token
  • Password: <personal-access-token>

Construct the JDBC URL

  1. On the cluster detail page, click the JDBC/ODBC tab. It contains the hostname, port, protocol, and HTTP path.

  2. Construct a JDBC connection string (URL) that looks like:

    jdbc:spark://<server-hostname>:<port>/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>
    

    Replace <server-hostname>, <port>, and <http-path> with the values from the cluster detail page, set UID to the string token, and replace <personal-access-token> with your personal access token.
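
If you want to sanity-check the URL before wiring up your BI tool, you can open it from a small script. This is a minimal sketch using the third-party jaydebeapi package; the driver class name and JAR path are assumptions, so take the exact values from the documentation of the driver you downloaded in Step 1:

import jaydebeapi  # third-party package: pip install jaydebeapi

# Placeholders come from the cluster detail page, as described above.
url = ("jdbc:spark://<server-hostname>:<port>/default;transportMode=http;"
       "ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;"
       "PWD=<personal-access-token>")

conn = jaydebeapi.connect(
    "com.simba.spark.jdbc.Driver",  # assumed class name; check your driver docs
    url,
    jars="/path/to/SparkJDBC.jar",  # path to the JAR downloaded in Step 1
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())
cursor.close()
conn.close()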

ODBC Data Source Name (DSN) configuration for the Simba ODBC driver

The Data Source Name (DSN) configuration contains the parameters for communicating with a specific database. BI tools like Tableau usually provide a friendly user interface for entering these parameters. If you have to install and manage the Simba ODBC driver yourself, you might need to create the configuration files and also allow your Driver Manager (odbc32.dll on Windows; unixODBC or iODBC on Unix-like systems) to access them.

After you download and install the Simba ODBC driver, create two files: /etc/odbc.ini and /etc/odbcinst.ini. The content of /etc/odbc.ini can be:

[Databricks-Spark-2-x]
Driver=Simba
Server=<server-hostname>
HOST=<server-hostname>
PORT=<port>
SparkServerType=3
Schema=default
ThriftTransport=2
SSL=1
AuthMech=3
UID=token
PWD=<personal-access-token>
HTTPPath=<http-path>

The content of /etc/odbcinst.ini can be:

[ODBC Drivers]
Simba = Installed
[Simba]
Driver = <driver-path>

Set <driver-path> according to the operating system you chose when you downloaded the driver in Step 1. For example:

  • macOS: /Library/simba/spark/lib/libsparkodbc_sbu.dylib
  • Linux: /opt/simba/sparkodbc/lib/universal/libsimbasparkodbc.so

You can specify the paths of the two files in your environment variables so that they can be used by the Driver Manager:

export ODBCINI=/etc/odbc.ini
export ODBCSYSINI=/etc/odbcinst.ini
export SIMBASPARKINI=<simba-ini-path>/simba.sparkodbc.ini # (Contains the configuration for debugging the Simba driver)

where <simba-ini-path> is:

  • macOS: /Library/simba/spark/lib
  • Linux: /opt/simba/sparkodbc/lib/universal
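
With the DSN in place, you can verify it independently of any BI tool. Here is a minimal sketch using the third-party pyodbc package, assuming the [Databricks-Spark-2-x] DSN defined above:

import pyodbc  # third-party package: pip install pyodbc

# Connect through the DSN defined in /etc/odbc.ini; the Driver Manager
# resolves it using the ODBCINI/ODBCSYSINI paths exported above.
conn = pyodbc.connect("DSN=Databricks-Spark-2-x", autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchone())
conn.close()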

Troubleshooting

The contents of this section have moved to our new Knowledge Base.