Spark Configuration

Spark configuration properties

To fine-tune Spark jobs, you can provide custom Spark configuration properties in a cluster configuration.

On the cluster configuration page, scroll down to the Spark tab and add your configuration.

[Image: Spark configuration tab on the cluster configuration page]
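For example, to change the default shuffle parallelism and the serializer, you might enter properties one per line as space-separated key-value pairs (the values here are illustrative):

spark.sql.shuffle.partitions 100
spark.serializer org.apache.spark.serializer.KryoSerializer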

When you configure a cluster using the Clusters API, set Spark properties in the spark_conf field in the Create cluster request or Edit cluster request.
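For example, a Create cluster request body might include a spark_conf object like the following sketch; the cluster name, Spark version, node type, and property values are illustrative:

{
  "cluster_name": "cluster-with-spark-conf",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "spark_conf": {
    "spark.sql.sources.partitionOverwriteMode": "DYNAMIC",
    "spark.sql.shuffle.partitions": "100"
  }
}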

To set Spark properties for all clusters, create a global init script:

%scala
// Create a global init script on DBFS; it runs on every cluster at startup
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh", """
  |#!/bin/bash
  |
  |# Write a custom Spark defaults file that the driver picks up at startup
  |cat << 'EOF' > /databricks/driver/conf/00-custom-spark-driver-defaults.conf
  |[driver] {
  |  "spark.sql.sources.partitionOverwriteMode" = "DYNAMIC"
  |}
  |EOF
  """.stripMargin, true)

Environment variables

You can set environment variables that are accessible from scripts running on a cluster. Set the environment variables in the spark_env_vars field in the Create cluster request or Edit cluster request.

[Image: Environment Variables field on the cluster configuration page]
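For example, a Create cluster request body might include a spark_env_vars object like the following sketch; the variable names and values are illustrative:

{
  "cluster_name": "cluster-with-env-vars",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "spark_env_vars": {
    "MY_ENVIRONMENT": "production",
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  }
}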

Note

The environment variables you set in this field are not available in Cluster Node Initialization Scripts. Init scripts support only a limited set of predefined environment variables.