Databricks Runtime Maintenance Updates

This page lists maintenance updates issued for Databricks Runtime releases. To add a maintenance update to an existing cluster, restart the cluster.

Databricks Runtime 5.5 LTS

See Databricks Runtime 5.5 LTS.

  • Sep 24, 2019
    • Improved stability of Parquet writer.
    • Fixed an issue where a Thrift query cancelled before it starts executing could get stuck in the STARTED state.
  • Sep 10, 2019
    • Add thread-safe iterator to BytesToBytesMap
    • [SPARK-27992][SPARK-28881] Allow Python to join with connection thread to propagate errors
    • Fixed a bug affecting certain global aggregation queries.
    • Improved credential redaction.
    • [SPARK-27330][SS] support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
  • Aug 27, 2019
    • [SPARK-20906][SQL] Allow user-specified schema in the API to_avro with schema registry
    • [SPARK-27838][SQL] Support user provided non-nullable avro schema for nullable catalyst schema without any null record
    • Improved Delta Lake Time Travel
    • Fixed an issue affecting certain transform expressions
    • Added support for broadcast variables when Process Isolation is enabled
  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28260] Add CLOSED state to ExecutionState
    • [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
    • [SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which UDF is compressed by broadcast
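
The Jul 30, 2019 item on stringToDate() concerns parsers that accept a valid prefix and ignore whatever follows it. As a minimal sketch of the idea only (plain Python, not Spark's implementation; the function name, the regular expression, and the choice to default the month to 1 are assumptions made for illustration), a parser for the yyyy and yyyy-[m]m forms can require that the whole input be consumed:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the two formats named in the note: yyyy and yyyy-[m]m.
_YEAR_MONTH = re.compile(r"(\d{4})(?:-(\d{1,2}))?")


def parse_year_month(text: str) -> Optional[Tuple[int, int]]:
    """Return (year, month), or None unless the entire input matches."""
    # fullmatch rejects trailing characters; match would accept a valid prefix.
    m = _YEAR_MONTH.fullmatch(text.strip())
    if m is None:
        return None
    year = int(m.group(1))
    # Assumption for this sketch: a bare year is treated as month 1.
    month = int(m.group(2)) if m.group(2) else 1
    if not 1 <= month <= 12:
        return None
    return year, month
```

With this sketch, an input such as "2019-07abc" is rejected rather than silently read as July 2019.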

Databricks Runtime 5.4 ML

See Databricks Runtime 5.4 ML.

  • Jun 18, 2019
    • Improved handling of MLflow active runs in Hyperopt integration
    • Improved messages in Hyperopt
    • Updated package markdown from 3.1 to 3.1.1

Databricks Runtime 5.4

See Databricks Runtime 5.4.

  • Sep 10, 2019
    • Add thread-safe iterator to BytesToBytesMap
    • Fixed a bug affecting certain global aggregation queries.
    • [SPARK-27330][SS] support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
    • [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
  • Aug 27, 2019
    • Fixed an issue affecting certain transform expressions
  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jul 2, 2019
    • Upgraded snappy-java from 1.1.7.1 to 1.1.7.3.
  • Jun 18, 2019
    • Improved handling of MLflow active runs in MLlib integration
    • Improved Databricks Advisor message related to using Delta cache
    • Fixed a bug affecting the use of higher-order functions
    • Fixed a bug affecting Delta metadata queries
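
The CalendarInterval item in the Jul 30, 2019 update is about how the fractional part of a seconds value is read. A minimal sketch in plain Python (not Spark's code; the helper name and the choice of nanoseconds as the unit are assumptions for illustration) shows why the digits after the decimal point are right-padded before being parsed as an integer:

```python
def fraction_to_nanos(fraction_digits: str) -> int:
    """Convert the digits after the decimal point of a seconds value to nanoseconds."""
    is_plain_digits = fraction_digits.isascii() and fraction_digits.isdigit()
    if not is_plain_digits or len(fraction_digits) > 9:
        raise ValueError(f"expected 1 to 9 digits, got {fraction_digits!r}")
    # Right-pad to 9 digits so that "5" means 0.5 s (500000000 ns), not 5 ns.
    return int(fraction_digits.ljust(9, "0"))
```

Without the padding step, a value written as 1.5 seconds would be read as one second plus five of the smallest unit.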

Databricks Runtime 5.3

See Databricks Runtime 5.3.

  • Sep 10, 2019
    • Add thread-safe iterator to BytesToBytesMap
    • Fixed a bug affecting certain global aggregation queries.
    • [SPARK-27330][SS] support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
    • [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
  • Aug 27, 2019
    • Fixed an issue affecting certain transform expressions
  • Aug 13, 2019
    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019
    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jun 18, 2019
    • Improved Databricks Advisor message related to using Delta cache
    • Fixed a bug affecting the use of higher-order functions
    • Fixed a bug affecting Delta metadata queries
  • May 28, 2019
    • Improved the stability of Delta
    • Tolerate IOExceptions when reading Delta LAST_CHECKPOINT file
    • Added recovery to failed library installation
  • May 7, 2019
    • Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector
    • Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector
    • Fixed a bug affecting table ACLs
    • Renamed fs.s3a.requesterPays.enabled to fs.s3a.requester-pays.enabled
    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
    • Ensure that DBIO cache is not disabled when Table ACLs are enabled
    • [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
    • [SPARK-27446][R] Use existing spark conf if available.
    • [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager

Databricks Runtime 5.2

See Databricks Runtime 5.2.

  • Sep 10, 2019

    • Add thread-safe iterator to BytesToBytesMap
    • Fixed a bug affecting certain global aggregation queries.
    • [SPARK-27330][SS] support task abort in foreach writer
    • [SPARK-28642] Hide credentials in SHOW CREATE TABLE
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
    • [SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
  • Aug 27, 2019

    • Fixed an issue affecting certain transform expressions
  • Aug 13, 2019

    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019

    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jul 2, 2019

    • Tolerate IOExceptions when reading Delta LAST_CHECKPOINT file
  • Jun 18, 2019

    • Improved Databricks Advisor message related to using Delta cache
    • Fixed a bug affecting the use of higher-order functions
    • Fixed a bug affecting Delta metadata queries
  • May 28, 2019

    • Added recovery to failed library installation
  • May 7, 2019

    • Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector
    • Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector
    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
    • Ensure that DBIO cache is not disabled when Table ACLs are enabled
    • [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
    • [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
  • Mar 26, 2019

    • Avoid embedding platform-dependent offsets literally in whole-stage generated code
    • [SPARK-26665][CORE] Fix a bug that BlockTransferService.fetchBlockSync may hang forever.
    • [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array.
    • [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE.
    • [SPARK-26572][SQL] fix aggregate codegen result evaluation.
    • Fixed a bug affecting certain PythonUDFs.
  • Feb 26, 2019

    • [SPARK-26864][SQL] Query may return incorrect result when python udf is used as a left-semi join condition.
    • [SPARK-26887][PYTHON] Create datetime.date directly instead of creating datetime64 as intermediate data.
    • Fixed a bug affecting JDBC/ODBC server.
    • Fixed a bug affecting PySpark.
    • Exclude the hidden files when building HadoopRDD.
    • Fixed a bug in Delta that caused serialization issues.
  • Feb 12, 2019

    • Fixed an issue affecting the use of Delta with Azure Data Lake Storage Gen2 mount points.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed a StackOverflowError when putting a skew join hint on a cached relation.
    • Fixed an inconsistency between a SQL cache’s cached RDD and its physical plan, which could cause incorrect results.
    • [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • CSV/JSON data sources should avoid globbing paths when inferring schema.
    • Fixed constraint inference on Window operator.
    • Fixed an issue affecting the installation of egg libraries on clusters with table ACLs enabled.

Databricks Runtime 5.1

See Databricks Runtime 5.1.

  • Aug 13, 2019

    • Delta streaming source should check the latest protocol of a table
    • [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
  • Jul 30, 2019

    • [SPARK-28015][SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats
    • [SPARK-28308][CORE] CalendarInterval sub-second part should be padded before parsing
    • [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully
  • Jul 2, 2019

    • Tolerate IOExceptions when reading Delta LAST_CHECKPOINT file
  • Jun 18, 2019

    • Fixed a bug affecting the use of higher-order functions
    • Fixed a bug affecting Delta metadata queries
  • May 28, 2019

    • Added recovery to failed library installation
  • May 7, 2019

    • Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector
    • Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector
    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
    • [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
    • [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
  • Mar 26, 2019

    • Avoid embedding platform-dependent offsets literally in whole-stage generated code
    • Fixed a bug affecting certain PythonUDFs.
  • Feb 26, 2019

    • [SPARK-26864][SQL] Query may return incorrect result when python udf is used as a left-semi join condition.
    • Fixed a bug affecting JDBC/ODBC server.
    • Exclude the hidden files when building HadoopRDD.
  • Feb 12, 2019

    • Fixed an issue affecting the installation of egg libraries on clusters with table ACLs enabled.
    • Fixed an inconsistency between a SQL cache’s cached RDD and its physical plan, which could cause incorrect results.
    • [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Fixed constraint inference on Window operator.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
    • Fixed an issue affecting installing wheelhouses.
    • [SPARK-26267] Retry when detecting incorrect offsets from Kafka.
    • Fixed a bug that affects multiple file stream sources in a streaming query.
    • Fixed a StackOverflowError when putting a skew join hint on a cached relation.
    • Fixed an inconsistency between a SQL cache’s cached RDD and its physical plan, which could cause incorrect results.
  • Jan 8, 2019

    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • [SPARK-26352] join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Stability improvement for Delta Lake.
    • Delta Lake is enabled.
    • Fixed an issue that caused Azure Data Lake Storage Gen2 access to fail when Azure AD Credential Passthrough is enabled for Azure Data Lake Storage Gen1.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.

Databricks Runtime 5.0 (unsupported)

See Databricks Runtime 5.0.

  • Jun 18, 2019

    • Fixed a bug affecting the use of higher-order functions
  • May 7, 2019

    • Fixed a race condition when loading a Delta log checksum file
    • Fixed Delta conflict detection logic to not identify “insert + overwrite” as pure “append” operation
    • [SPARK-27494][SS] Null keys/values don’t work in Kafka source v2
    • [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
    • [SPARK-27160][SQL] Fix DecimalType when building orc filters
    • [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager
  • Mar 26, 2019

    • Avoid embedding platform-dependent offsets literally in whole-stage generated code
    • Fixed a bug affecting certain PythonUDFs.
  • Mar 12, 2019

    • [SPARK-26864][SQL] Query may return incorrect result when python udf is used as a left-semi join condition.
  • Feb 26, 2019

    • Fixed a bug affecting JDBC/ODBC server.
    • Exclude the hidden files when building HadoopRDD.
  • Feb 12, 2019

    • Fixed an inconsistency between a SQL cache’s cached RDD and its physical plan, which could cause incorrect results.
    • [SPARK-26706][SQL] Fix illegalNumericPrecedence for ByteType.
    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Fixed constraint inference on Window operator.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
    • [SPARK-26267] Retry when detecting incorrect offsets from Kafka.
    • Fixed a bug that affects multiple file stream sources in a streaming query.
    • Fixed a StackOverflowError when putting a skew join hint on a cached relation.
    • Fixed an inconsistency between a SQL cache’s cached RDD and its physical plan, which could cause incorrect results.
  • Jan 8, 2019

    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • [SPARK-26352] join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Stability improvement for Delta Lake.
    • Delta Lake is enabled.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
  • Dec 18, 2018

    • [SPARK-26293] Cast exception when having Python UDF in subquery
    • Fixed an issue affecting certain queries using Join and Limit.
    • Redacted credentials from RDD names in Spark UI
  • Dec 6, 2018

    • Fixed an issue that caused incorrect query result when using orderBy followed immediately by groupBy with group-by key as the leading part of the sort-by key.
    • Upgraded Snowflake Connector for Spark from 2.4.9.2-spark_2.4_pre_release to 2.4.10.
    • Only ignore corrupt files after one or more retries when spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • Fixed an issue affecting certain self union queries.
    • Fixed a bug with the thrift server where sessions are sometimes leaked when cancelled.
    • [SPARK-26307] Fixed CTAS when INSERT a partitioned table using Hive SerDe.
    • [SPARK-26147] Python UDFs in join condition fail even when using columns from only one side of join
    • [SPARK-26211] Fix InSet for binary, and struct and array with null.
    • [SPARK-26181] the hasMinMaxStats method of ColumnStatsMap is not correct.
    • Fixed an issue affecting installing Python Wheels in environments without Internet access.
  • Nov 20, 2018

    • Fixed an issue that made a notebook unusable after cancelling a streaming query.
    • Fixed an issue affecting certain queries using window functions.
    • Fixed an issue affecting a stream from Delta with multiple schema changes.
    • Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.

Databricks Runtime 4.3 (unsupported)

See Databricks Runtime 4.3.

  • Apr 9, 2019

    • [SPARK-26665][CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.
    • [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE.
  • Mar 12, 2019

    • Fixed a bug affecting code generation.
    • Fixed a bug affecting Delta.
  • Feb 26, 2019

    • Fixed a bug affecting JDBC/ODBC server.
  • Feb 12, 2019

    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Exclude the hidden files when building HadoopRDD.
    • Fixed Parquet Filter Conversion for IN predicate when its value is empty.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
    • Fixed an inconsistency between a SQL cache’s cached RDD and its physical plan, which could cause incorrect results.
  • Jan 8, 2019

    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • Redacted credentials from RDD names in Spark UI
    • [SPARK-26352] join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Delta Lake is enabled.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
  • Dec 18, 2018

    • [SPARK-25002] Avro: revise the output record namespace.
    • Fixed an issue affecting certain queries using Join and Limit.
    • [SPARK-26307] Fixed CTAS when INSERT a partitioned table using Hive SerDe.
    • Only ignore corrupt files after one or more retries when spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • [SPARK-26181] the hasMinMaxStats method of ColumnStatsMap is not correct.
    • Fixed an issue affecting installing Python Wheels in environments without Internet access.
    • Fixed a performance issue in query analyzer.
    • Fixed an issue in PySpark that caused DataFrame actions to fail with a “connection refused” error.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018

    • [SPARK-17916][SPARK-25241] Fix empty string being parsed as null when nullValue is set.
    • [SPARK-25387] Fix for NPE caused by bad CSV input.
    • Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
  • Nov 6, 2018

    • [SPARK-25741] Long URLs are not rendered properly in web UI.
    • [SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification.
    • Fixed an issue affecting temporary objects cleanup in SQL Data Warehouse connector.
    • [SPARK-25816] Fix attribute resolution in nested extractors.
  • Oct 16, 2018
    • Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
    • Fixed a bug affecting Union operation.
  • Sep 25, 2018
    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in Avro data source.
  • Sep 11, 2018
    • [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when failOnDataLoss=false.
    • [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for TopicPartition.
    • Filter reduction should handle null value correctly.
    • Improved stability of execution engine.
  • Aug 28, 2018
    • Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
    • [SPARK-25142] Add error messages when Python worker could not open socket in _load_from_socket.
  • Aug 23, 2018
    • [SPARK-23935] mapEntry throws org.codehaus.commons.compiler.CompileException.
    • Fixed nullable map issue in Parquet reader.
    • [SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and PySpark which could cause transient read failures.
    • [SPARK-25084] “distribute by” on multiple columns (wrap in brackets) may lead to codegen issue.
    • [SPARK-25096] Loosen nullability if the cast is force-nullable.
    • Lowered the default number of threads used by the Delta Lake Optimize command, reducing memory overhead and committing data faster.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
    • Fixed secret manager redaction when a command partially succeeds.
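
The SPARK-25114 item above involves a comparator whose result is derived from a subtraction. As a hedged illustration in plain Python (a model of the general pitfall, not Spark's code; all function names and the exact remainder expression are assumptions, and 64-bit overflow is not modelled), reducing a difference modulo Integer.MAX_VALUE reports two different words as equal whenever the difference is a multiple of that value, while comparing the values directly does not:

```python
INT_MAX = 2**31 - 1  # the value of Java's Integer.MAX_VALUE


def truncated_rem(a: int, b: int) -> int:
    """Remainder that takes the sign of the dividend, as Java's % operator does."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def compare_by_remainder(left: int, right: int) -> int:
    """Fragile: returns 0 for unequal inputs whose difference is a multiple of INT_MAX."""
    return truncated_rem(left - right, INT_MAX)


def compare_directly(left: int, right: int) -> int:
    """Robust: returns -1, 0, or 1 according to the ordering of the inputs."""
    return (left > right) - (left < right)
```

For example, comparing INT_MAX with 0 yields 0 from the remainder-based version, which a sort would treat as "equal", whereas the direct comparison yields 1.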

Databricks Runtime 4.2 (unsupported)

See Databricks Runtime 4.2.

  • Feb 26, 2019

    • Fixed a bug affecting JDBC/ODBC server.
  • Feb 12, 2019

    • [SPARK-26709][SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.
    • Exclude the hidden files when building HadoopRDD.
    • Fixed Parquet Filter Conversion for IN predicate when its value is empty.
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019

    • Fixed an issue that could cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
  • Jan 8, 2019

    • Fixed an issue that caused the error org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted.
    • Redacted credentials from RDD names in Spark UI
    • [SPARK-26352] join reordering should not change the order of output attributes.
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Delta Lake is enabled.
    • Databricks IO Cache is now enabled for Ls series worker instance types for all pricing tiers.
  • Dec 18, 2018

    • [SPARK-25002] Avro: revise the output record namespace.
    • Fixed an issue affecting certain queries using Join and Limit.
    • [SPARK-26307] Fixed CTAS when INSERT a partitioned table using Hive SerDe.
    • Only ignore corrupt files after one or more retries when spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • [SPARK-26181] the hasMinMaxStats method of ColumnStatsMap is not correct.
    • Fixed an issue affecting installing Python Wheels in environments without Internet access.
    • Fixed a performance issue in query analyzer.
    • Fixed an issue in PySpark that caused DataFrame actions to fail with a “connection refused” error.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018

    • [SPARK-17916][SPARK-25241] Fix empty string being parsed as null when nullValue is set.
    • Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
  • Nov 6, 2018

    • [SPARK-25741] Long URLs are not rendered properly in web UI.
    • [SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification.
  • Oct 16, 2018
    • Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
    • Fixed a bug affecting Union operation.
  • Sep 25, 2018
    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in Avro data source.
  • Sep 11, 2018
    • [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when failOnDataLoss=false.
    • [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for TopicPartition.
    • Filter reduction should handle null value correctly.
  • Aug 28, 2018
    • Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
  • Aug 23, 2018
    • Fixed NoClassDefFoundError for Delta Snapshot.
    • [SPARK-23935] mapEntry throws org.codehaus.commons.compiler.CompileException.
    • [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result. AVERAGE might return incorrect results: the CAST added in the Average operator is bypassed if the result of Divide is the same type that it is cast to.
    • [SPARK-25081] Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and PySpark which could cause transient read failures.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
    • [SPARK-25084] “distribute by” on multiple columns (wrap in brackets) may lead to codegen issue.
    • [SPARK-24934][SQL] Explicitly whitelist supported types in upper/lower bounds for in-memory partition pruning. When complex data types are used in query filters against cached data, Spark always returns an empty result set. The in-memory stats-based pruning generates incorrect results, because null is set for upper/lower bounds for complex types. The fix is to not use in-memory stats-based pruning for complex types.
    • Fixed secret manager redaction when a command partially succeeds.
    • Fixed nullable map issue in Parquet reader.
  • Aug 2, 2018
    • Added writeStream.table API in Python.
    • Fixed an issue affecting Delta checkpointing.
    • [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.
    • Fixed an issue that could cause mergeInto command to produce incorrect results.
    • Improved stability on accessing Azure Data Lake Storage Gen1.
    • [SPARK-24809] Serializing LongHashedRelation in executor may result in data error.
    • [SPARK-24878][SQL] Fix reverse function for array type of primitive type containing null.
  • Jul 11, 2018
    • Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.
    • Fixed a NullPointerException bug that was thrown during advanced aggregation operations like grouping sets.
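
The Aug 28, 2018 Delta Lake Delete fix relates to SQL's three-valued logic, in which a condition can be true, false, or null. A minimal sketch in plain Python (illustrative only, not Delta Lake code; the function names and the use of None to stand in for SQL null are assumptions) removes a row only when the condition is strictly true, so rows whose condition evaluates to null are kept:

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]


def delete_where(rows: List[Row], condition: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Return the rows that survive a delete; only rows whose condition is True are removed."""
    return [row for row in rows if condition(row) is not True]


def value_greater_than_one(row: Row) -> Optional[bool]:
    """Model `value > 1`: comparing a null value yields null (None), not False."""
    value = row["value"]
    return None if value is None else value > 1
```

Applied to rows with values 1, 2, and null, the sketch deletes only the row with value 2.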

Databricks Runtime 4.1 ML (unsupported)

See Databricks Runtime 4.1 ML (Beta).

  • Jul 31, 2018
    • Added Azure SQL DW connector to ML Runtime 4.1
    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
    • Fixed a bug affecting Spark SQL execution engine.
    • Fixed a bug affecting code generation.
    • Fixed a bug (java.lang.NoClassDefFoundError) affecting Delta Lake.
    • Improved error handling in Delta Lake.
    • Fixed a bug that caused incorrect data skipping statistics to be collected for string columns of 32 characters or more.

Databricks Runtime 4.1 (unsupported)

See Databricks Runtime 4.1.

  • Jan 8, 2019
    • [SPARK-26366] ReplaceExceptWithFilter should consider NULL as False.
    • Delta Lake is enabled.
  • Dec 18, 2018
    • [SPARK-25002] Avro: revise the output record namespace.
    • Fixed an issue affecting certain queries using Join and Limit.
    • [SPARK-26307] Fixed CTAS when INSERT a partitioned table using Hive SerDe.
    • Only ignore corrupt files after one or more retries when spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • Fixed an issue affecting installing Python Wheels in environments without Internet access.
    • Fixed an issue in PySpark that caused DataFrame actions to fail with a “connection refused” error.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018
    • [SPARK-17916][SPARK-25241] Fix empty string being parsed as null when nullValue is set.
    • Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.
  • Nov 6, 2018
    • [SPARK-25741] Long URLs are not rendered properly in web UI.
    • [SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplification.
  • Oct 16, 2018
    • Fixed a bug affecting the output of running SHOW CREATE TABLE on Delta tables.
    • Fixed a bug affecting Union operation.
  • Sep 25, 2018
    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in Avro data source.
  • Sep 11, 2018
    • [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when failOnDataLoss=false.
    • [SPARK-24987][SS] Fix Kafka consumer leak when no new offsets for TopicPartition.
    • Filter reduction should handle null value correctly.
  • Aug 28, 2018
    • Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
    • [SPARK-25084] “distribute by” on multiple columns (wrap in brackets) may lead to codegen issue.
    • [SPARK-25114] Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
  • Aug 23, 2018
    • Fixed NoClassDefFoundError for Delta Snapshot.
    • [SPARK-24957][SQL] Average with decimal followed by aggregation returns wrong result. AVERAGE might return incorrect results: the CAST added in the Average operator is bypassed if the result of Divide is the same type that it is cast to.
    • Fixed nullable map issue in Parquet reader.
    • [SPARK-24934][SQL] Explicitly whitelist supported types in upper/lower bounds for in-memory partition pruning. When complex data types are used in query filters against cached data, Spark always returns an empty result set. The in-memory stats-based pruning generates incorrect results, because null is set for upper/lower bounds for complex types. The fix is to not use in-memory stats-based pruning for complex types.
    • [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
    • Fixed secret manager redaction when a command partially succeeds.
  • Aug 2, 2018
    • [SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches. Wraps the logical plan with an AnalysisBarrier for execution plan compilation in CacheManager, in order to avoid the plan being analyzed again. This is also a regression in Spark 2.3.
    • Fixed a SQL Data Warehouse connector issue affecting timezone conversion for writing DateType data.
    • Fixed an issue affecting Delta checkpointing.
    • Fixed an issue that could cause mergeInto command to produce incorrect results.
    • [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.
    • [SPARK-24809]Serializing LongHashedRelation in executor may result in data error.
  • July 11, 2018
    • Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.
    • Fixed a NullPointerException bug that was thrown during advanced aggregation operations like grouping sets.
  • June 28, 2018
    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
  • June 7, 2018
    • Fixed a bug affecting Spark SQL execution engine.
    • Fixed a bug affecting code generation.
    • Fixed a bug (java.lang.NoClassDefFoundError) affecting Delta Lake.
    • Improved error handling in Delta Lake.
  • May 17, 2018
    • Fixed a bug that caused incorrect data skipping statistics to be collected for string columns of 32 characters or longer.
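
The SPARK-25114 entry above concerns a comparator that misbehaves when the difference between two words is divisible by Integer.MAX_VALUE. The following is a minimal sketch of that kind of pitfall, not Spark's actual code: it assumes a comparator that reduces the difference with a modulo, and the function names are hypothetical.

```python
INT_MAX = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def compare_by_modulo(left: int, right: int) -> int:
    """Assumed buggy pattern: shrink the difference with a modulo.

    Any difference that is a multiple of INT_MAX collapses to 0,
    so two different words are reported as equal.
    """
    return (left - right) % INT_MAX


def compare_by_ordering(left: int, right: int) -> int:
    """Safer pattern: take the sign from a direct comparison."""
    return (left > right) - (left < right)


a = 5 + 3 * INT_MAX  # differs from b by an exact multiple of INT_MAX
b = 5

print(compare_by_modulo(a, b))    # 0: the words look equal although a > b
print(compare_by_ordering(a, b))  # 1: a sorts after b
```

Deriving the result from an ordering test rather than from arithmetic on the difference avoids both this collapse and any overflow in the subtraction.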

Databricks Runtime 4.0 (unsupported)

See Databricks Runtime 4.0.

  • Nov 6, 2018
    • [SPARK-25714]Fix Null Handling in the Optimizer rule BooleanSimplification.
  • Oct 16, 2018
    • Fixed a bug affecting Union operation.
  • Sep 25, 2018
    • [SPARK-25368][SQL] Incorrect constraint inference returns wrong result.
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in Avro data source.
  • Sep 11, 2018
    • Filter reduction should handle null values correctly.
  • Aug 28, 2018
    • Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
  • Aug 23, 2018
    • Fixed nullable map issue in Parquet reader.
    • Fixed secret manager redaction when a command partially succeeds.
    • Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
    • [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • [SPARK-25114]Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
  • Aug 2, 2018
    • [SPARK-24452]Avoid possible overflow in int add or multiply.
    • [SPARK-24588]Streaming join should require HashClusteredPartitioning from children.
    • Fixed an issue that could cause mergeInto command to produce incorrect results.
    • [SPARK-24867][SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.
    • [SPARK-24809]Serializing LongHashedRelation in executor may result in data error.
  • June 28, 2018
    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
  • June 7, 2018
    • Fixed a bug affecting Spark SQL execution engine.
    • Improved error handling in Delta Lake.
  • May 17, 2018
    • Bug fixes for Databricks secret management.
    • Improved stability on reading data stored in Azure Data Lake Store.
    • Fixed a bug affecting RDD caching.
    • Fixed a bug affecting Null-safe Equal in Spark SQL.
  • Apr 24, 2018
    • Upgraded Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.
    • Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
    • Fixed an issue that caused task serialization to fail.
    • Improved Delta Lake stability.
  • Mar 14, 2018
    • Prevent unnecessary metadata updates when writing into Delta Lake.
    • Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.
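
The Aug 28, 2018 entry above describes a Delete command that removed rows whose condition evaluated to null. The sketch below illustrates the intended SQL three-valued semantics in plain Python, assuming `None` stands in for NULL; the row data and function names are hypothetical and this is not Delta Lake's implementation.

```python
from typing import Optional

rows = [
    {"id": 1, "score": 10},
    {"id": 2, "score": None},  # NULL score
    {"id": 3, "score": 70},
]


def score_over_50(row: dict) -> Optional[bool]:
    """SQL-style predicate: a NULL input yields NULL (None), not False."""
    if row["score"] is None:
        return None
    return row["score"] > 50


def delete_where(table: list, predicate) -> list:
    """Correct behavior: remove a row only when the predicate is strictly True."""
    return [r for r in table if predicate(r) is not True]


def buggy_delete_where(table: list, predicate) -> list:
    """Faulty variant: keeps a row only when the predicate is False,
    so rows whose predicate is NULL are deleted as well."""
    return [r for r in table if predicate(r) is False]


print([r["id"] for r in delete_where(rows, score_over_50)])        # [1, 2]
print([r["id"] for r in buggy_delete_where(rows, score_over_50)])  # [1]
```

The difference between the two variants is only in how the unknown (NULL) outcome is treated: a delete should act on rows where the condition holds, and an unknown outcome does not count as holding.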

Databricks Runtime 3.5 LTS

See Databricks Runtime 3.5 LTS.

  • Sep 10, 2019
    • [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case
  • Apr 9, 2019
    • [SPARK-26665][CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.
  • Feb 12, 2019
    • Fixed an issue where the Spark low-level network protocol could break when sending large RPC error messages with encryption enabled (when spark.network.crypto.enabled is set to true).
  • Jan 30, 2019
    • Fixed an issue that can cause df.rdd.count() with UDT to return an incorrect answer in certain cases.
  • Dec 18, 2018
    • Only ignore corrupt files after one or more retries when spark.sql.files.ignoreCorruptFiles or spark.sql.files.ignoreMissingFiles flag is enabled.
    • Fixed an issue affecting certain self union queries.
  • Nov 20, 2018
  • Nov 6, 2018
    • [SPARK-25714]Fix Null Handling in the Optimizer rule BooleanSimplification.
  • Oct 16, 2018
    • Fixed a bug affecting Union operation.
  • Sep 25, 2018
    • [SPARK-25402][SQL] Null handling in BooleanSimplification.
    • Fixed NotSerializableException in Avro data source.
  • Sep 11, 2018
    • Filter reduction should handle null values correctly.
  • Aug 28, 2018
    • Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.
    • [SPARK-25114]Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX_VALUE.
  • Aug 23, 2018
    • [SPARK-24809]Serializing LongHashedRelation in executor may result in data error.
    • Fixed nullable map issue in Parquet reader.
    • [SPARK-25081]Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.
    • Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.
  • June 28, 2018
    • Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.
  • June 7, 2018
    • Fixed a bug affecting Spark SQL execution engine.
    • Improved error handling in Delta Lake.
  • May 17, 2018
    • Improved stability on reading data stored in Azure Data Lake Store.
    • Fixed a bug affecting RDD caching.
    • Fixed a bug affecting Null-safe Equal in Spark SQL.
    • Fixed a bug affecting certain aggregations in streaming queries.
  • Apr 24, 2018
    • Upgraded Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.
    • Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
    • Fixed an issue that caused task serialization to fail.
  • Mar 09, 2018
    • Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.
  • Mar 01, 2018
    • Improved the efficiency of handling streams that can take a long time to stop.
    • Fixed an issue affecting Python autocomplete.
    • Applied Ubuntu security patches.
    • Fixed an issue affecting certain queries using Python UDFs and window functions.
    • Fixed an issue affecting the use of UDFs on a cluster with table access control enabled.
  • Jan 29, 2018
    • Fixed an issue affecting the manipulation of tables stored in Azure Blob Storage.
    • Fixed aggregation after dropDuplicates on empty DataFrame.

Databricks Runtime 3.4 (unsupported)

See Databricks Runtime 3.4.

  • June 7, 2018
    • Fixed a bug affecting Spark SQL execution engine.
    • Improved error handling in Delta Lake.
  • May 17, 2018
    • Improved stability on reading data stored in Azure Data Lake Store.
    • Fixed a bug affecting RDD caching.
    • Fixed a bug affecting Null-safe Equal in Spark SQL.
  • Apr 24, 2018
    • Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when spark.databricks.io.hive.fastwriter.enabled is false.
  • Mar 09, 2018
    • Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.
  • Dec 13, 2017
    • Fixed an issue affecting UDFs in Scala.
    • Fixed an issue affecting the use of Data Skipping Index on data source tables stored in non-DBFS paths.
  • Dec 07, 2017
    • Improved shuffle stability.