Optimize (Delta Lake on Azure Databricks)

OPTIMIZE [db_name.]table_name [WHERE predicate]
  [ZORDER BY (col_name1, col_name2, ...)]

Optimize the layout of data stored in DBFS. Optionally optimize a subset of data or colocate data by column. If you do not specify colocation, bin-packing optimization is performed.


  • Bin-packing optimization is idempotent, meaning that if it is run twice on the same dataset, the second instance has no effect.
  • ZOrdering is not idempotent and rearranges all of the data that matches the given filter. Therefore we suggest that you limit it to new data, using partition filters when possible.
Optimize the subset of rows matching the given partition predicate. Only filters involving partition key attributes are supported.
Colocate column information in the same set of files. Co-locality is used by Delta Lake data-skipping algorithms to dramatically reduce the amount of data that needs to be read. You can specify multiple columns for ZORDER BY as a comma-separated list. However, the effectiveness of the locality drops with each additional column.



OPTIMIZE events WHERE date >= '2017-01-01'

WHERE date >= current_timestamp() - INTERVAL 1 day
ZORDER BY (eventType)