Optimize (Databricks Delta)

OPTIMIZE [db_name.]table_name [WHERE predicate]
  [ZORDER BY (col_name1, col_name2, ...)]

Optimize the layout of data stored in DBFS. Optionally optimize a subset of data or colocate data by column. If you do not specify colocation, bin-packing optimization is performed.

Note

  • Bin-packing optimization is idempotent, meaning that if it is run twice on the same dataset, the second instance has no effect.
  • ZOrdering is not idempotent and rearranges all of the data that matches the given filter. Therefore we suggest that you limit it to new data, using partition filters when possible.
WHERE
Optimize the subset of rows matching the given partition predicate. Only filters involving partition key attributes are supported.
ZORDER BY
Colocate column information in the same set of files. Co-locality is used by Databricks Delta data-skipping algorithms to dramatically reduce the amount of data that needs to be read. You can specify multiple columns for ZORDER BY as a comma-separated list. However, the effectiveness of the locality drops with each additional column.

Examples

OPTIMIZE events

OPTIMIZE events WHERE date >= '2017-01-01'

OPTIMIZE events
WHERE date >= current_timestamp() - INTERVAL 1 day
ZORDER BY (eventType)