Vacuum

Clean up files associated with a table. There are different versions of this command for Spark and Delta tables.

Vacuum a Spark table

VACUUM ([db_name.]table_name|path) [RETAIN num HOURS]
RETAIN num HOURS
The retention threshold, in hours.

Recursively vacuum directories associated with the Spark table and remove uncommitted files older than a retention threshold. The default threshold is 7 days. DBIO automatically triggers VACUUM operations as data is written. See Clean up uncommitted files for more information.
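As a rough illustration of the syntax above, the statements below vacuum a Spark table first with the default threshold and then with an explicit one. The database and table name (sales.events) and the 72-hour value are hypothetical, chosen only for the example.

```sql
-- Hypothetical table name; substitute your own.
-- Remove uncommitted files older than the default threshold (7 days).
VACUUM sales.events;

-- Remove uncommitted files older than an explicit threshold of 72 hours.
VACUUM sales.events RETAIN 72 HOURS;
```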

Vacuum a Delta table

VACUUM [db_name.]table_name|path [RETAIN num HOURS] [DRY RUN]

Recursively vacuum directories associated with the Delta table and remove files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. The default threshold is 7 days. VACUUM operations on Delta tables are not triggered automatically. See Garbage collection for more information.

If you run VACUUM on your Delta table, you may lose the ability to time travel back to a version older than the default 7-day data retention period.

RETAIN num HOURS
The retention threshold, in hours.
DRY RUN
Return a list of files to be deleted, without deleting them.
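A short sketch of how these options might be combined on a Delta table follows. The table name (events) and the path (/data/events) are hypothetical, and 240 hours is an arbitrary value longer than the 7-day default. Running the DRY RUN form first is one way to review what would be removed before committing to the deletion.

```sql
-- Hypothetical table name and path; substitute your own.
-- Preview the files that would be deleted under the default threshold.
VACUUM events DRY RUN;

-- Delete files no longer in the latest table state and older than the default threshold (7 days).
VACUUM events;

-- Path-based table, with an explicit threshold of 240 hours (10 days).
VACUUM '/data/events' RETAIN 240 HOURS;
```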