Best Practices for Dropping a Managed Databricks Delta Table

Regardless of how you drop a managed table, it can take a significant amount of time, depending on the data size. Databricks Delta managed tables in particular contain a lot of metadata in the form of transaction logs, and their directories can accumulate stale data files left over from earlier table versions. If a Delta table has been in use for a long time, it can accumulate a very large amount of data.

In the Azure Databricks environment, there are two ways to drop tables:

  • Run DROP TABLE in a notebook cell.
  • Click Delete in the UI.
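
As a minimal sketch of the notebook approach (the table name events is a placeholder; IF EXISTS simply avoids an error if the table has already been removed):

  -- Drop a managed table from a notebook cell; for a managed table this also deletes its data
  DROP TABLE IF EXISTS events;

Clicking Delete in the UI performs the equivalent drop on your behalf.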

Even though the table is deleted in the background without affecting workloads, it is good practice to run OPTIMIZE and VACUUM before you issue the DROP TABLE command on any table. OPTIMIZE compacts small data files and VACUUM removes files that are no longer referenced by the table, so less metadata and fewer files remain when you initiate the actual data deletion.

For example, if you are trying to delete a managed table named events, run the following commands before you start the DROP TABLE command (see the sketch after this list):

  1. Run OPTIMIZE: OPTIMIZE events
  2. Run VACUUM with a retention interval of zero hours: VACUUM events RETAIN 0 HOURS
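
Putting the sequence together, a sketch of the full run might look like the following. The table name events is a placeholder, and note that VACUUM with a zero-hour retention interval is blocked by a safety check by default; disabling that check is only appropriate when no other jobs are reading from or writing to the table:

  -- Compact small data files so the transaction log references fewer objects
  OPTIMIZE events;

  -- VACUUM ... RETAIN 0 HOURS fails by default; temporarily disable the retention safety check.
  -- Only do this when nothing else is using the table.
  SET spark.databricks.delta.retentionDurationCheck.enabled = false;

  -- Remove all data files that are no longer referenced by the current table version
  VACUUM events RETAIN 0 HOURS;

  -- Optionally restore the safety check for the rest of the session
  SET spark.databricks.delta.retentionDurationCheck.enabled = true;

  -- Finally, drop the managed table (this also deletes its underlying data)
  DROP TABLE events;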

These two steps reduce the amount of metadata and the number of stale, unreferenced data files that would otherwise increase the data deletion time.