Merge Into (Delta Lake on Azure Databricks)

Merge a set of updates, insertions, and deletions based on a source table into a target Delta table.

Note

This syntax is available in Databricks Runtime 5.1 and above.

MERGE INTO [db_name.]target_table [AS target_alias]
USING [db_name.]source_table [<time travel version>] [AS source_alias]
ON <merge_condition>
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]

where

<matched_action>  =
  DELETE  |
  UPDATE SET *  |
  UPDATE SET column1 = value1 [, column2 = value2 ...]

<not_matched_action>  =
  INSERT *  |
  INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])

<time_travel_version>  =
  TIMESTAMP AS OF timestamp_expression |
  VERSION AS OF version
  • There can be 1, 2, or 3 WHEN clauses. Of these, at most 2 can be WHEN MATCHED clauses, and at most 1 can be WHEN NOT MATCHED clause. WHEN NOT MATCHED must be the last clause.
  • WHEN MATCHED clauses:
    • Can be at most one UPDATE action and one DELETE action.
    • Each clause can have an optional condition. However, if there are two clauses, then the first one must have a condition.
    • When there are two clauses and there are conditions (or the lack of) such that a row matches both clauses, then the first clause/action is executed. In other words, the order of the MATCHED clauses matter.
    • If none of the clauses match a source-target row pair that satisfy the merge_condition, then the target rows will not be updated.
    • If UPDATE SET * is present, then it automatically expands to UPDATE SET column1 = source_table.column1 [, column2 = source_table.column2 ...] where column1, column2, etc. are the names of columns present in the target table being updated. Therefore, this syntax assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.
  • WHEN NOT MATCHED clause:
    • Can have only the INSERT action, which can have an optional condition.
    • You must specify a value for every column in your table when you perform an INSERT.
    • If the clause is not present or if it is present but the non-matching source row does not satisfy the condition, then the source row is not inserted.
    • If INSERT * is present, then it automatically expands to INSERT SET (column1 [, column2 ...]) VALUES (source_table.value1 [, source_table.value2 ...]) where column1, column2, etc. are the names of columns present in the target table being updated. Therefore, this syntax assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.

Note

  • Prior to Databricks Runtime 5.1, only the basic syntax of MERGE was supported:

    MERGE INTO [db_name.]target_table [AS target_alias]
    USING [db_name.]source_table [AS source_alias]
    ON <merge_condition>
    WHEN MATCHED THEN
      UPDATE SET column1 = value1 [, column2 = value2 ...]
    WHEN NOT MATCHED THEN
      INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
    

Examples

You can use MERGE for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Merge examples for a few examples.