Explainer

Incremental Models

An incremental model processes only the rows that changed since the last run - instead of rebuilding the entire table from scratch every time.

The problem with rebuilding everything

By default, a dbt table model rebuilds itself completely on every run: drop, re-select, re-insert every row. For a 10,000-row dimension that is fine. For a two-billion-row events table, it is a slow, expensive query you run again and again to recompute rows that have not changed since yesterday.

An incremental model breaks that pattern. On the first run it builds the whole table. On every run after that, it processes only the new or changed rows and folds them into the table that already exists.

How an incremental model works

Two pieces make it work: a config that marks the model incremental, and an is_incremental() block that filters down to just the fresh rows.

{{ config(materialized='incremental', unique_key='order_id') }}

select * from {{ source('shop', 'orders') }}

{% if is_incremental() %}
  -- only on incremental runs: keep rows newer than what we already have
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

On the first run, is_incremental() is false, the where clause is skipped, and dbt builds the full table. On later runs it is true, so dbt selects only the recent rows - then has to decide how to combine them with the existing table. That decision is the incremental strategy.

The animation below walks through the merge strategy: dbt loads the new rows into a temporary set, then matches them against the target table on the unique_key. Matches are updated in place; everything else is inserted.

Strategies, briefly

merge is the default on most warehouses, but it is not the only option: append simply adds rows (fast, but no updates and no dedup), and delete+insert clears the matching keys first and then inserts. They all answer the same question - what do we do with a row we have seen before? - in different ways.

When to use this pattern

Reach for an incremental model when a table is large and append-mostly - event logs, clickstream, orders - and a full rebuild has become too slow or too expensive. For small tables, a plain table model is simpler and the extra moving parts are not worth it.