How to - use incremental models to detect regressions in historic metrics #1615

graciegoheen · 2022-06-22T20:11:38Z

graciegoheen
Jun 22, 2022
Maintainer

Background

Imagine that you have expected outputs for a historic metric (total revenue) as described below:

Year	Total Revenue
2019	1 million
2020	1.5 million
2021	2 million

How can you test that these historical metrics do not change in dbt?

This is an alternative solution to the snapshot-based one proposed here.

Step 1

Let's say that you have a fct_orders table, which has one row for each order:

order_id	order_date	amount
1	2019-01-05	10
2	2019-02-06	50
3	2020-02-07	8
...	...	...

First, you should create a model that sums the amount for each year excluding the current one (for simplicity, we're assuming you have no costs). Let's call this fct_revenue_summary:

select
        year(order_date) as year,
        sum(amount) as total_revenue
from {{ ref('fct_orders') }}
where year <> year(current_timestamp())
group by 1

Step 2

Next, create an incremental model on top of fct_revenue_summary which captures the historical view of revenue outputs. Let's call this fct_revenue_summary_history:

{{
    config(
        materialized='incremental',
        unique_key=['year', 'total_revenue']
    )
}}

select
        year,
        total_revenue
from {{ ref('fct_revenue_summary') }}

{% if is_incremental() %}
where true
{% endif %}

Step 3

Finally, create a test on fct_revenue_summary_history to check that each year has a single source of truth for total_revenue:

version: 2

models:
  - name: fct_revenue_summary_history
    columns:
      - name: year
        tests:
          - unique

When you run a dbt build, you will get an error if you ever output a new total_revenue value for a historic year that differs from the original. For example, if your original fct_revenue_summary_history looks like this:

year	total_revenue
2019	1 million
2020	1.5 million
2021	2 million

But then, you introduce a breaking change such that dbt now calculates the total_revenue for 2019 as 0.8 million, fct_revenue_summary_history will now look like this:

year	total_revenue
2019	1 million
2019	0.8 million
2020	1.5 million
2021	2 million

And the uniqueness test on the year column will fail.

This allows you to detect regressions in historic metrics!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to - use incremental models to detect regressions in historic metrics #1615

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

How to - use incremental models to detect regressions in historic metrics #1615

graciegoheen Jun 22, 2022 Maintainer

Background

Step 1

Step 2

Step 3

Replies: 0 comments

graciegoheen
Jun 22, 2022
Maintainer