Surrogate Keys! #1303
-
this is a humdinger! so practical, and yet so much of the analytics engineer mindset is baked in -- we must do this one!
-
@gwenwindflower @mikegfuller and I have put together an initial outline for this article! Let us know if there's anything else we need to do before we clear this for drafting!
-
Looking forward to seeing a draft! I looked at the Figjam and couldn't tell whether the article is going to cover the use case where people are migrating from another solution to dbt and the current incremental keys are already used by downstream systems (so we can't replace the existing ones with hashed keys).
-
@dave-connors-3 did this ever get moved to an Issue or did it get lost in the machine? happy to resync on next steps if you still want to write this. |
-
In a conversation with @mikegfuller and @matt-winkler, we were chatting about the pain of migrating from a SQL Server-style deployment to dbt, with a specific eye towards maintaining monotonically increasing integer surrogate keys. While they were useful in the past (better join performance, coordination between loading Kimball-style dims and facts, etc.), they make your project stateful -- in order to add new data to model A, you need to know the current state of the data in model A. This goes against the core principle of idempotency, and can lead you into very scary design patterns if you try to deploy this type of setup in dbt.
This article would advocate for derived surrogate keys, demonstrate the power of the dbt_utils surrogate key macro, and help you start treating your models like cattle instead of pets (@randypitcherii's words, not mine).
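To make the statefulness point concrete, here's a minimal Python sketch (illustrative only -- the names and data are hypothetical, and this is not dbt code): a sequence-based key is a function of everything loaded before it, while a hash-derived key is a pure function of the row itself.

```python
import hashlib

def sequence_key(state):
    # Stateful: the key depends on how many rows were loaded before this one,
    # so a full rebuild or out-of-order load can assign different keys.
    state["next_id"] += 1
    return state["next_id"]

def derived_key(*columns):
    # Stateless: the key is a pure function of the row's own attributes,
    # so rebuilding the model from scratch yields identical keys every time.
    return hashlib.md5("-".join(str(c) for c in columns).encode()).hexdigest()

state = {"next_id": 0}
first = sequence_key(state)   # 1 on this run...
second = sequence_key(state)  # ...but the same row could get a different id on a rerun

# Same inputs always produce the same key, run after run.
k1 = derived_key("order_123", "2023-01-01")
k2 = derived_key("order_123", "2023-01-01")
assert k1 == k2
```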
What is the main problem you are solving? What is your solution?
Teaching folks coming from legacy tools how to rethink their assumptions about surrogate keys. The solution would be to derive the keys from the raw data in your warehouse in an idempotent way (dbt_utils.surrogate_key).
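As a rough sketch of why this approach is idempotent (in Python for illustration -- this mimics the idea behind the macro, not its actual compiled SQL, and exact casting/null handling varies by dbt_utils version): hash-derived keys are invariant to load order, while row_number-style integer keys are not.

```python
import hashlib

def hashed_key(natural_key):
    # Pure function of the row's natural key: a hypothetical stand-in for
    # the idea behind dbt_utils.surrogate_key, not its exact implementation.
    return hashlib.md5(str(natural_key).encode()).hexdigest()

run_1 = ["order_a", "order_b", "order_c"]
run_2 = ["order_c", "order_a", "order_b"]  # same data, different arrival order

# Integer keys assigned by position shift when load order changes...
seq_1 = {row: i + 1 for i, row in enumerate(run_1)}
seq_2 = {row: i + 1 for i, row in enumerate(run_2)}
assert seq_1["order_a"] != seq_2["order_a"]

# ...while hash-derived keys come out identical on every run.
hash_1 = {row: hashed_key(row) for row in run_1}
hash_2 = {row: hashed_key(row) for row in run_2}
assert hash_1 == hash_2
```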
Why should the reader care about this problem? Why is your solution the right one? This should help form your specific target audience.
Trying to build surrogate keys the Old Way in dbt is a real pain, and does not actually help anyone. The target audience here is folks trying to do a 1:1 migration of Old Data to New Data -- now is the time to rethink your assumptions!