Surrogate Keys! #1303
-
this is a humdinger! so practical, and yet so much of the analytics engineer mindset is baked in -- we must do this one!
-
@gwenwindflower @mikegfuller and I have put together an initial outline for this article! Let us know if there's anything else we need to do before we clear this for drafting!
-
Looking forward to seeing a draft! I looked at the Figjam and couldn't tell whether the article is going to cover the use case where people are migrating from another solution to dbt and the current incremental keys are already used by downstream systems (so we can't replace the existing ones with hashed keys).
-
@dave-connors-3 did this ever get moved to an Issue or did it get lost in the machine? happy to resync on next steps if you still want to write this. |
-
In a conversation with @mikegfuller and @matt-winkler, we were chatting about the pain of migrating from a SQL Server-style deployment to dbt, with a specific eye towards maintaining monotonically increasing integer surrogate keys. While they were useful in the past (better join performance, coordination between loading Kimball-style dims and facts, etc.), they make your project stateful -- in order to add new data to model A, you need to know the current state of the data in model A. This goes against the core principle of idempotency, and can lead you into very scary design patterns if you try to deploy this type of setup in dbt.
This article would advocate for derived surrogate keys, demonstrate the power of the dbt_utils surrogate key macro, and help you start treating your models like cattle instead of pets (@randypitcherii's words, not mine).
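To make the statefulness point concrete, here's a minimal Python sketch (illustrative only -- the names and data are hypothetical, and this is not dbt code): a sequence-based key is a function of everything loaded before it, while a hash-derived key is a pure function of the row itself.

```python
import hashlib

def sequence_key(state):
    # Stateful: the key depends on how many rows were loaded before this one,
    # so a full rebuild or out-of-order load can assign different keys.
    state["next_id"] += 1
    return state["next_id"]

def derived_key(*columns):
    # Stateless: the key is a pure function of the row's own attributes,
    # so rebuilding the model from scratch yields identical keys every time.
    return hashlib.md5("-".join(str(c) for c in columns).encode()).hexdigest()

state = {"next_id": 0}
first = sequence_key(state)   # 1 on this run...
second = sequence_key(state)  # ...but the same row could get a different id on a rerun

# Same inputs always produce the same key, run after run.
k1 = derived_key("order_123", "2023-01-01")
k2 = derived_key("order_123", "2023-01-01")
assert k1 == k2
```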
What is the main problem you are solving? What is your solution?
Teaching folks coming from legacy tools how to rethink their assumptions about surrogate keys. The solution would be to derive the keys from the raw data in your warehouse in an idempotent way (dbt_utils.surrogate_key).
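As a rough sketch of why this approach is idempotent (in Python for illustration -- this mimics the idea behind the macro, not its actual compiled SQL, and exact casting/null handling varies by dbt_utils version): hash-derived keys are invariant to load order, while row_number-style integer keys are not.

```python
import hashlib

def hashed_key(natural_key):
    # Pure function of the row's natural key: a hypothetical stand-in for
    # the idea behind dbt_utils.surrogate_key, not its exact implementation.
    return hashlib.md5(str(natural_key).encode()).hexdigest()

run_1 = ["order_a", "order_b", "order_c"]
run_2 = ["order_c", "order_a", "order_b"]  # same data, different arrival order

# Integer keys assigned by position shift when load order changes...
seq_1 = {row: i + 1 for i, row in enumerate(run_1)}
seq_2 = {row: i + 1 for i, row in enumerate(run_2)}
assert seq_1["order_a"] != seq_2["order_a"]

# ...while hash-derived keys come out identical on every run.
hash_1 = {row: hashed_key(row) for row in run_1}
hash_2 = {row: hashed_key(row) for row in run_2}
assert hash_1 == hash_2
```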
Why should the reader care about this problem? Why is your solution the right one? This should help form your specific target audience.
Trying to build surrogate keys the Old Way in dbt is a real pain, and does not actually help anyone. The target audience here is folks trying to do a 1:1 migration of Old Data to New Data -- now is the time to rethink your assumptions!