Create `open_data` athena db #691
…se-asset-on-open-data
Ripped row id construction out of this script since row id will already be in every open data view.
Adding both new assets and renaming the parcel universe assets here.
@@ -128,7 +128,7 @@ View containing aggregate land square footage for all PINs.
 View containing building permits organized by PIN, with extra metadata
 recorded by CCAO permit specialists during the permit processing workflow.

-**Primary Key**: `pin`, `date_issued`
+**Primary Key**: `pin`, `permit_number`
This combo uniquely identifies rows and feels more intuitive to me:
`select count(*) from default.vw_pin_permit group by pin, permit_number having count(*) > 1`
yields zero rows. I also needed to make sure that none of the columns I used for Socrata PKs had any NULL values, and this combo satisfies that condition too.
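For anyone who wants to repeat both checks (uniqueness and no NULLs) on another candidate key, here is a minimal sketch. It runs against an in-memory SQLite table rather than Athena, the helper name `check_composite_key` is hypothetical, and the three rows are made up purely for illustration.

```python
import sqlite3


def check_composite_key(conn, table, key_columns):
    """Return (duplicate_groups, null_rows) for a candidate composite key.

    Identifiers are interpolated directly into the SQL, so only pass
    trusted table and column names.
    """
    cols = ", ".join(key_columns)
    # Same shape as the query above: groups that appear more than once.
    duplicate_groups = conn.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    # Rows where any key column is NULL.
    null_filter = " OR ".join(f"{c} IS NULL" for c in key_columns)
    null_rows = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {null_filter}"
    ).fetchone()[0]
    return duplicate_groups, null_rows


# Illustrative data only; not taken from the real view.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vw_pin_permit (pin TEXT, permit_number TEXT, date_issued TEXT)"
)
conn.executemany(
    "INSERT INTO vw_pin_permit VALUES (?, ?, ?)",
    [
        ("01", "A1", "2024-01-01"),
        ("01", "A2", "2024-01-01"),
        ("02", "B1", None),
    ],
)

print(check_composite_key(conn, "vw_pin_permit", ["pin", "permit_number"]))  # (0, 0)
print(check_composite_key(conn, "vw_pin_permit", ["pin", "date_issued"]))    # (1, 1)
```

A key is usable as a primary key only when both numbers are zero.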
[Praise] Yup, this new version is correct. Thanks!
dbt/models/open_data/exposures.yml
Outdated
Mostly just a lift and shift from the `default` folder, but also added the three new assets and changed refs to point to the `open_data` db.
@@ -113,6 +97,8 @@ def build_query(
 .strip("]")
 .split(",")
 )
+# row id won't show up here since it's hidden on the open data portal assets
+asset_columns += ["row_id"]
`row_id` is hidden by the API, but still necessary to include.
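To make the idea concrete, here is a rough standalone sketch of the same step. It assumes the API hands back the column list as a bracketed, comma-separated string such as `["pin", "year"]`; that format and the function name `parse_asset_columns` are assumptions for illustration, not the actual code in `build_query`.

```python
def parse_asset_columns(raw: str) -> list[str]:
    """Parse a bracketed column string and make sure row_id is included."""
    # Strip the surrounding brackets, split on commas, then drop
    # whitespace and double quotes around each name.
    columns = [
        name.strip().strip('"')
        for name in raw.strip("[").strip("]").split(",")
    ]
    # An empty list such as "[]" splits into [""], so drop empty names.
    columns = [name for name in columns if name]
    # row_id is hidden on the portal asset, so it has to be added by hand.
    if "row_id" not in columns:
        columns.append("row_id")
    return columns


print(parse_asset_columns('["pin", "year"]'))  # ['pin', 'year', 'row_id']
```

The membership check keeps the column from being added twice if the API ever starts returning it.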
Thanks for this!
This PR creates a new athena db, `open_data`. This db will feed all of our open data assets that use the API for updating; the only exclusion is `ccao.commercial_data`. The main goal here is to make sure all of the columns available through the API for a given open data asset match the columns in the athena asset that feeds it. `row_id` columns are now constructed in the view that feeds an open data asset rather than during uploads through the API.

There are a lot of discrepancies between our current athena views and what is making it onto the open data portal. Some of those are by design, but clearly some are a product of the athena views having grown while the open data sets didn't keep up. I think it's worth opening another PR after this to try to bring the open data assets into parity with the views that (ultimately) feed them.
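As a starting point for that follow-up, the parity check could look something like the sketch below. The function name `column_parity` and the example column names are hypothetical; it only assumes we can get two lists of column names, one from the athena view and one from the API, and that `row_id` is the only column hidden by the API.

```python
def column_parity(view_columns, asset_columns, hidden=("row_id",)):
    """Compare the columns of a view with the columns of the asset it feeds.

    Columns in `hidden` are treated as present on the asset even though
    the API does not return them.
    """
    view = set(view_columns)
    asset = set(asset_columns) | set(hidden)
    return {
        "missing_from_asset": sorted(view - asset),
        "missing_from_view": sorted(asset - view),
    }


# Illustrative column names only.
report = column_parity(
    view_columns=["pin", "year", "row_id", "township_code"],
    asset_columns=["pin", "year"],
)
print(report)
# {'missing_from_asset': ['township_code'], 'missing_from_view': []}
```

Both lists being empty means the asset and its view are in parity.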