Create open_data athena db #691

Member

@wrridgeway wrridgeway commented Dec 26, 2024

⚠️ URLs pointing to the old parcel universe asset in exposures, the wiki, and the open data portal stories need to be updated before this PR is merged. ⚠️

This PR creates a new athena db, open_data. This db will feed all of our open data assets that are updated through the API, which covers everything except ccao.commercial_data. The main goal is to make sure that the columns available through the API for a given open data asset match the columns in the athena asset that feeds it. row_id columns are now constructed in the view that feeds an open data asset rather than during uploads through the API.

There are a lot of discrepancies between our current athena views and what is making it onto the open data portal. Some of those are by design, but clearly some are a product of the athena views having grown while the open data sets didn't keep up. I think it's worth opening another PR after this in order to try and bring the open data assets into parity with the views that (ultimately) feed them.
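A minimal sketch of the kind of parity check described above, assuming the column names for a view and for the asset it feeds have already been fetched as plain lists. The function name and the sample columns are hypothetical and are not part of this PR.

```python
def column_discrepancies(view_columns, asset_columns):
    """Compare the columns of an athena view with those of the asset it feeds."""
    view_set, asset_set = set(view_columns), set(asset_columns)
    return {
        # Columns the view has that never reach the open data asset
        "missing_from_asset": sorted(view_set - asset_set),
        # Columns the asset exposes that the view does not provide
        "missing_from_view": sorted(asset_set - view_set),
    }


# Hypothetical column lists, for illustration only
view_cols = ["pin", "year", "township_code", "row_id"]
asset_cols = ["pin", "year", "class"]

report = column_discrepancies(view_cols, asset_cols)
print(report)
```

An empty list under both keys would mean the view and the asset are in parity.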

@wrridgeway wrridgeway self-assigned this Dec 26, 2024
@wrridgeway wrridgeway linked an issue Dec 26, 2024 that may be closed by this pull request
Member Author

@wrridgeway wrridgeway Jan 13, 2025

Ripped row id construction out of this script since row id will already be in every open data view.

@wrridgeway wrridgeway changed the title Create open data athena db Create open_data athena db Jan 14, 2025
Member Author

Adding both new assets and renaming the parcel universe assets here.

@@ -128,7 +128,7 @@ View containing aggregate land square footage for all PINs.
View containing building permits organized by PIN, with extra metadata
recorded by CCAO permit specialists during the permit processing workflow.

-**Primary Key**: `pin`, `date_issued`
+**Primary Key**: `pin`, `permit_number`
Member Author

@wrridgeway wrridgeway Jan 14, 2025

This combo uniquely identifies rows and feels more intuitive to me:

select count(*) from default.vw_pin_permit group by pin, permit_number having count(*) > 1

That query yields zero rows, so no `pin`, `permit_number` pair appears more than once. I also needed to make sure that none of the columns I used for Socrata PKs had any NULL values, and this combo meets that condition.
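Both conditions (no duplicate key combinations, no NULLs in key columns) can be checked together. The sketch below runs against an in-memory SQLite table rather than Athena, and the helper name and toy rows are hypothetical; the toy data only illustrates the check and says nothing about the contents of the real view.

```python
import sqlite3


def pk_violations(conn, table, key_columns):
    """Return (duplicate_key_groups, rows_with_null_key) for a candidate key.

    Table and column names are interpolated directly into the SQL, so they
    must come from trusted code, never from user input.
    """
    cols = ", ".join(key_columns)
    dup_sql = (
        "SELECT COUNT(*) FROM ("
        f"SELECT 1 FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    null_cond = " OR ".join(f"{c} IS NULL" for c in key_columns)
    null_sql = f"SELECT COUNT(*) FROM {table} WHERE {null_cond}"
    dups = conn.execute(dup_sql).fetchone()[0]
    nulls = conn.execute(null_sql).fetchone()[0]
    return dups, nulls


# Toy stand-in for the view, with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vw_pin_permit (pin TEXT, permit_number TEXT, date_issued TEXT)"
)
conn.executemany(
    "INSERT INTO vw_pin_permit VALUES (?, ?, ?)",
    [
        ("01", "A1", "2024-01-01"),
        ("01", "A2", "2024-01-01"),  # same pin and date, different permit
        ("02", "B1", None),          # missing date_issued
    ],
)

good_key = pk_violations(conn, "vw_pin_permit", ["pin", "permit_number"])
bad_key = pk_violations(conn, "vw_pin_permit", ["pin", "date_issued"])
print(good_key, bad_key)
```

A candidate key is usable as a Socrata PK only when both counts are zero.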

Contributor

[Praise] Yup, this new version is correct. Thanks!

Member Author

@wrridgeway wrridgeway Jan 14, 2025

Mostly just a lift and shift from the default folder, but also added the three new assets and changed refs to point to the open_data db.

@@ -113,6 +97,8 @@ def build_query(
.strip("]")
.split(",")
)
+# row id won't show up here since it's hidden on the open data portal assets
+asset_columns += ["row_id"]
Member Author

row_id is hidden by the API, but still necessary to include.
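A rough sketch of the idea, assuming the asset's column list arrives as a bracketed, comma-separated string (suggested by the `.strip("]")` and `.split(",")` calls in the hunk, but not confirmed by it). The function name and sample input are hypothetical, and this is not the actual body of `build_query`.

```python
def parse_asset_columns(raw):
    """Parse a string like '["pin", "year"]' into column names, then add row_id."""
    columns = [
        part.strip().strip('"')
        for part in raw.strip().strip("[]").split(",")
        if part.strip()
    ]
    # Assumption: row_id is hidden on the portal asset, so it is absent from
    # the returned column list and has to be added back explicitly.
    if "row_id" not in columns:
        columns.append("row_id")
    return columns


parsed = parse_asset_columns('["pin", "year", "class"]')
print(parsed)
```

The membership check keeps the result free of duplicates if `row_id` ever does appear in the parsed list.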

@wrridgeway wrridgeway marked this pull request as ready for review January 14, 2025 19:13
@wrridgeway wrridgeway requested a review from a team as a code owner January 14, 2025 19:13
Contributor

@jeancochrane jeancochrane left a comment

Thanks for this!

dbt/models/open_data/open_data.vw_assessed_value.sql (outdated review thread, resolved)
@wrridgeway wrridgeway merged commit 4a1dc4f into master Jan 14, 2025
10 checks passed
@wrridgeway wrridgeway deleted the 652-create-current-year-only-parcel-universe-asset-on-open-data branch January 14, 2025 21:49
Development

Successfully merging this pull request may close these issues.

Create current-year only Parcel Universe asset on Open Data
2 participants