RODARS pipeline #1103

Open
wants to merge 37 commits into
base: master


@gabrielwol gabrielwol commented Nov 29, 2024

What this pull request accomplishes:

  • Construction event data (RODARs)!
  • rodars_pull.py daily DAG which runs on Morbius.
    • Pulls RODARs data from ITSC in two parts:
      • Issues: only the latest update (timestamputc) for each issueid.
      • Issue locations: corresponding location details for latest issues.
    • Data is processed using functions in rodars_functions.py, including: unnesting JSON data, converting binary coordinates to a Postgres-readable format, and converting data types.
    • Finally, data is inserted into congestion_events.rodars_issues and congestion_events.rodars_issue_locations.
      • Notably, older issue_locations rows are deleted using this on-insert trigger. I used this method because the number of locations per issue could theoretically change, which would leave orphaned rows that an ON CONFLICT DO UPDATE would not touch.
  • Five new lookup tables in the itsc_factors schema, taken from the documentation: direction, lanesaffectedpattern, locationblocklevel, roadclosuretype_new, roadclosuretype_old. They are used to convert codes to a human-readable format in the issue_locations view.
    • Only the itsc_factors.lanesaffectedpattern lookup could use review: I devised the lane_open and lane_closed columns to give a numeric interpretation of how many lanes are open or closed. I used 0.5 for partial closures/slowdowns.
    • Also check out the function itsc_factors.get_lanesaffected_sums, which translates these codes into numeric columns for ease of use in the main view. A lanesaffectedpattern could be something like 'LOLCLCWO' (lane open, lane closed, lane closed, sidewalk open). Try: SELECT lane_open_auto, lane_closed_auto, lane_open_bike, lane_closed_bike, lane_open_ped, lane_closed_ped, lane_open_bus, lane_closed_bus FROM itsc_factors.get_lanesaffected_sums('LOLCLCWO');
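
For reviewers who want to sanity-check the lanesaffectedpattern interpretation outside the database, here is a minimal Python sketch of the idea. It is not the SQL function from this PR. It assumes each code is exactly two characters (inferred from the 'LOLCLCWO' example), and it only knows the three codes named above. Mapping LO/LC to the "auto" columns and WO to the "ped" columns is also an assumption based on the column names in the SELECT; the name `summarize_pattern` is hypothetical. Partial closures (the 0.5 values) are not modelled.

```python
from collections import Counter

# Only the three codes mentioned in the PR description are listed here.
# The mode assignment (auto / ped) is an assumption, not taken from the lookup table.
KNOWN_CODES = {
    "LO": ("auto", "open"),    # lane open
    "LC": ("auto", "closed"),  # lane closed
    "WO": ("ped", "open"),     # sidewalk open
}


def summarize_pattern(pattern: str) -> dict:
    """Split a pattern into 2-character codes and tally open/closed lanes per mode."""
    if len(pattern) % 2 != 0:
        raise ValueError(f"expected an even-length pattern, got {pattern!r}")

    codes = [pattern[i:i + 2] for i in range(0, len(pattern), 2)]
    sums = Counter()
    unknown = []
    for code in codes:
        if code in KNOWN_CODES:
            mode, state = KNOWN_CODES[code]
            sums[f"lane_{state}_{mode}"] += 1
        else:
            # Codes this sketch does not know are reported rather than guessed at.
            unknown.append(code)
    return {"sums": dict(sums), "unknown": unknown}


print(summarize_pattern("LOLCLCWO"))
```

Comparing this tally with the result of itsc_factors.get_lanesaffected_sums for the same pattern is one quick way to spot a disagreement in how a code is read. The real lookup table remains the authority on what each code means.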

Issue(s) this solves:

What, in particular, needs to be reviewed:

  • Mostly, the final view and the readme. For the processing steps, I double-checked things against the ITSC portal.

What needs to be done by a sysadmin after this PR is merged:

  • Refresh data_scripts on Morbius.

@gabrielwol gabrielwol added the "New Data" label (for creating pipelines for new datasets) Nov 29, 2024
@gabrielwol gabrielwol self-assigned this Nov 29, 2024
@gabrielwol gabrielwol linked an issue Nov 29, 2024 that may be closed by this pull request
@gabrielwol gabrielwol marked this pull request as ready for review January 10, 2025 21:54
@gabrielwol
Collaborator Author

Ready to review! Will add some more usage examples to readme at some point.

Successfully merging this pull request may close these issues.

Explore New RODARS data