Clean FLOW count data to improve permanent count station accuracy #30

Open
cczhu opened this issue Dec 19, 2019 · 1 comment
Labels
inputdata, miniproject (Work self-contained enough to be treated as small projects)

Comments

@cczhu
Contributor

cczhu commented Dec 19, 2019

Based on some circumstantial evidence, such as

  • The presence of extremely high or low PTC year-on-year growth rates
  • AADTs clearly far too low for their road class (e.g. 8540609 has only several thousand vehicles per day, rather than several tens of thousands)
  • A list provided by Arman of permanent stations that reduce the overall model's predictive accuracy

we know that there are issues in the FLOW data we're ingesting into the model. These issues include categorical errors (e.g. accidentally including bike counts), sensor failures, special events, and failure to properly account for locations with major construction or detours. We should clean the data of these issues.
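The first two symptoms above (extreme year-on-year growth and AADTs too low for the road class) can be screened for automatically before any manual review. The sketch below is a minimal, stdlib-only illustration, not the project's code. It assumes one station's AADTs are available as a `{year: aadt}` dict, and the 30% growth threshold, the road-class names and the `EXAMPLE_MIN_AADT` floor values are all hypothetical placeholders that would need tuning against real data.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL minimum plausible AADT per road class; placeholder values only.
EXAMPLE_MIN_AADT: Dict[str, float] = {
    "expressway": 40000.0,
    "major_arterial": 15000.0,
    "minor_arterial": 5000.0,
}


def flag_extreme_growth(
    aadt_by_year: Dict[int, float], max_abs_growth: float = 0.3
) -> List[Tuple[int, float]]:
    """Return (year, growth) pairs whose year-on-year AADT growth exceeds
    max_abs_growth in either direction (0.3 means +/-30%)."""
    flagged: List[Tuple[int, float]] = []
    years = sorted(aadt_by_year)
    for prev, curr in zip(years, years[1:]):
        if curr != prev + 1:
            continue  # only compare consecutive years
        prev_aadt = aadt_by_year[prev]
        if prev_aadt <= 0:
            continue  # growth is undefined for a zero or negative base
        growth = aadt_by_year[curr] / prev_aadt - 1.0
        if abs(growth) > max_abs_growth:
            flagged.append((curr, growth))
    return flagged


def flag_low_aadt(aadt: float, road_class: str, min_aadt_by_class: Dict[str, float]) -> bool:
    """True if the AADT is below the floor for its road class.
    Unknown road classes are not flagged."""
    floor = min_aadt_by_class.get(road_class)
    if floor is None:
        return False
    return aadt < floor
```

For example, `flag_extreme_growth({2015: 10000.0, 2016: 10500.0, 2017: 5000.0})` flags only 2017 (roughly a 52% drop), which would then go to the manual follow-up step rather than being dropped automatically.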

This cleaning process should involve:

  • Checking the source code for prj_volume.uoft_centreline_volumes_output to make sure it's only taking in vehicle counts.
  • Creating an automated anomaly detection algorithm using Facebook Prophet to spot sensor failures and special events that produce unusual volumes.
  • Manual follow-up of permanent count locations with unusual growth rates to confirm they should be included.
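One way the Prophet step could work is to fit a model to each station's daily totals and flag days whose observed volume falls outside the model's uncertainty interval. This is a sketch under assumptions, not a settled design. It assumes daily counts in a pandas DataFrame with Prophet's required `ds` (date) and `y` (volume) columns. The function names and the 0.99 interval width are hypothetical choices. Prophet is imported inside the function (the package was published as `fbprophet` before version 1.0), so the interval check can be used without it.

```python
import pandas as pd


def flag_outside_interval(observed: pd.DataFrame, forecast: pd.DataFrame) -> pd.DataFrame:
    """Join observed daily volumes (ds, y) to a Prophet-style forecast and
    flag days whose volume falls outside [yhat_lower, yhat_upper]."""
    merged = observed.merge(
        forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds", how="inner"
    )
    merged["anomaly"] = (merged["y"] < merged["yhat_lower"]) | (
        merged["y"] > merged["yhat_upper"]
    )
    return merged


def detect_count_anomalies(daily_counts: pd.DataFrame, interval_width: float = 0.99) -> pd.DataFrame:
    """Fit Prophet to one station's daily counts and return them with an
    `anomaly` flag per day (in-sample detection)."""
    from prophet import Prophet  # named `fbprophet` in releases before 1.0

    model = Prophet(
        interval_width=interval_width,  # width of the uncertainty interval
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=False,  # inputs are daily totals, no intra-day pattern
    )
    model.fit(daily_counts[["ds", "y"]])
    # Predict over the same dates so each observed day is compared to its interval.
    forecast = model.predict(daily_counts[["ds"]])
    return flag_outside_interval(daily_counts[["ds", "y"]], forecast)
```

Because the model is fit on data that still contains the anomalies, long sensor outages can widen the fitted intervals. Refitting after removing the first round of flagged days, or fitting on a known-clean period, may be needed.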
@cczhu added the inputdata and miniproject labels Dec 19, 2019
@cczhu
Contributor Author

cczhu commented Mar 11, 2020

Note that we currently use a median, rather than a mean, to calculate the annually averaged version of D_ijk, because PTCs have not yet been cleaned of high-noise locations (we don't use a median for DoM_ijk). get_annually_averaged_ratios should be revisited when this issue is resolved.
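The reason a median is the safer choice while PTCs are uncleaned is that a single corrupted period can drag a mean far from the typical value, while the median barely moves. The sketch below is a hypothetical stand-in, not `get_annually_averaged_ratios` itself. It assumes a station's per-period ratios arrive as a list, with `None` marking missing periods, and the example values are made up.

```python
from statistics import mean, median
from typing import Optional, Sequence


def annually_averaged_ratio(ratios: Sequence[Optional[float]], robust: bool = True) -> float:
    """Average one station's per-period ratios over a year.
    robust=True uses the median (resistant to a few bad periods);
    robust=False uses the arithmetic mean."""
    values = [r for r in ratios if r is not None]  # drop missing periods
    if not values:
        raise ValueError("no valid ratios to average")
    return median(values) if robust else mean(values)


# Made-up example: one period corrupted by a sensor failure (5.0).
ratios = [1.02, 0.98, 1.01, 0.99, 1.00, 5.0]
print(annually_averaged_ratio(ratios))                # ≈ 1.005
print(annually_averaged_ratio(ratios, robust=False))  # ≈ 1.667
```

Once high-noise stations and periods are removed, switching `robust` to `False` would recover a true annual mean, which is the revisit this comment anticipates.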

@cczhu cczhu mentioned this issue Mar 11, 2020