Btdag #381

Open: wants to merge 90 commits into base: master

Commits (90; changes from 79 commits are shown in the diff)
c4e3c05
#343 testing first DAG
Jan 8, 2021
534880f
#343 test run with new date
Jan 8, 2021
956bb45
#343 test run reader_history table
Jan 21, 2021
e9b24aa
343 test run reader history table again
Jan 22, 2021
d6e9700
#250 test dag for broken readers
Jan 26, 2021
bed366a
#250 test dag corrected
Jan 27, 2021
18d2e05
250 test dag updated again
Jan 27, 2021
39a0e85
250 test dag updated test
Jan 27, 2021
ca10bed
#250 test dag updated again
Jan 27, 2021
f1ce6b6
#317 readme file updated
Feb 5, 2021
8af7e0b
#343 blip pipeline fail check all in one DAG
Feb 19, 2021
d090db1
#343 blip pipeline fail check all in one DAG start date updated
Feb 19, 2021
87e2bf5
#381 SQL add in create_tables folder
Feb 26, 2021
03519de
#343 SQL for functions used in airflow added in functions folder
Feb 26, 2021
f89ef79
#381 Readme file updated for creating new routes after Cathy's comments
Mar 26, 2021
fffa6a8
#381 Readme file updated for creating new routes
Mar 26, 2021
eeb6b55
#326 Readme file updated for new tables after Raphael's comments
Apr 16, 2021
2f9209d
#343 blip pipeline fail check slack message in new format
Apr 16, 2021
21154c8
#343 blip pipeline fail check syntax corrected
Apr 23, 2021
df569f0
#343 blip pipeline check syntax update
Apr 23, 2021
9f0bbc3
#343 blip pipeline fail check syntax updated
Apr 30, 2021
98afd58
#343 blip pipeline connection corrected
Apr 30, 2021
9e85abf
#343 blip pipeline start_end corrected
Apr 30, 2021
4a2fb83
#343 blip pipeline ds corrected for yesterday
Apr 30, 2021
86bbc88
#343 blip pipeline ds corrected for yesterday
Apr 30, 2021
1b4848c
#343 blip pipeline ds corrected
Apr 30, 2021
10ae474
#343 blip pipeline ds updated
Apr 30, 2021
2547288
#343 blip pipeline updated
Apr 30, 2021
9a942bc
#343 blip pipeline syntax error corrected
May 7, 2021
ce236f9
#343 blip pipeline functions schema access updated
May 7, 2021
c016dbb
#343 blip pipeline functions updated
May 7, 2021
dc5771e
#343 blip pipeline ds updated with spaces
May 7, 2021
5dad382
#343 blip pipeline ds python operator updad
May 7, 2021
59649fe
#343 blip pipeline ds python operator updated
May 7, 2021
236282f
#343 blip pipeline ds python operator updated
May 7, 2021
2ec625f
#343 blip pipeline ds python operator updated
May 7, 2021
871df9f
#343 blip pipeline ds python operator updated
May 7, 2021
71849f3
#343 blip pipeline slack channel updated
May 14, 2021
7fa08df
#343 blip pipeline date in python operator fixed
May 14, 2021
f35e68d
#343 blip pipeline date in syntax error fixed
May 14, 2021
b247cb1
#343 blip pipeline syntax error fixed
May 14, 2021
7150d10
#343 blip pipeline syntax error fixed
May 14, 2021
b9f6b99
#343 blip pipeline quick fix for pipeline check
May 14, 2021
345b23b
#263 Bluetooth readme.md updated with lookup table addition
May 21, 2021
fdf7a9a
#317 Bluetooth update readme.md updated with liinks
May 21, 2021
1ccd39d
#343 empty list for broken reader test condition resolved
May 21, 2021
468c100
#250 functions changed to use bluetooth schema instead of personal s…
Jun 15, 2021
7ddb0b7
the detector_history_corrected table explaineaiained
Jun 15, 2021
3d96a7f
Offline bluetooth readers past one week.
webgisgeek Jun 18, 2021
508c0dd
#381 DAG name corrected
Jun 23, 2021
2ffa367
#343 DAG name corrected and PR feedback incorporated
Jul 5, 2021
7ce942d
Merge branch 'btdag' of https://github.com/CityofToronto/bdit_data-so…
Jul 5, 2021
55d5abb
Delete bt_read_history.py
webgisgeek Jul 12, 2021
c7dafa4
Delete bt_test_dag.py
webgisgeek Jul 12, 2021
3fdcb36
Delete check_brokenreaders.py
webgisgeek Jul 12, 2021
ee3a698
Updated broken_readers.sql
webgisgeek Jul 29, 2021
d40d1bf
Update reader_status_history.sql
webgisgeek Jul 29, 2021
c62a421
Update README.md
webgisgeek Jul 29, 2021
0db67ae
Update bluetooth_check_readers.py
webgisgeek Jul 29, 2021
5cc7bd1
#450 Update slack alerts to use variable
chmnata Feb 16, 2022
e0c27bd
#343 bugfix
radumas Feb 23, 2022
4f321c3
#381, #250 address comment for reader_status_history sql
chmnata Mar 21, 2022
9e321fb
#381, #250 update column name
chmnata Mar 22, 2022
4164e77
#385, #343 Simplify broken_readers.sql and add comments
chmnata Mar 22, 2022
dc763b6
#385 Add missing insert_report_date function
chmnata Mar 22, 2022
a7216eb
#343 Update insert report date sql used in the dag
chmnata Mar 23, 2022
a74c4dd
#343 Update pipeline check to use sqlcheckoperator and add comments
chmnata Mar 23, 2022
bd98f89
#385 update broken readers sql
chmnata Mar 23, 2022
e341086
#381, #343 Update broken reader task to alert and not kill self
chmnata Mar 24, 2022
e3647aa
#381, #317 update segment table to include length column
chmnata Mar 25, 2022
385a7f3
#131 Update blip api to use psycopg2 instead of pg
chmnata Mar 28, 2022
576b29d
#131 Updpg json encoder to use json dump
chmnata Mar 28, 2022
afacbcc
#131update comments from pg to psycopg2
chmnata Mar 29, 2022
c7cf1df
#131 update upsert sql
chmnata Mar 29, 2022
d36423a
#131fix that json mess in prepping upsert row
chmnata Mar 29, 2022
f06d4b9
#131 fix upsert sql for column name and got rid of included column
chmnata Mar 29, 2022
3353e20
#131 change from tuple to index for formatting analyses_pull_data
chmnata Mar 29, 2022
40b3ce0
#131 got rid of print() comments
chmnata Mar 29, 2022
65c4a6f
#131 fix upsert returned values to use inserting row
chmnata Mar 30, 2022
703dcab
#381 edit update readme
chmnata Mar 30, 2022
5d4d620
#381 change owner to bt_admins
chmnata Mar 30, 2022
342f739
#381 add st_linesubstring steps to readme
chmnata Mar 30, 2022
66a8c7b
#381 update create new segment to not use ST_reverse for geom
chmnata Mar 31, 2022
9b1a9c8
#381 update reader status history to use correct version of table
chmnata Apr 5, 2022
d034dfd
#381 move updating readme to main readme
chmnata Apr 5, 2022
277996a
#381 update reader_status_history function
chmnata Apr 6, 2022
5f40600
#381 update bt new segments table with new columns
chmnata Apr 7, 2022
6b6f826
#343 update check to not use case when
chmnata Apr 13, 2022
9c29252
#343 Update bt dag pipeline check task to be idempotent
chmnata May 6, 2022
7d1a558
#250 update function
chmnata Jun 6, 2022
bluetooth/README.md: 151 changes (124 additions, 27 deletions)

@@ -2,39 +2,66 @@

## Table of Contents

- [Bluetooth - Bliptrack](#bluetooth---bliptrack)
  - [Table of Contents](#table-of-contents)
  - [1. Overview](#1-overview)
  - [2. Table Structure](#2-table-structure)
    - [Open Data Tables](#open-data-tables)
      - [Live Feed](#live-feed)
      - [Historical Data](#historical-data)
      - [Geography](#geography)
    - [Internal Tables](#internal-tables)
      - [Observations](#observations)
      - [Filtering devices](#filtering-devices)
      - [all_analyses](#all_analyses)
      - [reader_history](#reader_history)
      - [reader_locations](#reader_locations)
      - [routes](#routes)
      - [reader_status_history](#reader_status_history)
      - [ClassOfDevice](#classofdevice)
  - [3. Technology](#3-technology)
  - [4. Bliptrack UI](#4-bliptrack-ui)
    - [Accessing Bliptrack](#accessing-bliptrack)
    - [Terms](#terms)
    - [Downloading travel time data](#downloading-travel-time-data)
    - [Common Issues](#common-issues)
  - [5. Bliptrack API](#5-bliptrack-api)
    - [Pulling travel time data](#pulling-travel-time-data)
      - [Under the Hood](#under-the-hood)
      - [The `analysisId`](#the-analysisid)
  - [6. Bliptrack API OD Data](#6-bliptrack-api-od-data)
    - [Start-End Data](#start-end-data)
      - [Some notes on `measuredTime` and records](#some-notes-on-measuredtime-and-records)
      - [Dictionary Structure](#dictionary-structure)
    - [Others Data](#others-data)
    - [deviceClass and outlierLevel](#deviceclass-and-outlierlevel)
      - [For the Start-End Data](#for-the-start-end-data)
      - [For the Others Data](#for-the-others-data)
      - [outliersLevel](#outlierslevel)
  - [7. Adding New Segments to the Database](#7-adding-new-segments-to-the-database)
  - [8. Open Data Releases](#8-open-data-releases)

## 1. Overview

@@ -134,6 +161,72 @@ The script pulls the route configurations nightly from the Blip server. These ar
|pull_data|boolean| (defaults to false) whether the script should pull observations |
`outcomes` are set for different routes for purposes such as filtering Bluetooth and WiFi, or tracking origin-destination points.


#### reader_history

This table lists every Bluetooth reader that has ever been installed at each location, including readers that are no longer physically present there. It is the complete record of all readers, regardless of their current status.


The `reader_history` table contains the following fields:

|Column|Type|Notes|
|------|----|-----|
|`reader_history_id`|integer| Unique ID for each reader|
|`reader_id`|varchar|Foreign key to `reader_locations`; this is a location ID. Multiple readers may have been installed at the same location at different times.|
|`serial_no_bluetooth`|integer|Four-digit number assigned to each Bluetooth reader. It corresponds to the **Zone** in the Bliptrack table.|
|`serial_no_wifi`|integer|Some readers have both WiFi and Bluetooth sensors; for readers with a WiFi sensor, this field holds its serial number.|
|`date_installed`|date|Date the reader was installed at the location.|
|`date_uninstalled`|date|Date the reader was removed from the location.|

Except for `reader_history_id`, all fields in this table must be updated manually.
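To illustrate how the install and removal dates identify which physical reader served a location on a given day, here is a minimal Python sketch over in-memory rows. It assumes install periods at one location never overlap and treats `date_uninstalled` as exclusive; the `ReaderHistoryRow` class, the `reader_on_date` helper and the sample rows are hypothetical and not part of the repository.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ReaderHistoryRow:
    """One row of reader_history, using the columns documented above."""
    reader_history_id: int
    reader_id: str                    # location ID (FK to reader_locations)
    serial_no_bluetooth: int          # matches the Bliptrack "Zone"
    serial_no_wifi: Optional[int]     # None when the reader has no WiFi sensor
    date_installed: date
    date_uninstalled: Optional[date]  # None while the reader is still in place


def reader_on_date(rows: List[ReaderHistoryRow], reader_id: str,
                   on: date) -> Optional[ReaderHistoryRow]:
    """Return the reader installed at location `reader_id` on day `on`, if any.

    Assumption: install periods at one location do not overlap, and the
    uninstall date is exclusive (the old reader is gone on that day).
    """
    for row in rows:
        if row.reader_id != reader_id:
            continue
        started = row.date_installed <= on
        still_there = row.date_uninstalled is None or on < row.date_uninstalled
        if started and still_there:
            return row
    return None


# Made-up example: location "42" had its reader swapped in mid-2019.
history = [
    ReaderHistoryRow(1, "42", 1001, None, date(2015, 3, 1), date(2019, 6, 30)),
    ReaderHistoryRow(2, "42", 1542, 2203, date(2019, 6, 30), None),
]
print(reader_on_date(history, "42", date(2018, 1, 1)).serial_no_bluetooth)  # 1001
print(reader_on_date(history, "42", date(2021, 1, 1)).serial_no_bluetooth)  # 1542
```

In the database the same lookup would be a filter on `date_installed <= day AND (date_uninstalled IS NULL OR day < date_uninstalled)`; whether the uninstall day should be inclusive depends on how the dates are entered manually.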

#### reader_locations
Review comment (Collaborator): `detector_history` table is missing in this doc.

This table lists all locations where Bliptrack readers have been installed and are still physically present. The readers may be online or offline, but have NOT been physically removed. Each intersection has only **one** reader assigned to its route(s); if a location has more than one reader that has not been removed, those readers are listed in the `reader_history` table. The function [`sql/functions/reader_status_history`](sql/functions/) updates the `date_last_received` field. The table consists of the following fields:

|Column|Type|Notes|
|------|----|-----|
|`reader_id`|integer|Unique ID for a reader; corresponds to the `reader_id` in the `reader_history` table.|
|`name`|varchar|Name of the location: two characters for the E/W street and two characters for the N/S street, e.g. `QU_DF` for Queen St and Dufferin St. Pre-existing names (e.g. A, B, C, or Beechwood, Castlefield) have been retained.|
|`int_id`|integer|Centreline intersection ID of the intersection (node) closest to the reader, or of a pseudo-intersection in the case of an expressway.|
|`date_active`|date|The date this reader was installed.|
|`date_inactive`|date|NULL unless manually updated to the date this reader was deemed inactive.|
|`date_last_received`|timestamp without time zone|Latest date for which the `aggr_5min` table has aggregated data from this reader. Updated daily by the function [`sql/functions/reader_status_history`](sql/functions/).|
|`project_name`|varchar|Name of the project under which the detector was installed.|
|`geom`|geometry| Geometry of the location.|
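The naming convention for `name` can be checked mechanically. The Python sketch below assumes that current-style names are exactly two upper-case letters, an underscore, and two more upper-case letters, as in `QU_DF`. Because the abbreviations are not simply the first two letters of each street (Dufferin becomes `DF`), the sketch only validates the shape of a name rather than generating one. `NAME_PATTERN` and `follows_naming_convention` are hypothetical.

```python
import re

# Shape of current-style names, inferred from the QU_DF example:
# two characters for the E/W street, "_", two characters for the N/S street.
# Legacy names such as "A" or "Beechwood" were retained and will not match.
NAME_PATTERN = re.compile(r"[A-Z]{2}_[A-Z]{2}")


def follows_naming_convention(name: str) -> bool:
    """Return True if `name` has the EW_NS two-letter shape."""
    return NAME_PATTERN.fullmatch(name) is not None


for name in ["QU_DF", "Beechwood", "A"]:
    print(name, follows_naming_convention(name))
# QU_DF True
# Beechwood False
# A False
```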


#### routes

This table lists all routes that pass through the locations (intersections or pseudo-intersections) where readers are installed. Each route corresponds to a unique segment on which data is collected from the network of readers. For a two-way street, routes are created for both directions: Eastbound (EB) and Westbound (WB), or Northbound (NB) and Southbound (SB). In the City of Toronto, Bluetooth readers have been installed at various locations at different times by different projects, so new routes are created as readers are added at new locations. An easy way to create and update routes is [described here](https://github.com/CityofToronto/bdit_data-sources/blob/btdag/bluetooth/update/README.md).

The `routes` table has the following fields:

|Column|Type|Notes|
|------|----|-----|
|`analysis_id`|bigint|`analysis_id` from the `bluetooth.all_analyses` table. Recently added routes are assigned new `analysis_id`s starting from 1600000. _The `all_analyses` table must be updated to include these new routes for data aggregation._|
|`name`|varchar|Name of the route, generally describing its start and end points. For example, `DVP-J to DVP-I` is a route along the Don Valley Parkway between detectors **J** and **I**.|
|`start_street_name`|varchar|Name of the street the route runs along at its start point.|
|`start_cross_street`|varchar|The street that crosses the start street at the start point of the route.|
|`start_reader_id`|varchar|Corresponding `reader_id` from the `reader_locations` table at the start point of the route.|
|`end_street_name`|varchar|Name of the street the route runs along at its end point; a route can start and end on different streets.|
|`end_cross_street`|varchar|The street that crosses the end street at the end point of the route.|
|`end_reader_id`|varchar|Corresponding `reader_id` from the `reader_locations` table at the end point of the route.|
|`date_active`|date|The date when the `aggr_5min` table started aggregating data from this reader.|
|`date_inactive`|date|The date when the reader stopped sending readings. This field must be updated manually, because one or both readers on a route may temporarily stop aggregating data for a few days and then resume.|
|`date_last_received`|date|Last day for which data on the route was aggregated. Updated daily by the function [`sql/functions/insert_report_date`](sql/functions/). For active routes, this is yesterday's date.|
|`geom`|geometry|Geometry of the route.|
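As a sketch of the two rules above (one route per direction on a two-way street, and new `analysis_id`s drawn from 1600000 upwards), the Python below builds a pair of route records. It is not how the repository creates routes; it only illustrates the ID and naming logic, and it does not touch `all_analyses`. `Route`, `next_analysis_id`, `routes_both_directions` and all IDs in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# New routes get analysis_ids starting from 1600000 (see the table above).
NEW_ROUTE_ID_FLOOR = 1_600_000


@dataclass
class Route:
    analysis_id: int
    name: str
    start_reader_id: str
    end_reader_id: str


def next_analysis_id(existing_ids: Iterable[int]) -> int:
    """Next unused analysis_id in the range reserved for new routes."""
    new_range = [i for i in existing_ids if i >= NEW_ROUTE_ID_FLOOR]
    return max(new_range) + 1 if new_range else NEW_ROUTE_ID_FLOOR


def routes_both_directions(reader_a: str, label_a: str,
                           reader_b: str, label_b: str,
                           existing_ids: Iterable[int]) -> List[Route]:
    """Build one route per direction between two readers on a two-way street."""
    first_id = next_analysis_id(existing_ids)
    return [
        Route(first_id, f"{label_a} to {label_b}", reader_a, reader_b),
        Route(first_id + 1, f"{label_b} to {label_a}", reader_b, reader_a),
    ]


# Made-up IDs: two new routes already exist in the 1600000 range.
for route in routes_both_directions("101", "DVP-J", "102", "DVP-I",
                                    [1453262, 1600000, 1600001]):
    print(route.analysis_id, route.name)
# 1600002 DVP-J to DVP-I
# 1600003 DVP-I to DVP-J
```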

#### reader_status_history

This table logs the `last_active_date` of each reader daily. It serves as a lookup table for identifying readers whose data was not aggregated as of yesterday, i.e. `broken_readers`. It is updated by the function [`sql/functions/reader_status_history`](sql/functions/), which runs daily in Airflow. The function [`sql/functions/broken_readers`](sql/functions/) uses this lookup table to flag as `broken_readers` the readers that were `not active` yesterday but were `active` the day before. The table has the following four fields:

|Column|Type|Notes|
|------|----|-----|
|`reader_id`|integer|The `reader_id` that uniquely identifies the reader at each location.|
|`last_active_date`|date|The latest date on which data from the reader was aggregated.|
|`active`|boolean|`true` if data from the reader was aggregated yesterday, otherwise `false`.|
|`dt`|date|The date passed as the parameter to the function that updates the table. Because the function runs daily in Airflow, this field contains the most recent date.|
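The broken-reader rule ("not active yesterday, but active the day before") is a self-join of this table on consecutive `dt` values. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, so booleans are stored as 0/1. It assumes that `active` describes the day stored in `dt`; the real logic lives in `sql/functions/broken_readers` and may differ. The `broken_readers` helper and the sample rows are hypothetical.

```python
import sqlite3
from datetime import date, timedelta
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reader_status_history (
        reader_id        INTEGER,
        last_active_date TEXT,
        active           INTEGER,  -- SQLite has no boolean: 1 = true, 0 = false
        dt               TEXT
    )
""")
# Made-up rows: reader 2 stops reporting after 2022-06-01.
conn.executemany(
    "INSERT INTO reader_status_history VALUES (?, ?, ?, ?)",
    [
        (1, "2022-06-01", 1, "2022-06-01"), (2, "2022-06-01", 1, "2022-06-01"),
        (1, "2022-06-02", 1, "2022-06-02"), (2, "2022-06-01", 0, "2022-06-02"),
        (1, "2022-06-03", 1, "2022-06-03"), (2, "2022-06-01", 0, "2022-06-03"),
    ],
)


def broken_readers(conn: sqlite3.Connection, day: date) -> List[int]:
    """Readers inactive on `day` that were still active on the day before."""
    day_before = day - timedelta(days=1)
    rows = conn.execute(
        """
        SELECT cur.reader_id
        FROM reader_status_history AS cur
        JOIN reader_status_history AS prev
          ON prev.reader_id = cur.reader_id AND prev.dt = ?
        WHERE cur.dt = ? AND cur.active = 0 AND prev.active = 1
        ORDER BY cur.reader_id
        """,
        (day_before.isoformat(), day.isoformat()),
    )
    return [reader_id for (reader_id,) in rows]


print(broken_readers(conn, date(2022, 6, 2)))  # [2]
print(broken_readers(conn, date(2022, 6, 3)))  # [] (already broken the day before)
```

Requiring `prev.active` means a reader is reported once, on the first day it goes silent, rather than every day it stays offline.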

#### ClassOfDevice

|Column|Type|Notes|
@@ -167,6 +260,7 @@ substring(cod::bit(24) from 17 for 6) as minor_device_class,
substring(cod::bit(24) from 12 for 5) as major_device_class
```
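To see exactly which bits those `substring` calls select, here is a Python sketch that mimics them. Casting an integer to `bit(24)` in PostgreSQL keeps its rightmost 24 bits, written most significant bit first, and `substring` positions are 1-based. Under those rules, positions 17 to 22 are bits 7 to 2 counted from the least significant end, and positions 12 to 16 are bits 12 to 8, which matches the shift-and-mask form noted in the comments. The helper names and the example value are illustrative only.

```python
def _pg_bit_substring(cod: int, start: int, length: int) -> int:
    """Mimic PostgreSQL's substring(cod::bit(24) from start for length).

    Casting an integer to bit(24) keeps its rightmost 24 bits, written most
    significant bit first; substring positions are 1-based.
    """
    bit_string = format(cod & 0xFFFFFF, "024b")
    return int(bit_string[start - 1:start - 1 + length], 2)


def minor_device_class(cod: int) -> int:
    return _pg_bit_substring(cod, 17, 6)   # same as (cod >> 2) & 0x3F


def major_device_class(cod: int) -> int:
    return _pg_bit_substring(cod, 12, 5)   # same as (cod >> 8) & 0x1F


cod = 0x5A020C  # example Class of Device value
print(major_device_class(cod), minor_device_class(cod))  # 2 3
```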


## 3. Technology

The BlipTrack sensors have two directional Bluetooth antennas and an omnidirectional WiFi antenna.
@@ -228,7 +322,9 @@ The script pulls a day of data for each [analysisID](#the-analysisid) and upload
The following companion scripts send alerts after this script runs:

- [notify_routes.py](api/notify_routes.py) sends an email if new route configurations appear in the database.
- [brokenreaders.py](readersdown/) sends an email if a sensor stopped producing data the previous day.
- [bluetooth_check_readers.py](../dags/bluetooth_check_readers.py) runs at 08:00 every day and sends a Slack message to the `data_talk` channel if the data pipeline failed the day before. If the pipeline is OK, it checks whether any Bluetooth readers have stopped sending data; if it finds broken readers, it logs them in the `broken_readers_log` table and sends a Slack message. The script also updates the `date_last_received` field in the `routes` and `reader_locations` tables and adds new rows to the `reader_status_history` table.
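The branching in `bluetooth_check_readers.py` can be summarised in plain Python. This is only a sketch of the control flow described above: the real DAG is built from Airflow tasks, while here hypothetical callables stand in for the broken-reader query, the `broken_readers_log` insert and the Slack call. The message texts are made up, and the `date_last_received` and `reader_status_history` updates are omitted.

```python
from typing import Callable, List


def run_daily_check(pipeline_ok: bool,
                    find_broken_readers: Callable[[], List[int]],
                    log_broken_readers: Callable[[List[int]], None],
                    send_slack: Callable[[str], None]) -> None:
    """Hypothetical sketch of the daily check's decision logic."""
    if not pipeline_ok:
        # Pipeline failed: alert and skip the reader check.
        send_slack("Bluetooth pipeline check failed for yesterday.")
        return
    broken = find_broken_readers()
    if broken:
        # Record the readers and alert, rather than failing the task.
        log_broken_readers(broken)
        send_slack("Broken Bluetooth readers: " + ", ".join(map(str, broken)))


# Usage with stand-ins that just record what would happen.
logged: List[List[int]] = []
sent: List[str] = []
run_daily_check(True, lambda: [2, 7], logged.append, sent.append)
print(logged, sent)  # [[2, 7]] ['Broken Bluetooth readers: 2, 7']
```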



#### Under the Hood

@@ -325,3 +421,4 @@ For the [King St. Transit Pilot](toronto.ca/kingstreetpilot), the team has relea

- [King St. Transit Pilot - Detailed Bluetooth Travel Time](https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#739f4e47-737c-1b32-3a0b-45f80e8c2951) contains travel times collected during the King Street Pilot in the same format as the 5-min data set. [Here](sql/analysis/open_data_ksp_travel_times.sql) is the SQL code producing these data.
- [King St. Transit Pilot – Bluetooth Travel Time Summary](https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#a85f193a-4910-f155-6cb9-49f9dedd1392) contains monthly averages of corridor-level travel times by time period. [Here](sql/analysis/open_data_ksp_agg_travel_times.sql) is the SQL code producing these summaries.
