Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Btdag #381

Open
wants to merge 90 commits into
base: master
Choose a base branch
from
Open

Btdag #381

wants to merge 90 commits into from

Conversation

webgisgeek
Copy link

What this pull request accomplishes:

  • review and merge changes

Issue(s) this solves:

  • documentation on creating new bt routes

What, in particular, needs to reviewed:

Copy link
Member

@radumas radumas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions on the DAGs, also wondering where the sql is.


def bad_readers(con, run_date):
with con.cursor() as cursor:
select_query = """SELECT reader from mohan.broken_reader()"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the date be incorporate as a parameter to this function?
Also the sql for this function isn't in this branch

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot about SQL altogether. Will add it to the branch too. Yes date should be a parameter. Will change that

logging.basicConfig(level=logging.DEBUG)

#To connect to pgadmin
CONFIG = configparser.ConfigParser()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should set up a bt bot account for Airflow so then the credentials would be imported
https://github.com/CityofToronto/bdit_data-sources/blob/master/dags/pull_wys.py#L59-L60

Comment on lines 11 to 14
import configparser

LOGGER = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Airflow is already taking care of the logging configuration, the code should simply call logging.info, etc, further down.

https://stackoverflow.com/a/41841689/4047679

Comment on lines 39 to 43
:red_circle: A bt Reader may be broken!
*Task*: {task}
*Dag*: {dag}
*Execution Time*: {exec_date}
*Log Url*: {log_url}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be updated to our more recent Slack message style.

See for example

https://github.com/CityofToronto/bdit_data-sources/blob/master/dags/pull_wys.py#L28-L44


## Postgres update bluetooth reader_status_history DAG
# Task to update the reader_status_history table daily
bt_status_history = PostgresOperator(sql='''SELECT * from mohan.reader_status_history(current_date)''',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't going to work because current_date isn't a template parameter.

See for example https://github.com/CityofToronto/bdit_data-sources/blob/master/dags/refresh_wys_monthly.py#L72 and our documentation "How to use {{ ds }}"

In short it should just be {{ ds }} here

@radumas radumas requested review from chmnata and KatiRG February 26, 2021 03:59
@radumas
Copy link
Member

radumas commented Feb 26, 2021

@KatiRG @chmnata if you could review the route updating documentation here that would be great.

KatiRG pushed a commit that referenced this pull request Mar 3, 2021
@KatiRG
Copy link
Contributor

KatiRG commented Mar 3, 2021

@webgisgeek Hi Mohan, the readme is pretty clear, I edited a few minor things but there were some things that I didn't understand:

  • In this update, there exists an Excel Sheet template that contains details of newly added bluetooth detectors (update/img/template.PNG). The details include proposed route name, description, intersection name (BDIT convention), sensor id and lat/lon at start point, and sensor id and lat/lon at the end points along with numerous other fields.
    => there is no sensor id column in the Start point section

  • The routes that were updated by adding this batch of new detectors were named with a prefix "DT3"._
    => screenshot does not show the new detectors DT3_

  • Depending on how many new readers there are, it may be worthwile to develop an automated process in PostgreSQL, but this update was done manually in the Excel template. In addition to the lat/lon, the streets where the readers are is also needed in 2 columns, and the segment name.
    => What are the 2 columns called? There is no "segment name" in the template.

  • The segment name is always the first two letters of each street, with the corridor street first and the intersecting street second. For example, a reader at Bloor/Christie measuring travel times on Bloor would be named BL_CH. For each proposed route, the excel sheet is populated with the start reader, end_reader and assigned a unique analysis_id. For this batch of new readers, analysis ids starting from 1600000 were added.
    => There are no columns start reader, end_reader and analysis_id in the template.

  • Therefore the reader table will have the following fields populated: analysis_id, street, direction, from_street, to_street, from_id, to_id, start_point_lat, start_point_lon, end_point_lat and end_point_lon.
    => these columns are not in the screenshot. Where are they supposed to be added? Also, there are many columns in the template that are not explained. e.g. !ozone, !zonelist, Maxtime, etc

  • After uploading excel data to PostgreSQL
    => to where? bluetooth.all_analyses ?

  • Should the query SELECT DISTINCT mohan.new_added_detectors.from_id::integer AS bluetooth_id have CREATE TABLE mohan.bluetooth_nodes AS at the beginning?

  • especially for oddly shaped intersections.
    => can you give an example of oddly-shaped intersections? What happens with them? How to fix them?

  • The table is now ready to be appended to the existing routes table.
    => can you show the query to append the table?

```
Draws lines using the reader locations. This is the result.
Check that correct intersections are returned from this query especially for oblique intersections with an offset. If required, correct the intersection_id and geom for such intersections and finalize the table `mohan.bluetooth_nodes`. An example is shown below. The intersection between Dundas St W, Roncesvalles Ave and Boustead Ave is oblique with an offset. The BT reader 5263 (red dot) picked up intersection_id `13466305` (greed dot) as nearest intersection to the reader. But the planned route does not go through this intersection_id. The correct intersection_id for this reader is `113466258`. The intersection_id and geom for this redear was therefore corrected manually.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra 1 in 113466258

INTO rliu.new_bt_coflate1000
FROM conflation
SELECT analysis_id, street, direction, from_street, to_street,
CASE WHEN geom_dir != direction THEN ST_reverse(geom)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is needed because we are using centreline with direction as a base network for which the direction of the line is already flipped. This might result in double flipped geometries if the dirs didnt match because of a NB vs WB comparison


![bt_segments](img/bt_segments.PNG)
## Things to note
Geostatistical lines and planning boundaries need to be avoided while pgrouting.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also railways and rivers and stuff


## Cleanup, Warnings, and Alternative Methods
## Validating Output
Validate the length of the segments with length ST_length(geom) and direction using gis.direction_from_line(geom) functions.If the detectors are located very close to the centerline intersections, it is not necessary to do the centreline cutting. If any bluetooth detectors are not located at the start or end point of a centreline, we will need to cut the centreline using ST_linelocatepoint() as explained in [here.](https://github.com/CityofToronto/bdit_data-sources/issues/234) .
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to include the steps to cut the line here in the readme instead of linking the issue.


At this point, the layer has multiple entries containing centreline segments for each analysis_id. Instead, we need a single line/entry for each analysis_id. This query concatenates the individual centreline segments into one line. The main function doing this work is [`ST_LineMerge()`](https://postgis.net/docs/ST_LineMerge.html)
The new routes table is now ready to append to the existing routes table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bluetooth.segments table has the columns:

  • segment_name ,
  • analysis_id,
  • street ,
  • direction,
  • start_cross ,
  • end_cross ,
  • length ,
  • bluetooth ,
  • wifi ,
  • geom ,
  • duplicate ,
  • reversed ,
  • end_street

So its missing:

  • segment_name , bluetooth, widi, duplicate, revered, end_street

@@ -0,0 +1,40 @@
# Bluetooth reader's status
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this can go under the readme?

@chmnata
Copy link
Collaborator

chmnata commented Mar 31, 2022

List of modified things:
SQL

  • Functions:
    • broken readers: simplified, need to know which version of detectors_history_corrected should be used
    • insert report date: simplified
    • reader locations dt update: should be deleted
    • reader status history: simplified, need to know which version of detectors_history_final should be used
  • Table:
    • added length column to mohan.bt_segments_new
    • OWNER to bt_admins
    • Change create new segment to not ST_reverse geom

DAG:
- updated to use sqlsensor
- updated to send alert message for broken readers and not fail the task

README
- added sql examples in readme
- added steps to cut lines with linesubstring

Things that needs to be fixed:

  • readme is missing the explanation of detectors_history, currently in another md under update
  • mohan.bt_segments_new is not ready to insert to bluetooth.segments, its missing a couple columns
  • need to figure out which version of detectors_history_corrected to use then update function: broken reader and reader status history
  • sqls are only updated in github not in postgres

@chmnata
Copy link
Collaborator

chmnata commented Apr 6, 2022

Looking to move the detectors_history_* to bluetooth schema, mohan.detectors_history_corrected is the corrected version, however there are some null values on several columns. I will dig into how to populate those.
e.g. some geoms are null and bt_id are null
image


-- Combine active and deactive detector
, detector_status as (
SELECT DISTINCT detector_list.detector_name,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should select distinct on reader_id instead of detector_name, because we are inserting reader_id's status at the end of the query and multiple detector_names can be refer to one reader_id

-- return 216 rows
select distinct reader_id, read_name from mohan.detectors_history_corrected

-- return 178 rows
select distinct reader_id from mohan.detectors_history_corrected


INSERT INTO bluetooth.reader_status_history (reader_id, last_active_date, active, dt)

SELECT DISTINCT reader_id, max(last_reported), route_status, dt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this select statement is necessary because the previous cte already remove possible dups


SELECT DISTINCT reader_id, max(last_reported), route_status, dt
FROM detector_status
WHERE reader_id IS NOT NULL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no null reader_id in the detector_history table

@chmnata
Copy link
Collaborator

chmnata commented Apr 6, 2022

Working on the new segments tables and adding the missing columns: segment_name , bluetooth, wifi, duplicate, reverse, end_street
For bluetooth and wifi, according to https://github.com/CityofToronto/bdit_data-sources/blob/e98182d30de5511a2a4c246e4771f14c937c6b0f/bluetooth/sql/create_tables/segments.sql seems like bluetooth was assigned TRUE and wifi was assigned FALSE, but when I check the bluetooth.segments table there are values for wifi that are true 😕
Also not sure what reversed means, reversed geometry (?)

@radumas
Copy link
Member

radumas commented Apr 7, 2022

Just in case future spelunkers come here.
wifi = TRUE on highways where the problems associated with using WIFI observations are reduced.
reversed was a flag to identify segments had been drawn against the direction of travel and which had been subsequently reversed. I'm really not sure why it stuck around.
duplicate appears to have been to identify overlapping segments, which are exclusively exist on College. Since A LOT of those routes are offline, and some appear to have never been created, I am rather confused by the state of those detectors and why this was created this way in the first place.
image

@chmnata
Copy link
Collaborator

chmnata commented Apr 27, 2022

In bluetooth.routes_temp, there are 61 routes out of 402 with no data in bluetooth.aggr_5min and bluetooth.observations.

image

But it does seems like some routes with data do have coverage on those routes with no data. (Green dotted lines are the ones without data and grey lines are the ones with data)
image
image

Ones that still have coverage (or partial coverage):

  • Adelaide from parliament to broadview 1495398, covered by 1454095
  • Parliament from gerrard to calton 1496472, covered by 1455692 (Parliament from dundas to calton)
  • College from ossington todufferin (1600090), sorta half covered by college from dovercourt to bathurst 1412888
  • Yonge between Chatsworth and Castlefield (1460798, 1461616), covered by various short and long routes (1428295, 1448887, 1448891, 1448413, 1428324, 1448861)
  • Sheppard from Morningside Ave to Nelson (end street was coded as sheppard incorrectly) 1460834, covered by VARIOUSSSSS short and long routes 5 in total

@chmnata
Copy link
Collaborator

chmnata commented May 2, 2022

Checking the distinct status in reader_history table, routepoint name does not sounds like a status type (?), added is also ???. I guess technically according to this issue #196, this table doesn't need a status column 🤔

  • "duplicate"
  • "routepoint name"
  • "online"
  • "offline"
  • "decommissioned"
  • "added"
  • "moved"

The following readme explained the meaning behind each status, kindddd of feels like the below 3 is confusing or just keeping a log of duplicated data/data of human error.
https://github.com/CityofToronto/bdit_data-sources/blob/btdag/bluetooth/update/reader_status_details.md

  • added (these are the locations where names are inconsistent for example a reader with bt_id 4035 has a name E in one table and E4035 in another. Despite being at the same intersection at Gardiner & Parliament. Therefore, these two names could refer to the same reader. Thus to distinguish one got the status added and another is offline )

  • duplicate (duplicate does not have a bt_id but based on its location and/or the reader name, there are more than one at a given intersection. The one that has got less details is assigned as duplicate)

  • moved (moved has a bt_id but the same bt_id existed in a different location in the past. For example bt_id 5331 was at Gerrard & University. But the same bt_id also is at Lawrence & Mt Pleasant and is offline as per latest table. Thus 5331 has two entries with status offline and moved.

  • routepoint name (does not have a bt_id. It is a name given to a point along a route. Could there be a detector at those points ? most likely not)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants