Validate flow estimates for walking and cycling #10

Robinlovelace · 2019-06-26T15:16:37Z

For each mode.

mpadge · 2019-07-05T09:58:48Z

Note that Brutus (#11) analyses flows between 11 types of destinations: own home, own work, other residential/visit place, work related visit, own school, “kiss&ride”, day care, shopping, restaurant, sport/culture/other free time place. Our layers cover all of those, which is good.

mpadge · 2019-07-07T19:54:20Z

@Robinlovelace New, usable data for Kathmandu in who-data repo - yep, that's moveability data. This code is far more up-to-date, and works really well. I now need to update the old flowlayers code to function like moveability. The results look much better than any previous attempts at flow layers for Kathmandu:

Those data should be able to be plugged straight in to upthat - a task that I intend to tackle myself first thing tomorrow morning.

Robinlovelace · 2019-07-08T08:19:46Z

Hey @mpadge that looks great to me! Many thanks, looks awesome. I plan to set up a traffic flow validation repo in ITSLeeds that we can use for multi-modal flow estimate validation using data from:

- here for motor vehicles: https://data.gov.uk/dataset/9562c512-4a0b-45ee-b6ad-afc0f99b841f/highways-england-network-journey-time-and-traffic-flow-data
- here for motor vehicles and other modes: https://roadtraffic.dft.gov.uk/downloads

Robinlovelace · 2019-07-08T11:14:44Z

Update for @mpadge: here's a place where I've created a 'visualisation challenge': https://github.com/ITSLeeds/trafficEstimatr

mpadge · 2019-07-08T11:18:22Z

It's private ...

Robinlovelace · 2019-07-08T11:19:43Z

Yes - feel free to take code from there to validate the flows. Or is it that you cannot see the repo? I thought you were in the organization.

mpadge · 2019-07-08T11:48:15Z

I can't see it 🙈

Robinlovelace · 2019-07-08T12:00:48Z

Apologies, my bad!

Robinlovelace · 2019-07-08T12:02:05Z

Just added you: https://github.com/ITSLeeds/trafficEstimatr/invitations

Robinlovelace · 2019-07-10T09:35:34Z

🚶‍♂️ 🚲 🚀

mpadge · 2019-07-12T13:10:22Z

Calibration of bicycle estimates can be done via bikedata. Calibration of pedestrian estimates can be done using the following sources of open data:

| Number | Location | Number of Counters | Notes |
|---|---|---|---|
| 1 | New York City | 114 | Very extensive spatial coverage across all 5 boroughs |
| 2 | Melbourne | 53 | Described here, but spatially restricted to the inner city only |
| 3 | Kansas City | 610 | |
| 4 | Auckland | 19 | |
| 5 | Louisville KY | 49 | |
| 6 | Toronto CA | 2,280 | |

New York could then be used as a benchmark city for ~~both~~ bicycle ~~and pedestrian~~ flows, and Kansas City is the clear preference for pedestrian flows - much smaller than NYC and much higher density data.

current plan:

| Flow layer | City | ✔️ |
|---|---|---|
| pedestrians | NY | ✔️ |
| pedestrians | Kansas | ✔️ |
| bikes | NY | ✔️ |
| bikes | (somewhere else) | ❓ |

A good additional bike city might be Guadalajara, because at least that is somewhat beyond the Global North.

mpadge · 2019-07-15T15:48:02Z

@Robinlovelace Very first try at moveability versus actual pedestrian counts across New York City, noting that moveability is not intended to model such things at all:

Relationship is definitely positive and highly significant (p = 0.002).

Robinlovelace · 2019-07-15T16:28:04Z

Looking good! Shows that measuring pedestrian flows is hard. Results from another paper using the recently open sourced sDNA package:

Robinlovelace · 2019-07-15T16:29:19Z

R2 values:

Link to paper: https://arxiv.org/ftp/arxiv/papers/1803/1803.10500.pdf

mpadge · 2019-07-15T18:49:26Z

Great reference - thanks for the link! The result above is the crudest possible, with no detail whatsoever, but nevertheless provides most of the groundwork for dividing into the usual layers and adding detail. That should all be (mostly) done tomorrow.

Robinlovelace · 2019-07-16T07:29:57Z

Awesome progress sir. Suggest we catch up Thursday morning on all this.

mpadge · 2019-07-26T09:13:16Z

@Robinlovelace flows are coming together, but check this intermediate result:

Then have a look at OSM - there are now two new unplanned settlements that have been mapped in very detailed form. That just means that OSM can't be used to model spatial interactions with activity centres, because there are enormously more of those in those two settlements than anywhere else in Accra. But no worries, coz reverting to the (by now less detailed but nevertheless more spatially representative) google places equivalent gives this:

And one of the coolest things is of course that we can find out who is behind this work, which is largely people associated with OSM Ghana, like Etse Lossou, Sammy Hawkrad, and even Essuanlive who is on github - hi and thanks @essuanlive and others! Keep up the great OSM work - it's being used for this and other World Health Organization projects!

Robinlovelace · 2019-10-14T15:11:46Z

Can you provide an update on the status and planned next steps on this please @mpadge ?

mpadge · 2019-10-15T09:39:12Z

Thanks for the nudge - this is now all subsumed within the calibration work. This week should see quite some more being added to the manuscript, and I'll then also ensure that I detail next steps here, which will essentially involve procedures for generalising the specific calibration steps applied there to other areas (Accra, Kathmandu). Coming very soon - just got centrality implemented this morning, exactly agreeing with igraph values, but now able to be implemented in parallel, which will give huge speed gains.
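Why centrality lends itself to parallel computation can be seen in a minimal sketch, assuming "centrality" here means betweenness on an unweighted, undirected graph (the thread does not say which measure or weighting is used). The function names `source_dependency` and `betweenness_sketch` are hypothetical and this is not the package's implementation. It follows Brandes' algorithm, where each source vertex contributes an independent vector of dependencies, so the outer `lapply` could in principle be swapped for a parallel equivalent such as `parallel::mclapply`:

```r
# Dependencies of one source vertex s on all other vertices
# (inner step of Brandes' algorithm; adj is an adjacency list of integer ids)
source_dependency <- function (adj, s) {
    n <- length (adj)
    dist <- rep (-1, n)   # BFS distance from s (-1 = not yet reached)
    sigma <- rep (0, n)   # number of shortest paths from s
    preds <- vector ("list", n)
    dist [s] <- 0
    sigma [s] <- 1
    queue <- s
    visited <- integer (0)   # vertices in order of non-decreasing distance
    while (length (queue) > 0) {
        v <- queue [1]
        queue <- queue [-1]
        visited <- c (visited, v)
        for (w in adj [[v]]) {
            if (dist [w] < 0) {
                dist [w] <- dist [v] + 1
                queue <- c (queue, w)
            }
            if (dist [w] == dist [v] + 1) {
                sigma [w] <- sigma [w] + sigma [v]
                preds [[w]] <- c (preds [[w]], v)
            }
        }
    }
    # accumulate dependencies from the farthest vertices back towards s
    delta <- rep (0, n)
    for (w in rev (visited)) {
        for (v in preds [[w]]) {
            delta [v] <- delta [v] + sigma [v] / sigma [w] * (1 + delta [w])
        }
    }
    delta [s] <- 0   # a source is never "between" itself and another vertex
    delta
}

# Sum the independent per-source results; halve because each undirected
# pair is traversed once from each end
betweenness_sketch <- function (adj) {
    per_source <- lapply (seq_along (adj), function (s) source_dependency (adj, s))
    Reduce (`+`, per_source) / 2
}

# Path graph 1 - 2 - 3 - 4 - 5
adj <- list (2L, c (1L, 3L), c (2L, 4L), c (3L, 5L), 4L)
bc <- betweenness_sketch (adj)
print (bc)
```

For this five-vertex path the values are 0, 3, 4, 3, 0, which is what `igraph::betweenness()` should also return for the same undirected graph, so a check of that kind can be reproduced on a toy example before moving to real street networks.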

Robinlovelace · 2019-10-15T10:12:16Z

Sounds amazing @mpadge, cheers for the update, pls nudge me when you want feedback 👍

mpadge · 2019-10-15T13:57:47Z

Shall do! There'll be lots of nudges as soon as it's in "production" mode, which should be very soon. We'll need to discuss a lot of aspects of how to generalise from the calibration results.

Robinlovelace · 2019-11-04T17:45:04Z

Apologies for nudge @mpadge but any updates on this from weekend of re-running the code?

mpadge · 2019-11-04T19:25:17Z

arghhh ... don't apologise, it's absolutely high time that i report ... all ran well, but some results were a bit odd. I dug in to it today, and there are some deep internal issues with RcppParallel, or actually with TBB, which is the parallel core of that. That whole bundle desperately needs updating, and I suspect this is related to that. Now trying to find a workaround ...

Robinlovelace · 2019-11-04T23:19:17Z

Maybe run the non parallel version on a subset of the data?

mpadge · 2019-11-07T20:51:12Z

and finally @Robinlovelace, the ridiculously long-awaited near-final version of what we've been waiting for all this time ... not exactly reprex-able, but the "final-model.Rds" is directly produced by a package function, so not far from reproducible. All data will be uploaded to repo, and then all will be totally reprex.

Note: I also improved the statistical robustness of the variable selection procedures, which has slightly reduced the final R² value from my claimed 0.925 as unadjusted value to 0.885 in adjusted form. Whatever, that's still spectacularly high.

```r
dat <- readRDS ("final-model.Rds")
x <- dat$flowvars
mod <- lm (dat$p ~ x)
summary (mod)
#> 
#> Call:
#> lm(formula = dat$p ~ x)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4331.2  -774.9   -95.3   720.5  3438.5 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.282e+03  3.212e+02   3.992 0.000123 ***
#> xsub-dis     1.092e+01  2.550e+00   4.283 4.15e-05 ***
#> xsub-tra     8.974e-01  1.087e-01   8.257 5.33e-13 ***
#> xent-dis     2.347e+04  1.798e+03  13.055  < 2e-16 ***
#> xsub-res    -7.732e-01  1.356e-01  -5.701 1.15e-07 ***
#> xsub-cen     6.607e-01  8.660e-02   7.630 1.23e-11 ***
#> xedu-ent     9.193e+02  1.480e+02   6.211 1.13e-08 ***
#> xtra-sus    -1.279e+02  2.475e+01  -5.166 1.18e-06 ***
#> xsub-ent    -9.099e-02  1.959e-02  -4.644 1.01e-05 ***
#> xedu-dis    -5.448e+04  1.479e+04  -3.684 0.000368 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1327 on 103 degrees of freedom
#> Multiple R-squared:  0.8939, Adjusted R-squared:  0.8846 
#> F-statistic: 96.41 on 9 and 103 DF,  p-value: < 2.2e-16
# the terms in the model are the following categories:
categories <- c ("subway", "centrality", "residential", "transportation",
                 "sustenance", "entertainment", "education", "healthcare")
cbind (substring (categories, 1, 3), categories)
#>            categories      
#> [1,] "sub" "subway"        
#> [2,] "cen" "centrality"    
#> [3,] "res" "residential"   
#> [4,] "tra" "transportation"
#> [5,] "sus" "sustenance"    
#> [6,] "ent" "entertainment" 
#> [7,] "edu" "education"     
#> [8,] "hea" "healthcare"
dat <- data.frame (model = fitted (mod), observed = dat$p)
r2 <- summary (mod)$adj.r.squared
library (ggplot2)
theme_set (theme_minimal ())
ggplot (dat, aes (x = model, y = observed)) +
    geom_point () +
    geom_smooth (method = "lm") +
    ggtitle (paste0 ("R2 = ", signif (r2, 3)))
```

^{Created on 2019-11-07 by the reprex package (v0.3.0)}
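The adjusted value in the summary above can be checked by hand from the printed figures alone. A minimal sketch, assuming the standard formula for adjusted R² (the one `summary.lm` reports) and reading the number of observations off the residual degrees of freedom (103) plus the 10 estimated coefficients; the variable names are illustrative only:

```r
r2 <- 0.8939          # "Multiple R-squared" from the summary
df_resid <- 103       # residual degrees of freedom
p <- 9                # predictors, excluding the intercept
n <- df_resid + p + 1 # 113 observations

# adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print (round (adj_r2, 4))
```

This gives a value that rounds to the 0.8846 printed by `summary (mod)`, so the penalty for the nine terms is about one percentage point of explained variance.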

One really interesting -- and hopefully very important -- thing that emerges from this is the identification of significant layers. For better readability, this is the above table via kable, and ordered in decreasing significance:

```r
dat <- readRDS ("final-model.Rds")
x <- dat$flowvars
mod <- summary (lm (dat$p ~ x))
# order coefficients by increasing p-value (column 4)
coeffs <- mod$coefficients [order (mod$coefficients [, 4]), ]
# strip only the leading "x" prefix, so that "(Intercept)" stays intact
rownames (coeffs) <- sub ("^x", "", rownames (coeffs))
knitr::kable (coeffs)
```

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:---|---:|---:|---:|---:|
| ent-dis | 2.347237e+04 | 1.798025e+03 | 13.054527 | 0.0000000 |
| sub-tra | 8.974071e-01 | 1.086859e-01 | 8.256882 | 0.0000000 |
| sub-cen | 6.607116e-01 | 8.659530e-02 | 7.629875 | 0.0000000 |
| edu-ent | 9.193188e+02 | 1.480132e+02 | 6.211060 | 0.0000000 |
| sub-res | -7.732533e-01 | 1.356454e-01 | -5.700548 | 0.0000001 |
| tra-sus | -1.278608e+02 | 2.475282e+01 | -5.165503 | 0.0000012 |
| sub-ent | -9.098710e-02 | 1.959150e-02 | -4.644218 | 0.0000101 |
| sub-dis | 1.092303e+01 | 2.550235e+00 | 4.283146 | 0.0000415 |
| (Intercept) | 1.281941e+03 | 3.211573e+02 | 3.991628 | 0.0001231 |
| edu-dis | -5.447829e+04 | 1.478837e+04 | -3.683860 | 0.0003682 |

^{Created on 2019-11-07 by the reprex package (v0.3.0)}

And the three most significant layers are:

1. Undirected dispersal from entertainment centres;
2. Travel from subway to transport (generally parking facilities, both car and bike);
3. Travel from subway directed towards measures of centrality.
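The abbreviated term names can be expanded mechanically. A minimal sketch, assuming each term is an origin-destination pair of three-letter codes and that "dis" stands for dispersal (as in the first layer above; it is not in the `categories` vector). The helper `decode_layer` and the `lookup` vector are hypothetical names, not part of the package:

```r
categories <- c ("subway", "centrality", "residential", "transportation",
                 "sustenance", "entertainment", "education", "healthcare")
# three-letter code -> full name, plus the extra "dis" code (assumed meaning)
lookup <- c (setNames (categories, substring (categories, 1, 3)),
             dis = "dispersal")

# expand a term such as "sub-tra" into "subway -> transportation"
decode_layer <- function (term) {
    parts <- strsplit (term, "-", fixed = TRUE) [[1]]
    paste (unname (lookup [parts]), collapse = " -> ")
}

top3 <- vapply (c ("ent-dis", "sub-tra", "sub-cen"), decode_layer, character (1))
print (top3)
```

Applied to all row names of the coefficient table, this gives a readable label for every layer without changing the model itself.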

Robinlovelace · 2019-11-08T17:09:26Z

Spectacular work @mpadge, I plan to give you a call in a bit, sorry my side has been bogged down...

Robinlovelace assigned Robinlovelace and mpadge Jun 26, 2019

mpadge mentioned this issue Jul 22, 2019

calibration cities & data ATFutures/nyped#1

Closed
