Add prediction certainty #111
First draft of prediction type
I certainly think it is a good idea to add a prediction_type attribute to the GTFS-realtime spec. This will allow systems, like Swiftly's Transitime system, to clearly indicate whether predictions in the user interfaces should be displayed with a real-time icon or not. I think there are two issues that should first be dealt with: clarifying the possible values and determining if any other related additions are needed. With respect to clarifying the possible values, I'm not clear on UPDATED_SCHEDULE. My current opinion is that there should simply be a value SCHEDULED to indicate that there is currently no location information for the vehicle for that trip and that the predictions are just based on the schedule. It could be that the standard schedule is used or that an improved schedule based on historical travel times is used. Either way, I think it would be good to simply indicate that the predictions are schedule based. For such predictions a real-time icon should of course not be shown in the user interface. With respect to other possible additions, I think another key one is to indicate if a prediction is based on when the driver is supposed to leave a terminal or a wait stop. Predictions for after such a stop are directly affected by driver behavior, namely when the driver actually leaves the terminal. Such predictions are inherently not as accurate. Some user interfaces could indicate this. A possible way to deal with this is to have an additional possible value of REALTIME_DRIVER_AFFECTED. One last thing, I suggest that "IMPRECISE_REALTIME" be changed to "REALTIME_IMPRECISE" so that it is obvious that the main goal of this attribute is to indicate whether or not the predictions are real time or schedule based. |
The reason I put UPDATED_SCHEDULE instead of simply SCHEDULE was because the role of simply showing the schedule is usually reserved for the static GTFS. UPDATED_SCHEDULE was to make clear it wasn't a regular scheduled case. Re REALTIME_DRIVER_AFFECTED, I'm not strongly against it, but for a consumer application it's not something that's useful. What's useful is to make sure that we display the same information in all interfaces. For example, if Swiftly shows a departure dependent on driver behavior as real time in the SMS interface, the departure should be tagged as REALTIME. If Swiftly doesn't tag them as real time then it should be REALTIME_IMPRECISE. Agreed on the REALTIME_IMPRECISE change, will do the modifications. |
I would agree that if I also don't understand why there would be a value (SCHEDULED) for "predictions are a copy of the schedule because we have no real-time information". Again it seems to me that real-time data should simply not be provided in this case. As for UPDATED_SCHEDULE, from the passenger's point of view this is still a deviation from the published schedule. How would this information be used by a consumer application? |
Sometimes real time prediction can have low accuracy so users shouldn't rely on it the same way they do for other real time predictions, but they are still more precise than the actual GTFS schedule data, so it's worth sending. Re-How to use In a few words, we want producers to send the latest information that they have, even if it's not up to regular real time standards. |
@mike-swiftly @abyrd Any responses on the follow-ups earlier? I would like to call for a vote soon. If you think the answers were satisfactory let me know :) |
gtfs-realtime/spec/en/reference.md
Outdated
## _enum_ PredictionType | ||
|
||
Experimental field, subject to change. | ||
PredictionType represent the source of data of the prediction. This lets consumer adjust their display of the information depending on the source (for example, with a real time symbol). |
`PredictionType` doesn't really seem like a source of data based on the enumerated values. It's more like a qualifier of the type of data.
I tend to agree here. @gcamp Can you give an example of how a consumer would use data of type |
You want to mention that the vehicle is driving (hence it is not cancelled) but are not aware of the whereabouts. Opposed to "there is no realtime status". |
@gcamp I agree that However, to me it seems like there are two very different uses of
It seems unnatural to me to bundle these two very different concepts into the same field. For the first two values ( I'd be interested to hear from producers to know whether they can actually differentiate between the two quality levels of predictions internally. Given that One motivation for using a separate channel for schedule updates is that (FWIW, |
I think that there is lack of agreement on how to move forward because we are trying to put too much information into a single field. I think a key goal should be to be able to clearly specify a prediction is due to GPS from a vehicle or whether it is something else, like schedule based. A secondary goal could be to specify more information about the source and the quality of the prediction. Therefore I think should have a boolean parameter called something like I think splitting things into these two parameters would allow us to move ahead pretty quickly with |
gps_based is something that would not work for me, especially since most vehicles use multiple sources like odometers etc. |
Stefan makes a good point about the name "gps_based". Other examples of real-time position systems include train control systems for when vehicles are underground. So a different name would be an improvement. Perhaps "real_time_information" or "real_time_location" or ??? |
@mike-swiftly What are your thoughts in terms of naming the field after the action producers want the consumer to take (e.g., |
I have to vote -1 on the current pull request because I think it does not address the goal of clarifying when the real-time icon should be displayed in a client application. It does address uncertainty, but it still assumes that predictions should always be based on GPS. The RealTimeCertainty value of LOW is described as:
I think the above definition of LOW is good for handling those specific situations. But there are other situations that we were trying to handle, such as when there is no GPS for a vehicle because no vehicle has been assigned to a particular block/route. For schedule based predictions the certainty would be LOW, and no real-time icon should be displayed as part of the associated predictions. But there are other situations where the certainty would be LOW, such as the prediction being real-time GPS based but also dependent on driver behavior. For the driver behavior situation we would still want to display the real-time icon since the predictions are based on real-time GPS info. This means that a certainty of LOW is not adequate to specify whether or not the real-time icon should be displayed. But perhaps there is a very simple solution. I propose we add another possible value called something like NOT_REAL_TIME. Before I was proposing a separate field, but I'm trying to keep things simple here so that we might be able to move forward. If the certainty is HIGH or LOW then the real-time icon should be displayed. If it is NOT_REAL_TIME then the icon should not be displayed. If the value is UNKNOWN then most likely the icon should be displayed. |
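The display rule proposed in the comment above can be sketched as a small consumer-side function. This is only an illustration: the enum name `RealTimeCertainty` and the values HIGH, LOW, UNKNOWN and NOT_REAL_TIME come from the discussion, but the numeric tags and the function name `show_realtime_icon` are assumptions, and NOT_REAL_TIME is a suggestion from this thread rather than part of the pull request.

```python
from enum import Enum


class RealTimeCertainty(Enum):
    # Numeric tags are placeholders chosen for this sketch only.
    UNKNOWN = 0
    HIGH = 1
    LOW = 2
    NOT_REAL_TIME = 3  # value suggested in the thread, not in the PR


def show_realtime_icon(certainty: RealTimeCertainty) -> bool:
    """Return True when a consumer following the proposed rule shows the icon.

    HIGH and LOW both imply real-time location data, UNKNOWN is treated as
    "most likely real time", and only NOT_REAL_TIME hides the icon.
    """
    return certainty is not RealTimeCertainty.NOT_REAL_TIME
```

With this rule, a driver-dependent prediction tagged LOW still gets the icon, while a schedule-based prediction tagged NOT_REAL_TIME does not.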
@mike-swiftly The goal of
The point of certainty was exactly to know if it's certain enough to show the prediction icon, not if the prediction is based on GPS or not. If an app should show the real time icon it should be For changes to the schedule, that is not real time at all, then further discussion should be in #113 since it was removed from this PR. |
@mike-swiftly said:
This reaction is understandable, considering the title of this ticket. I believe some commentators' understanding of the goal of this ticket has shifted considerably since that title was written. Maybe this fact hasn't been addressed clearly enough, but much of the discussion has shifted to communicating how accurate a prediction is, sidestepping the original goal of controlling a real-time icon in an end-user app. To me, the details of when a particular icon is displayed, and even the definition of "real-time" being used in the comments are specific to particular consumers and may not be relevant to a general-purpose firehose data specification like GTFS-RT. We should be careful not to assume that symbols and features in particular end-user apps are shared by all other consumers of GTFS-RT.
Nothing here assumes any predictions are "always based on GPS". As already stated above in response to your proposed By "based on GPS", do you instead mean "not solely based on schedules"? Do you consider predictions sent every few minutes as GTFS-RT messages, but based entirely or largely on schedules and historical data to be "not real-time"?
I think this is an example of a consumer-specific concept of "real time". Rather than discussing how producers can directly control this particular app-specific UI element, it may be more productive to talk about which pieces of raw information you need from the feed to control this UI element. @mike-swiftly, as I understand it you want information about the data sources used to produce the prediction, including whether GPS location or schedules were considered in the prediction. Is this correct?
It is not clear to me what it means for a real-time message to be "not real-time". We will need a full definition to discuss clearly. But to me, this description of the prediction seems mostly orthogonal to the proposed accuracy enum. Ideally we can include all the necessary information in GTFS-RT messages to allow you to control your app following your definitions and design. But I would prefer to include general-purpose pieces of information in the GTFS-RT messages, rather than items which allow controlling UI elements of one particular app according to its own definitions. |
Vote is now closed. Votes are +2, -1 and thus the proposition doesn't pass. @mike-swiftly can you detail your thinking a bit after the responses? |
@mike-swiftly Can you please answer? This is blocking the current PR. I'll start another vote next week if there's no response. |
My one single issue, as I have tried to previously state, is that the current proposal does not address the issue at hand for real-time information systems to indicate if predictions are based on real-time location information or not. I am completely in agreement with also specifying uncertainty values. But the proposed values are not adequate to also indicate whether predictions are based on real-time location info. Side note: we are at a disadvantage of not receiving feedback from a significant number of other producers or consumers of GTFS-realtime data. Hey Google, CityMapper, Moovit, NextBus, Clever Devices, Init, etc. are you out there??? Changing the title of the pull request does not change our initial goal. I think it only tries to circumvent it. So let's figure out how to move this forward in a concrete way. But first, a bit of explanation to the questions that @abyrd asked. I see real-time information as being based on real-time vehicle location information. Of course there are also additional possible information sources, such as historical data. But the key thing is whether the system actually knows that there is a vehicle in service. The real-time location is usually GPS based, though as @abyrd pointed out, it can of course be from other sources, such as a train control system. Now let me try to define my previously used term NOT_REAL_TIME (if people have better suggestions for terms, that would be great. I'm just interested in the meaning). A prediction indicated to be NOT_REAL_TIME would mean that the system has not received location information for a vehicle that will serve the corresponding trip or block. The prediction information therefore is based on the expectation that a vehicle will be put into service. For some agencies this can be a good assumption, for others, a bad one. It depends on how many trips an agency simply misses. I can tell you with certainty that some agencies like SFMTA miss quite a few. We don't really know the uncertainty. 
But we do know that it is not as certain as if there was real-time location information, and that can be useful information for the passenger. I expect that most clients would want to show a real-time icon only if real-time location information is available. Otherwise I think it would be very misleading to the passengers. Of course clients might also want to somehow convey other forms of uncertainty as well. But I don't think that a radio wave icon is understood to indicate certainty. Of course clients can design any kind of app they want to. It is their app. But information providers, like myself, should provide the necessary information so that the app designers can do what they think is right. Therefore if we either add NOT_REAL_TIME as a possible value to the proposed uncertainty field, or if we add a second true/false field to indicate whether or not real-time location information was used for the prediction, I think we would cover all situations and accomplish our goal. Can we do so? |
@mike-swiftly here's how I see it and the different cases.
I agree that the PR reduced in scope compared with the start, but it doesn't mean what we have now is not valid. |
Dividing predictions into simply two groups, high and low certainty, is simply not adequate when it comes to describing whether predictions are real-time based. When real-time location information is not available there are some situations where there is a somewhat but not completely high certainty (there isn't yet a live vehicle for the assignment, but perhaps this is common and there is a good chance that a vehicle will be assigned and leave on time) and situations where there is a low certainty (who knows what is going on??). At least users can understand that a real-time icon indicates there is a tracked vehicle and a lack of the icon indicates there isn't one. @gcamp, what is your objection to making the change more descriptive by separating out the indication of whether the predictions are real-time or not? |
@gcamp can we change to
So is this
I'd prefer to see a more explicit indicator of when a vehicle isn't assigned to a trip, instead of abstracting this to a level of including "not real-time" information in the GTFS-realtime feed. In theory vehicle assignment to a trip can already be represented in TripUpdate.VehicleDescriptor (and VehiclePosition.TripDescriptor), but the problem is that the consumer can't tell if the field was left blank because the producer didn't have this information or if the trip is known to be unassigned. A related concept here is "TripUpdate.delay", which is (emphasis mine) "the current schedule deviation for the trip. Delay should only be specified when the prediction is given relative to some existing schedule in GTFS." I think the intention here was to make this
So, we would need another enumeration to indicate vehicle assignment in TripUpdates. We could add |
@barbeau agree, changed that
The proposition of And for adding the update of schedule as a feature, I wanted #113 to be resolved before changing anything. For what it's worth, I agree that updated schedule should be done in the trip update format, but that's for the community to decide. @mike-swiftly would you agree to this current PR if service change would be able to update the schedule? Mobility Data is leading the service change proposition, maybe @LeoFrachet @barbeau have something to add into the current discussion regarding service changes. |
Bump 😬 |
[Disclaimer: I usually re-read the whole message thread before posting, but I didn’t because this thread is way too long. Guillaume (@gcamp), since you proposed it in the first place, could you update your first message with a summary of the current discussion, scope, proposals? That would help newcomers (and me) daring to answer without being off topic.] My understanding is that we need a replicable way to measure the « precision » (or « uncertainty ») of a prediction, so that an uncertainty of From the conversations I had with other stakeholders around GTFS-ServiceChanges and GTFS-Flex, I realized we need two values: the prediction time, but also the spreading. It can be a range: 10AM +/- 2 minutes (with the definition that 99% of the time, the bus will pass between 9:58AM and 10:02). It would allow us to compare the following two predictions: 10AM +/- 2min vs 10AM +/- 15min, allowing the data consumer to decide when they want to flag the value as « real-time ». Two remarks on that: First, time ranges are not symmetrical in public transit, since passing early is the worst. It should never happen. I would be interested to see statistics of real passing time for a train scheduled at 10:00 for example. I’m expecting no train early, most trains on time, then a decreasing probability of being late. So instead of a symmetrical timeframe, we should provide just the likely maximum delay. For example: 10AM +max 12min, defined as « the train will most likely have arrived at 10:00, but there is a 99% probability that at 10:12 it will have arrived ». We could call such value Secondly, how can we calculate it? Good question. But, like, how can we calculate a prediction? There is a naive but good way to start: just use historical data, and see retrospectively what those values should have been. 24 hours before, your prediction_spread is usually 15min, then 1h before it’s usually 10 min, and 10 min before it’s only 1min. Good. Just use that. 
It’s enough to be able to compare them between agencies. And yes, sometimes it will be wrong, just like the predictions are sometimes wrong. But at the end of the month, we’ll be able to see how accurate your |
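The naive historical approach described in the comment above could look roughly like the sketch below. It assumes a producer has archived, for each past prediction, how far ahead it was made and how late the vehicle actually was relative to that prediction. The function names, the record layout and the nearest-rank percentile method are all assumptions made for illustration; only the `prediction_spread` idea and the 99% level come from the comment.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def prediction_spread_by_lead_time(records, p=99.0):
    """Estimate a one-sided spread per lead time from archived predictions.

    records: iterable of (lead_time_minutes, lateness_minutes), where
    lateness is the actual passing time minus the predicted time.
    Returns {lead_time_minutes: lateness at the p-th percentile}.
    """
    buckets = defaultdict(list)
    for lead_time, lateness in records:
        buckets[lead_time].append(lateness)
    return {
        lead_time: nearest_rank_percentile(sorted(values), p)
        for lead_time, values in buckets.items()
    }
```

A producer could then attach the looked-up spread for the current lead time to each prediction, and compare the values against observed outcomes at the end of the month.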
@LeoFrachet I agree with your comments on the fact that departure time distributions are skewed later than the scheduled departure time, and that it's complicated to calculate and define them. Even a value like "9:00 am + up to 5 minutes" is not really clear, because there's always some tail of the distribution out past 5 minutes, so what percentile is being reported? This PR is basically a response to the fact that no one is providing this kind of detailed precision information, and we'd be better off with a discrete, qualitative description of accuracy: either it is accurate enough to trust or it's not. This subject was found to be more complex because there's a second dimension that some people would like to fold into the same field: the source of the data (whether it is based on recently reported position or guessed from other sources). The conversation got long because some people just want a high-level judgement on whether the prediction is "good" or not, and others want to specifically know whether recently reported position data was used to calculate the prediction because they have a UI element that indicates this specific bit of information. The proposal was blocked by @mike-swiftly who it seems would really like a way to receive information about the data sources used to compute the prediction. |
@gcamp said:
I think it's inadvisable to ask people to agree to a proposal based on the assurance that another (as yet unapproved) proposal is going to cover their use case. This has the potential to create a larger problem further down the line, a situation where the other proposal "must" be accepted because people have predicated past votes on its acceptance. I know some organizations and people are counting on the service-changes proposal, but as I've commented elsewhere I have serious doubts that we will see widespread adoption of multi-layer GTFS patching by many consumer applications. This is asking every consumer in the world to carry out significant additional software development to understand a few feeds. This is unlikely to happen uniformly, and would basically fork the specification since any consumers that don't support service-changes could badly (and probably silently) misinterpret those feeds. |
Andrew (@abyrd) said:
I did define which percentile we should use. Defining the percentile is the key to the replicability of my proposal. The definition I gave was "There is a likelihood of 99% that the vehicle will be passed at
Up to you. I have no strong opinion. But I do think that without a strict definition, everybody will use a different definition, and the "LOW" of one agency will be the "HIGH" of some other. But if producers and consumers see a value, I'm not against it. |
@LeoFrachet said:
Sorry, I know that was in your proposal - I should have been more clear. I was partly replying to your post and partly musing about the problems with the existing GTFS uncertainty concept. A while ago there was a discussion elsewhere about the fact that you need to specify a distribution and the parameters to that distribution. We could freeze some of those parameters but it's all quite arbitrary and I do wonder how much it would really be used, and how correctly. I'm guessing these broader "certain/uncertain" categories would be more heavily and correctly used. |
@abyrd That's probably right, but we could still eventually add schedule information to the enum proposed whatever happens with the service change. My point was more that we could still improve the GTFS-rt spec right now instead of getting stuck on adding more to this proposal. @LeoFrachet Sorry for getting you into the discussion, my point was more if there was any advancement on service changes that could direct the discussion on if we should add a |
@LeoFrachet said:
I just did a quick informal literature review. As we'd expect, the distributions are skewed later than the scheduled arrival time, but up to 30-40% of buses arrive well before the scheduled time. Of course this is horrible for riders, but it's a fact visible in the data. And of course it depends heavily on mode of transport and location. If we accept that this variability is just a fact of life in bus operations, rider experience can be greatly improved by showing them both the expectation of arrival time and a safe time to reach the stop. For example, showing an expected arrival time of 12:30 but also saying you must arrive at 12:25 to have a 95% chance of catching this bus, and have a 95% chance of being on the bus by 12:35. Providing the whole distribution to RT consumers (based on historical data and/or current conditions) would allow this. It would also allow providing a much more nuanced time window for arrival times and transfers downstream. Perhaps a sort of empirical CDF with N evenly spaced percentiles. But again I'm not sure who will produce such data (correctly). We might be able to make it happen in the Netherlands, and if it was done with open source components maybe others would follow. But of course this would be a separate proposal: instead of the current ill-defined uncertainty field in GTFS-RT, we would have one very simple enum field (as described in this PR) and a separate very detailed arrival probability field. |
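As a rough illustration of the "N evenly spaced percentiles" idea in the comment above, the sketch below summarizes historical arrival deviations and derives the two rider-facing times from the example (be at the stop by the 5th percentile, expect to be on board by the 95th). The function name and the returned keys are hypothetical, and it glosses over the difference between arrival and departure times.

```python
import statistics


def arrival_summary(deviations_min, n=20):
    """Summarize arrival deviations (minutes, negative = early).

    Returns n - 1 evenly spaced cut points (5%, 10%, ..., 95% when n=20),
    plus the first and last cut points as offsets from the expected arrival.
    """
    cuts = statistics.quantiles(deviations_min, n=n)
    return {
        "percentiles": cuts,
        "be_at_stop_by": cuts[0],   # lowest cut point: most vehicles arrive after this
        "on_board_by": cuts[-1],    # highest cut point: most vehicles have arrived by this
    }
```

For an expected arrival of 12:30, a `be_at_stop_by` of about -5 and an `on_board_by` of about +5 would correspond to the 12:25 and 12:35 figures in the example.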
If provided, I think it needs to be clear whether the agency is reporting a distribution based on historical data, or one created for the prediction given to the rider (which could be a mix of current and historical conditions). The first is relatively trivial to implement for producers if you have archived data, but less meaningful to riders if the current trip is an outlier - and these are the trips riders really want accurate information about. The 2nd can be much harder to do accurately, at least from my experience dealing with location accuracy on mobile devices. We found widespread inaccuracies in the "accuracy" value provided to apps by GPS hardware, and that was after producers had pushed to have the threshold be the 68% confidence level - it turns out they couldn't even meet that accuracy level. I'm not sure if this experience directly translates to arrival/departure predictions yet, but so far it feels similar to me. The other challenge is how the CDF is converted to something that riders understand - although that's outside the scope of the GTFS-rt exchange. This is a very similar problem to travel time reliability on the highway side of things, which I understand has been fairly heavily studied. One last set of facts to drive home that the current
|
We are currently developing a sophisticated delay prediction for the Dystonse project that will generate arbitrary detailed probability distributions for delays based on historical data and real time information. (Our repository is documented in English, but much of the extra information about our project is available only in German at this time.) The main purpose of those predictions will be in our own routing algorithm that takes the full distribution into account. We are looking into alternative ways to make our predictions available via APIs and/or feeds, so that other routing services or real time information displays can use them. The current spec of the As @LeoFrachet and @abyrd discussed over a year ago, defining specific percentiles is mandatory to have data that is comparable across data providers. Given our current data and algorithms, we could return a value for any percentile in theory, but for most routes, we don't have enough historical records to be really confident about the 1% or 99% percentile, but 5% and 95% should be fine. Even if detailed data was available for any given percentile, different use cases might benefit from using different percentiles. So from our viewpoint, an ideal proposal would allow to attach any number of delay/percentile pairs to a Also, if you know of any other place where similar ideas are discussed, please leave a hint. Here's a sample of our data, which coincidentally has very little skewing, although I know other examples that do have the heavy skewing that was mentioned before: |
On Monday, June 8, 2020 8:09:39 PM CEST, Lena Schimmel wrote:
> Given our current data and algorithms, we could return a value
> for any percentile in theory, but _for most routes_, we don't
> have enough historical records to be really confident about the
> 1% or 99% percentile, but 5% and 95% should be fine. Even if
> detailed data was _available_ for any given percentile,
> different use cases might benefit from using different
> percentiles. So from our viewpoint, an ideal proposal would
> allow to attach any number of delay/percentile pairs to a
> `stop_time_update`. On the other hand, we realize that hardly
> anybody used the current `uncertainty` field, so maybe that
> proposal would be overly complex for everybody but us. Any
> thoughts on that?
I think it should not be forgotten that many transit agencies are using a
first order prediction to model their timetable. Hence it is not a
summation of travel times, but the chance that given a departure time the
arrival at a stop can be met in the 75th-85th percentile of the cases. I think this
method was described by the transit group of TU Delft in the early
90s. So I wonder, shouldn't the schedule first be reverse engineered,
and then remodelled based on a historic estimation given the input
schedule that *does* affect how drivers execute it?
If we can assist you regarding data: for the Netherlands we have a multi
year historic real time dataset available for research purposes.
--
Stefan
|
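The idea of attaching any number of delay/percentile pairs to a `stop_time_update`, raised in the exchange above, could be modelled on the consumer side roughly as follows. This is a sketch under stated assumptions: the class name `DelayDistribution`, its validation rules and the linear interpolation between pairs are all hypothetical and are not part of GTFS-realtime or of any proposal in this thread.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DelayDistribution:
    """Hypothetical set of (percentile, delay_seconds) pairs for one stop."""

    pairs: Tuple[Tuple[float, int], ...]

    def __post_init__(self):
        if not self.pairs:
            raise ValueError("at least one pair is required")
        percentiles = [p for p, _ in self.pairs]
        delays = [d for _, d in self.pairs]
        if any(not 0 < p < 100 for p in percentiles):
            raise ValueError("percentiles must lie strictly between 0 and 100")
        if percentiles != sorted(set(percentiles)):
            raise ValueError("percentiles must be strictly increasing")
        if delays != sorted(delays):
            raise ValueError("delays must be non-decreasing")

    def delay_at(self, percentile: float) -> float:
        """Interpolate linearly between pairs, clamping outside the range."""
        first_p, first_d = self.pairs[0]
        if percentile <= first_p:
            return float(first_d)
        for (p0, d0), (p1, d1) in zip(self.pairs, self.pairs[1:]):
            if percentile <= p1:
                return d0 + (d1 - d0) * (percentile - p0) / (p1 - p0)
        return float(self.pairs[-1][1])
```

A producer that is only confident about the 5% and 95% levels could publish just those pairs (plus the median), while a routing engine could still query whatever percentile suits its use case and accept the interpolation error.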
@lenaschimmel Same here - we've been collecting GTFS-RT data from 8-9 agencies over the last two years for related work going on at USF, with the goal of comparing multiple prediction techniques across different agencies. If you're interested in collaborating please contact me via LinkedIn - https://www.linkedin.com/in/seanbarbeau/. |
@skinkie I think you're referring to something like busbuzzard? |
Not that specific code, but it looks fun. I have asked some people if they could come up with the original paper. |
Unless there is big disagreement, I'm going to close this PR. Schedule changes should probably be done in a potential Service Change spec. |
We have seen that `uncertainty` values in TripUpdates are sometimes used to convey whether a prediction is precise enough to be shown as real time in consumer applications. Different producers have different values where `uncertainty` has different meanings.

This causes problems because consumers have to apply a different interpretation for each producer. It would be better to have a clear value for each prediction that defines if a prediction is precise enough to be shown as real time or not.

This is a proposition to add `prediction_type` to TripUpdates as an experimental field. Predictions can either be

- `REALTIME` (current default)
- `IMPRECISE_REALTIME` where the vehicle is followed in real time but doesn't have enough precision to be shown as real time to the user. The prediction threshold for the switch between `REALTIME` and `IMPRECISE_REALTIME` should be the same threshold that the producer uses to hide the real time icon in other interfaces like bus displays.
- `UPDATED_SCHEDULE` where a vehicle is not yet followed in real time but some changes to the schedule were made after the latest GTFS export, very likely due to changes from the control center.

I would also be curious to know if anybody uses the `uncertainty` value in GTFS-rt without having some magic threshold value. If that's not the case and usage of `prediction_type` is accepted and successful, I would propose to deprecate `uncertainty`.
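To make the proposal above concrete, here is a minimal producer-side sketch of how the three values might be assigned. The enum values come from the proposal, but the function name `classify_prediction`, its parameters and the 120-second default threshold are assumptions for illustration only. Each producer would substitute the threshold it already uses to hide the real time icon on its own displays, and the sketch assumes an untracked trip is only published when its schedule has been updated.

```python
from enum import Enum
from typing import Optional


class PredictionType(Enum):
    REALTIME = "REALTIME"
    IMPRECISE_REALTIME = "IMPRECISE_REALTIME"
    UPDATED_SCHEDULE = "UPDATED_SCHEDULE"


def classify_prediction(
    vehicle_tracked: bool,
    uncertainty_s: Optional[int],
    icon_threshold_s: int = 120,  # hypothetical producer-specific threshold
) -> PredictionType:
    """Map a producer's internal state to the proposed prediction_type."""
    if not vehicle_tracked:
        # No live vehicle yet: only the schedule has been adjusted.
        return PredictionType.UPDATED_SCHEDULE
    if uncertainty_s is not None and uncertainty_s > icon_threshold_s:
        # Tracked, but too imprecise to show the real time icon.
        return PredictionType.IMPRECISE_REALTIME
    # REALTIME is the proposed default, also used when uncertainty is unknown.
    return PredictionType.REALTIME
```

The point of the enum is that the threshold stays on the producer side, so consumers no longer need a per-producer magic value for `uncertainty`.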
Note: I might be unresponsive in the next few weeks. @juanborre has my full confidence and can respond in my stead and represent Transit.