Status change events for entering and leaving visible area #214
See the Agency spec. Relevant events are |
Interesting, so maybe the path forward is to port these events to the Provider side? |
I think so :) |
This is a very important issue: the specs should contain exactly how I agree there should be two new events: The only question is what happens with the out-of-region part of the trip: the They can be either:
I recommend the Discard version as that would contain all information required to reconstruct the events of the vehicle, and would contain no duplication, hence no possibilities for conflicting data in the feed. |
Agree that anything between BTW Agency uses With One question raised is: how do you know if a I'm sure other questions have yet to even be identified. :) |
That's an interesting issue, what happens when riding on the region boundary. From a trip point of view, I believe those events shouldn't be needed at all. That is, if someone went out of the region (or it was just a GPS error) and then came back within the same trip, then it shouldn't matter. On the other hand, status changes feeds are about live-updates, so it might be needed. |
a way to address the border issue could be to establish a buffer zone along the boundary (e.g. 15m) and surface events pessimistically:
vehicles riding in the buffer would then be associated more tightly with the region they originated from. the major problem with this approach is that it is heavily dependent on both GPS accuracy and consistent implementation across mobility providers. |
Why not just add these events after the trip has finished? That would be a very clean solution, avoiding all the recursive in-out-in-out algorithms, which would be quite hard to implement correctly for providers, agencies, and processors alike, and would just be asking for bugs. So I recommend: At trip start:
At trip end:
I strongly believe that anything more complicated than this will not be consistently implemented across providers / agencies and will just create unreliable data. |
The major pain point with that approach is alluded to in the op:
Some providers are already adding events retroactively to their feeds for trips that began outside regions, and those events are already being missed by data ingestion setups that haven't been prepared with that consideration in mind, which is already confounding a variety of analysis scripts. Ideally, whatever solution we arrive at gets to a final decision about event inclusion in a particular feed as close to the event time as possible. |
Agency distinguishes between |
Yes, I agree. Many status change events are only reliably calculated once the trip has completed. For example, for technical reasons, |
I think it would be reasonable for there to be a |
But you cannot require providers to architect their database systems based on one idea we have. I believe most of them will correct information in |
LADOT is not requiring providers to "architect their database systems" a particular way. LADOT is requiring providers to comply with the MDS Agency standard. The MDS Agency standard, as written, says that when a provider submits trip-related events, those events will contain a UUID4 |
Of course not, it's an absolutely important requirement. All I said was that technically most providers might only fill in the associated_trips after the trip has completed. BTW, anyone with insider knowledge, why is it associated_trips and a list and not associated_trip and a string? How could a user_pick_up or user_drop_off event possibly be in multiple trips? |
this has been discussed but has not made it into We should probably circle back to the original issue. It would be useful to hear from some mobility providers and agencies about their thoughts on events as they relate to region crossing. If they think this would be a functional way to approach it, porting the applicable events from agency API for an upcoming release seems like it would be a reasonable way forward. |
I think @hyperknot's suggestion here makes the most sense, with a couple of additional thoughts.
The one alternative I think we'd propose is instead of Another benefit here is that reconciliation is easier since That being said, I think As @dyakovlev mentioned this is already something we are trying to do in our feed. It is obviously more difficult to do in near-time. While MDS specifies the Provider API is for "historical" data, there is no clear definition of what this means. I would propose a two-pass approach in which data is available very quickly (a few minutes) to allow Provider data to be used for more real-time analysis, but agencies and aggregators understand that they should pull data 24 hours after the event_time requested to get all historical reconciliations. |
I think an almost-instant feed + a 24 hour reprocessing sounds reasonable. This way both up-to-date data can be fetched, as well as reliable, processed historical data for calculations after 24 hours. |
I also agree that it's great to have the temporary solution of user + service event at the boundary. Only two important remarks about that:
|
Thanks so much for the discussion, everyone! I see a few key issues to deal with here:
Porting the
| event_type | event_type_reason | Description |
|---|---|---|
| reserved | trip_enter | Customer enters a service area managed by agency during an active trip. |
| removed | trip_leave | Customer leaves a service area managed by agency during an active trip. |
This would require redefining `removed` to encompass cases other than the device physically being removed from the street.
The next step would be to more broadly harmonize the Agency and Provider event schemas; I have thoughts about how to do this that I’ll save for another issue.
Enter/leave event semantics
I think @hyperknot’s proposal defining all trips in terms of two events at the endpoints makes sense, but the downside (in addition to timeliness stuff that would be nice to avoid getting into here if we can help it) is that API consumers can’t use the status changes API to get an accurate picture of how many vehicles were on trips within the area managed by an agency at a specific time.
To address a couple of specific points:
One question raised is: how do you know if a `trip_leave` definitively ends a trip? I think the answer is you need at least one more event (any event that’s not `trip_enter` I think) to be fully certain.
This is true but seems OK to me. If a client wants to track the lifetime of a particular trip, they can use the trips API to get that information retroactively. The purpose of the status changes API is to track the statuses of vehicles within the visible area, so once a vehicle has left that area anything that happens afterwards (until/unless it re-enters the area) seems like it doesn’t belong in the API.
I strongly believe that anything more complicated than this will not be consistently implemented across providers / agencies and will just create unreliable data.
We’ve internally kicked around both the physical buffer zone idea and a time-based debouncing idea and decided that both of these approaches are potentially beneficial but feel too complicated to bake into the spec. It seems OK to start with the assumption that providers will publish enter/leave events for every boundary crossing, while leaving room for providers to deduplicate noisy enter/leave events if they want to.
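As a rough illustration of the time-based debouncing idea, here is a minimal sketch of how a provider might drop noisy crossings before publishing. Nothing here is part of MDS: the `BoundaryEvent` type, the `debounce` function, and the 60-second default window are all hypothetical, and the input is assumed to hold the crossings of a single device on a single trip.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BoundaryEvent:
    """Hypothetical record of one boundary crossing for a single device."""
    kind: str         # "trip_enter" or "trip_leave"
    timestamp: float  # seconds since epoch


def debounce(events: Iterable[BoundaryEvent], window_s: float = 60.0) -> List[BoundaryEvent]:
    """Cancel out opposite crossings that happen less than window_s apart.

    window_s is an assumed tuning parameter, not something the spec defines.
    """
    kept: List[BoundaryEvent] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        previous = kept[-1] if kept else None
        if (
            previous is not None
            and previous.kind != ev.kind
            and ev.timestamp - previous.timestamp < window_s
        ):
            # A leave followed quickly by an enter (or vice versa) is treated
            # as noise, so both events are dropped.
            kept.pop()
        else:
            kept.append(ev)
    return kept
```

With this sketch, a `trip_leave` at t=1000 followed by a `trip_enter` at t=1020 disappears from the output, while crossings that are minutes apart are published unchanged.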
My suggestion:
- When a trip starts, publish a `user_pick_up` event if the vehicle is within the visible area.
- During the trip, publish a `trip_enter` event whenever the vehicle enters the visible area (tagged with the location and time of the first observed point within the area) and a `trip_leave` event whenever the vehicle leaves the visible area (tagged with the location and time of the last observed point within the area). The provider may choose to delay publishing these events for up to a minute in order to avoid redundant events if a vehicle only crosses the boundary briefly.
- When the trip ends, emit a `user_drop_off` event if the vehicle is within the visible area.
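To make the three rules above concrete, here is a minimal sketch of how the events for one trip could be derived from its route points. The `trip_events` function, the tuple layouts, and the `inside` predicate are hypothetical helpers for illustration; a real provider would test points against the actual visible-area geometry, and the optional publishing delay is left out.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]       # (timestamp, lat, lng)
Event = Tuple[str, float, float, float]  # (event name, timestamp, lat, lng)


def trip_events(points: List[Point], inside: Callable[[float, float], bool]) -> List[Event]:
    """Derive the events for one trip from its time-ordered route points.

    `inside(lat, lng)` is an assumed predicate that reports whether a point
    lies within the visible area.
    """
    events: List[Event] = []
    if not points:
        return events

    prev = points[0]
    prev_in = inside(prev[1], prev[2])
    if prev_in:
        # Trip starts within the visible area.
        events.append(("user_pick_up", *prev))

    for cur in points[1:]:
        cur_in = inside(cur[1], cur[2])
        if cur_in and not prev_in:
            # Tagged with the first observed point within the area.
            events.append(("trip_enter", *cur))
        elif prev_in and not cur_in:
            # Tagged with the last observed point within the area.
            events.append(("trip_leave", *prev))
        prev, prev_in = cur, cur_in

    if prev_in:
        # Trip ends within the visible area.
        events.append(("user_drop_off", *prev))
    return events
```

A trip that starts inside, briefly leaves, and returns would yield `user_pick_up`, `trip_leave`, `trip_enter`, `user_drop_off` in that order; a trip that ends outside the area would stop at `trip_leave`.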
What providers should do
I think @asadowns’s proposal of pairing `service_start`/`service_end`/`user_drop_off`/`user_pick_up` events makes sense as a workaround for now, as long as providers transition to real enter/leave events as soon as the relevant version of the spec is released.
Thanks for the detailed writeup @rf- . While it looks nice from a theoretical specs point of view, I honestly cannot imagine providers being able to offer a reliable source for the multiple trip_enter, trip_leave case. I can guarantee you that there'll be all possible combinations of At this point, why not just simplify the whole problem by adding a key like |
If I understand correctly, you're suggesting a sequence of (using the Provider nomenclature) an |
No, what I recommend is what some providers are already doing: inserting a The only improvement I'd propose, is to add a new key/property
|
Here is the current thinking at Ellis (who are building the reference implementation of MDS-agency for LADOT):
Sending If a ride starts in one jurisdiction and ends in another, the sequence should be A provider |
Separately, I'm not following the line of "all the recursive in-out-in-out algorithms" as relates to multiple |
I think complexity with multiple |
@rf- Sounds like general agreement. Any unresolved issues here? |
Hi everybody- This is a great discussion. There is actually some intentional divergence between the provider and agency specs here, since agency is about giving future permission while provider is about providing reporting information about historical ops. Referring directly to
Given this, it is fairly easy to construct a record of when devices left and entered the public right of way or jurisdiction by performing a join. See SM's implementation for more details. Let me know if I missed any nuance or anything after reading this thread over. Going to close for now. |
@hunterowens SM's implementation will only work if the providers choose to terminate trips at city boundaries and add a service_end event afterwards. That is a good solution; I actually recommend that. Still, we need to put it in the specs. Right now these two points are not written anywhere for providers. Here is my go at it: Specs:
|
@hyperknot I'll ask @toddapetersen to open a PR to add this clarification -- thanks! Just a clarification, the provider will stop sending events and telemetry once a jurisdiction is exited, but you don't necessarily want to terminate the trip, because the rider could return to the original jurisdiction. Those events (e.g. |
@Karcass this is not what @hunterowens proposes. There will be no new events, like A trip will always have exactly one start and one end (the user_ events). If a trip crossed the boundary, its start or end will be adjusted to be on the boundary and a service_start or service_end event will be inserted. While the vehicle is outside the boundary, there will be no events in the stream; its last event will be service_end until it returns to the visible area. |
Sounds great! I'm not trying to over-complicate provider. Agency has somewhat different goals and constraints, and should be constructed as such. IMHO YMMV etc. |
I think folks are talking at cross purposes a bit here. My understanding of @hunterowens' position (which may be inaccurate) is that providers shouldn't emit any special events at service boundaries, but that API consumers should be expected to combine data from both the status changes endpoint and the trips endpoint if they want to put together a full picture. This is very different from (and contradictory to) asking providers to emit If the decision ends up being not to add |
Yeah, in the context of |
@rf- I agree. If there will be no service_end added and no new event types, then the only way to figure out if a trip crossed the boundary is to do a geospatial lookup on the trips data and combine it with status changes intelligently. It's doable, but I believe it is totally against the point of the specs, which is to make agency-provider communications simple and transparent. |
@hyperknot I'm a little late to this discussion, but just want to chime in on your point:
This interpretation is incorrect. Santa Monica's implementation works on two distinct windows: inactive and active, each calculated using a part of MDS Provider: inactive_windows are constructed via Status Change events, and represent time that a device is sitting in the PROW awaiting further activity. For example, the window between ( active_windows are constructed via Trips, much like you suggest in your comment above. This ensures that only the time the device is active within the geo-boundary (Santa Monica in this case) is what is counted. Additional Status Changes at the boundary are spurious and would not even play a role in the calculation. Further, it represents an inaccurate view of the activity/fleet - e.g. |
@thekaveman without additional service_end like events, how do you count the following device: before 13:00 ready to be picked up, inside PROW |
@hyperknot I will assume that by "inside/outside PROW" what you mean is "inside/outside the boundary", e.g. inside/outside the city limits of Santa Monica. Under your scenario:
So we have an
This implies a
This implies there was a final route point inside the boundary (just before it crossed out), let's say it was at 13:55. So the first active_window is defined from 13:00 to 13:55.
None of this data is seen, the device is not counted.
Again, this implies there was an initial route point inside the boundary (just after it crossed in), let's say it was at 19:05. So another active_window is defined from 19:05 to 20:00.
Status Change of Windows: 12:00 - 13:00 (inactive) |
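The active-window logic walked through above can be sketched roughly as follows. This is not Santa Monica's actual implementation: the `active_windows` function and the `(timestamp, inside_boundary)` input shape are assumptions, and the inside/outside flag is presumed to come from a separate point-in-polygon check against the city boundary.

```python
from datetime import datetime
from typing import List, Tuple


def active_windows(route: List[Tuple[datetime, bool]]) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) windows during which a trip was inside the boundary.

    `route` is a time-ordered list of (timestamp, inside_boundary) pairs;
    the boolean is assumed to be precomputed from the route point's location.
    """
    windows: List[Tuple[datetime, datetime]] = []
    start = None
    last_inside = None
    for ts, inside in route:
        if inside:
            if start is None:
                start = ts      # first route point inside the boundary
            last_inside = ts    # latest route point still inside
        elif start is not None:
            # The device crossed out: close the window at the last inside point.
            windows.append((start, last_inside))
            start = None
    if start is not None:
        windows.append((start, last_inside))
    return windows
```

For a trip with inside points at 13:00 and 13:55, outside points in between, and inside points again at 19:05 and 20:00, the sketch yields the two windows from the scenario: 13:00 to 13:55 and 19:05 to 20:00. Route points outside the boundary never contribute to a window.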
@thekaveman I agree with your calculation and this is exactly the right logic we need to use. Where I don't agree is your usage of How can you possibly calculate those A. The provider cuts the trip at the boundary and inserts a service_end like event at those moments. Point C. also requires to have a reference GeoJSON/WKT file about the exact boundary for each city, and it also means that we are dropping the You are right C. could work, but this whole issue is about why C. is not a good solution, and why we recommend choosing A. or B. |
This was for our discussion here in the thread, using the scenario that you laid out. In the real world, we don't have In my answer to your scenario, I picked the times (e.g. 13:55, 19:05) for illustrative purposes - it really doesn't matter what we choose, the logic is exactly the same. Again, in the real world, this is (should be, if providers are sending accurate and valid MDS feeds) a known time. |
Seems like a bare minimum requirement for a company operating physical devices in geographic space, driving user engagement via geo-locating technology, to know the geographic bounds within which they operate. I would be willing to bet that nearly every municipality looking to participate in MDS already has this defined and readily accessible as data (maybe not GeoJSON/WKT as there are a lot of ESRI shops out there, but conversion tools are widely and freely available).
Quite the opposite. The I am still not seeing how additional event types or "fake" events at the boundary clarify any of this. Status Changes and Trips were specified as they are for exactly this reason - so they can be used together to get a complete picture of historical device activity. |
Let’s call the area where a given Provider API consumer is allowed to see all status changes the “visible area” for purposes of this issue. This is likely to be the boundaries of the jurisdiction of the agency making the request, but it currently varies between providers.
Right now, it’s unclear what events should appear in the status changes API when a user reserves a vehicle and takes it out of the visible area or vice versa.
It seems like there are currently two possibilities:
- The status changes API includes events reflecting both endpoints of any trip that touches the visible area. If the user takes a vehicle into a neighboring city, the end of that trip will create an `available` status change, but whatever happens to that vehicle afterwards won’t be reflected in the API.
- The API only includes events that happen inside the visible area, so the end of a trip that leaves the visible area won’t appear as a status change.
In the first case, clients that want to put together a complete picture would need to observe that the trip ended outside the visible area and use that as a cue to not expect future events related to that vehicle (as opposed to the naive alternative of treating the vehicle as permanently `available` at whatever the end point of the trip was). This would work, but it depends on the provider and the client having a precise shared definition of the visible area, which doesn’t currently exist in MDS. Another downside of this approach is that, in the case where the user starts outside the visible area and then enters it, the provider will have to retroactively add a `reserved` event for the beginning of the trip once they see that the user has crossed into the visible area, making the contents of the feed inconsistent over time.
In the second case, the last event the client sees for a vehicle leaving the visible area is `reserved`. This prevents the client from using the status changes API to calculate whether the provider is compliant with vehicle limits, since they don’t know if or when the vehicle actually left the visible area, or when a vehicle entered the visible area before being dropped off. This also makes it difficult to validate the integrity of the event stream, since basically any event can follow a `reserved` event.
One solution to this problem would be to add explicit status change events when a vehicle leaves or enters the visible area, defined as the area within which the current API client is allowed to see events. The least invasive implementation would be to add a `left_visible_area` event type reason for the `removed` event type and an `entered_visible_area` reason for `reserved`. If that feels like an inappropriate use of `removed`, it could make sense to add a new event type instead.
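As a rough sketch of why this is the least invasive option: a client that already counts vehicles by treating `removed` as "gone" and every other event type as "present" would need no changes to handle the proposed reasons. The `vehicles_in_area` function and the tuple layout below are hypothetical, and `left_visible_area` / `entered_visible_area` are the proposed reasons from this issue, not existing MDS values.

```python
from typing import Iterable, Set, Tuple

# (device_id, event_type, event_type_reason), assumed to be in time order.
StatusChange = Tuple[str, str, str]


def vehicles_in_area(events: Iterable[StatusChange]) -> Set[str]:
    """Return the devices currently counted as within the visible area."""
    present: Set[str] = set()
    for device_id, event_type, _reason in events:
        if event_type == "removed":
            # Covers existing removal reasons as well as the proposed
            # left_visible_area reason.
            present.discard(device_id)
        else:
            # available, reserved (including the proposed
            # entered_visible_area reason), and unavailable all count.
            present.add(device_id)
    return present
```

In this sketch the reason field is ignored for counting; it would only matter to clients that want to distinguish a boundary crossing from a physical removal.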