Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify stop times shape_dist_traveled must not exceed the trip shape's maximum distance #380

Merged
merged 3 commits into from
Jul 28, 2023

Conversation

stevenmwhite
Copy link
Contributor

I had assumed this was reasonably implied but realized it wasn't explicit when I was recently debugging a GTFS feed. In the feed in question, the stop times shape_dist_traveled went from 0 to roughly 5.5 while the shape's distance went from 0 to 3.7.

This was a standard one-way pattern -- not a loop -- and there was no reasonable explanation for this (except possibly that they used different units, which is not acceptable), yet the canonical validator did not throw an error. I subsequently used the older Google validator and did get a warning -- both for the distance exceeding the shape's maximum and for a resulting side-effect of the stop being too far from the shape based on that distance.

It seems to me that there's no reason the stop time's distance can ever legitimately exceed the shape's distance, but a few things I've heard or discussed with others as possibilities include:

  • Routes that Loop
    • For a route that loops over itself (but does not have the same start and end point), I expect that the shape has to be drawn such that it actually loops over itself and there are two paths of the shape on top of each other. The shape_dist_traveled then helps indicate which portion of the path (the first time the bus goes down that street or the second) the stop time is connected to.
    • For routes that create a circle and meet itself at the end, I believe the trip has to be split at this meeting point. The shape_dist_traveled cannot go from the maximum back to zero, and there is no indication in the spec or best practices that by exceeding the maximum of the shape the intent is that the bus continues along the shape again from the start. (This would be similar to the way we can pass midnight by using times in excess of 24 hours... but I don't believe this is acceptable or intended from the spec for shape distances.)
  • Improperly drawn/simplifed shapes
    • Some producers simplify shapes incorrectly by not following the street grid and simply connecting stop locations with straight lines. In this case, perhaps a shape_dist_traveled in stop times could be greater than that of the shape, if it uses a more realistic street-traveled distance, but I don't think this makes for a valid GTFS file. This field is used to tell consumers where the stop (which is not on the line) connects to the line, and if the stop times use an accurate street-traveled distance while the shape does not, the data in this field will be a big unmatched mess and the values won't match up in the way they are supposed to.

I can't think of any other reason a stop time's distance can or should exceed the distance for the shape. Does anyone else have an example where this is expected or acceptable? I believe the intent of the spec is that it should not, and this PR clarifies that explicitly.

Added rule specifying that the `shape_dist_traveled` for a stop time must not exceed the total (maximum) distance for the shape of the given trip.
Clarified the shape for the related trip
@google-cla
Copy link

google-cla bot commented May 22, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Minor update to create commit with non-private email address.
@antrim
Copy link
Contributor

antrim commented May 23, 2023

Thanks!

Aside: I think the spec leaves open some interpretation here. What about trip patterns that only use part of a shape? I think some producers do this.

i.e.

Shape:  A-B-C-D-E
Trip 1: A-B-C-D-E
Trip 2: A-B-C
Trip 3:    B-C-D

You could use shape_dist_traveled to reference part of the shape.

BTW, this is NOT how Trillium has implemented. In our GTFS there is always one shape_id per trip pattern.

@stevenmwhite
Copy link
Contributor Author

@antrim I totally agree and also interpret your examples all as valid. They would remain valid with these changes as none of them exceed the total distance of the shape.

@antrim
Copy link
Contributor

antrim commented May 23, 2023

@stevenmwhite Yes, you're right. Not sure my comment was totally relevant here.

maximum shape_dist_traveled for a trip should never exceed the maximum shape_dist_traveled for for the referenced shape.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 16, 2023
@stevenmwhite
Copy link
Contributor Author

Keep open.

@stevenmwhite
Copy link
Contributor Author

I'd like to call for a vote on this but I'm having an issue getting Google to recognize that I've signed the CLA.

Had some other issues to deal with and haven't been able to sort that out.

Can I open a vote before that's done and sort it out after?

@antrim
Copy link
Contributor

antrim commented Jun 16, 2023

This smells more like a validator rule than a highly specific reference change to me. This has been proposed as a rule in the validator:
MobilityData/gtfs-validator#163

This is a good opportunity to create more clarity about the relationship between the Spec, Best Practices, and the validator.

This is the existing language in the Spec, which already defines the correspondence between shapes.shape_dist_traveled and stop_times.shape_dist_traveled.

stop_times.txt

Field Name Type Presence Description
shape_dist_traveled Non-negative float Optional Actual distance traveled along the associated shape, from the first stop to the stop specified in this record. This field specifies how much of the shape to draw between any two stops during a trip. Must be in the same units used in shapes.txt. Values used for shape_dist_traveled must increase along with stop_sequence; they must not be used to show reverse travel along a route.
Example: If a bus travels a distance of 5.25 kilometers from the start of the shape to the stop,shape_dist_traveled=5.25.

@stevenmwhite
Copy link
Contributor Author

stevenmwhite commented Jun 16, 2023

The one thing the spec is missing as it describes these values (although I think it can/should be reasonably assumed) is that a trip cannot travel farther than its corresponding shape.

I originally noted this as an issue with the validator, and @emmambd mentioned that the rule isn't in the validator because it's not in the spec (Also see this comment from @isabelle-dr MobilityData/gtfs-validator#163 (comment))

Thus, my hope is to clarify the spec and make this explicit -- with the ultimate goal of ensuring the canonical validator validates against it.

@emmambd emmambd removed the stale label Jun 16, 2023
@emmambd
Copy link
Collaborator

emmambd commented Jun 21, 2023

Thanks for moving this discussion forward, @stevenmwhite and @antrim! At MobilityData, we plan to share some analytics to find examples of other feeds from the Mobility Database that aren't adhering to this rule to see if there are any valid cases / common occurrences. We expect to share our findings here next week.

@stevenmwhite Re: CLA, based on the governance,, you can proceed with a vote before you sign the CLA. You'll just need to sign it before the PR can be accepted and merged. Let us know if you need any help with the CLA issues.

@stevenmwhite
Copy link
Contributor Author

Thanks @emmambd — I'll hold off on the vote until you have an opportunity to review any other feeds in the database. Let us know what you find.

@emmambd
Copy link
Collaborator

emmambd commented Jul 1, 2023

At MobilityData, we did a review of feeds in the Mobility Database where the stop_times.shape_dist_traveled exceeds the max shapes.shape_dist_traveled and found 85 examples. Analysis with feed links here.

In 49 feeds, there was 1% or less difference between max shape_dist_traveled values. We haven't been able to spot a use case where the difference between these fields would in fact be necessary, but are interested in the prevalence of these very minor differences. Curious about others' thoughts.

@emmambd emmambd added the GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule label Jul 4, 2023
@stevenmwhite
Copy link
Contributor Author

Alright, I've finally returned from some leave of various types and I'm ready to call for a vote on this one.

I went through the feeds that @emmambd posted above and generally believe these to be software bugs. Two of the feeds were produced by our software and I've identified the issue that caused it. Otherwise, there's no logical way for a consumer to interpret a stop time distance that is beyond the distance of the shape.

As this isn't a new feature, I don't exactly believe it requires a consumer and producer to implement anything -- it's just a clarification of the spec language.

So, if the vote starts today, it will end at 23:59:59 UTC on July 27.

@skinkie
Copy link
Contributor

skinkie commented Jul 20, 2023

+1 (OpenGeo, as producer)

@emmambd emmambd added the Status: Voting Pull Requests where the advocate has called for a vote as described in the changes.md label Jul 20, 2023
@laurentg
Copy link

+1 Mecatran (producer & consumer)

@flocsy
Copy link
Contributor

flocsy commented Jul 23, 2023

+1

@gcamp
Copy link
Contributor

gcamp commented Jul 25, 2023

+1 Transit

@albanpeignier
Copy link

+1 enRoute

@isabelle-dr isabelle-dr mentioned this pull request Jul 27, 2023
4 tasks
@emmambd emmambd removed the Status: Voting Pull Requests where the advocate has called for a vote as described in the changes.md label Jul 28, 2023
@stevenmwhite
Copy link
Contributor Author

The vote has passed with none voting against and 5 voting in favor.

@emmambd
Copy link
Collaborator

emmambd commented Jul 28, 2023

Thanks to @stevenmwhite for being the PR advocate on this and to everyone who contributed to the discussion and voted!

You can track MobilityData's work on adding this to the Canonical Validator at MobilityData/gtfs-validator#163.

@emmambd emmambd merged commit c50c0b9 into google:master Jul 28, 2023
gcamp pushed a commit to TransitApp/transit that referenced this pull request Sep 25, 2023
…pe's maximum distance (google#380)

* Indicate max `shape_dist_traveled` for stop times

Added rule specifying that the `shape_dist_traveled` for a stop time must not exceed the total (maximum) distance for the shape of the given trip.

* Added trip reference

Clarified the shape for the related trip

* Minor update for Google CLA

Minor update to create commit with non-private email address.
@isabelle-dr isabelle-dr added the Change: Clarification Revisions of the current specification to improve understanding. label Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Change: Clarification Revisions of the current specification to improve understanding. GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants