Description
opened on Nov 7, 2024
Describe the bug
At busmaps.com, we are seeing a regression in the MobilityData GTFS Validator 6.0.0: GTFS feeds that validated successfully with 5.0.1 now fail. Specifically, 6.0.0 fails to process files whose route names contain certain characters that 5.0.1 accepted.
The GTFS file in question, located at GTFS File URL, includes route names such as "Funo - Z.I. Còde Fabbri", which contain non-ASCII characters. In 6.0.0, the validator marks the affected rows as unparsable, which prevents every validation rule from running on the affected file, including the routes validation. Version 5.0.1 processed these routes without issues and completed the full validation.
Validation Log Summary:
agency.txt – 1 row
calendar_dates.txt – 7,694 rows
feed_info.txt – 1 row
routes.txt – UNPARSABLE_ROWS
shapes.txt – 440,628 rows
stop_times.txt – 663,387 rows
stops.txt – 6,578 rows
trips.txt – 21,473 rows
Additional Information:
We validate over 3,000 GTFS feeds, and consistency across versions is critical for our use case. This change in behavior has introduced significant challenges in managing our validation workflow.
Steps/Code to Reproduce
Steps to Reproduce:
- Download the GTFS file from the provided URL.
- Run the validator on an Ubuntu system, once with each version, using the minimal command outlined in the documentation:
  java -jar gtfs-validator-5.0.1-cli.jar -i {path to the GTFS file} -o {name of the output directory that will be created}
  java -jar gtfs-validator-6.0.0-cli.jar -i {path to the GTFS file} -o {name of the output directory that will be created}
- Observe the failure in routes.txt and the unparsable-row errors in the 6.0.0 output.
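For anyone comparing the two versions repeatedly, the steps above could be scripted roughly as follows. This is only a sketch: the `-i` and `-o` flags and the jar names come from the commands in this report, while `build_command`, `run_versions`, and the `validation-output` directory are hypothetical names chosen for illustration. It assumes `java` is on the PATH and both jars sit in the working directory.

```python
import subprocess
from pathlib import Path


def build_command(jar, feed, out_dir):
    """Build the minimal CLI invocation used in this report (-i input, -o output)."""
    return ["java", "-jar", str(jar), "-i", str(feed), "-o", str(out_dir)]


def run_versions(feed, jars, out_root="validation-output"):
    """Run each validator jar on the same feed, one output directory per jar.

    Returns a dict mapping jar name to the process exit code.
    The output directory layout is an assumption made for this sketch.
    """
    results = {}
    for jar in jars:
        # e.g. validation-output/gtfs-validator-6.0.0-cli
        out_dir = Path(out_root) / Path(jar).stem
        completed = subprocess.run(
            build_command(jar, feed, out_dir),
            capture_output=True,
            text=True,
        )
        results[str(jar)] = completed.returncode
    return results
```

Calling `run_versions("feed.zip", ["gtfs-validator-5.0.1-cli.jar", "gtfs-validator-6.0.0-cli.jar"])` would then leave one report directory per version to compare side by side.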
Expected Results
The validator should successfully parse and validate the GTFS file, including all rows in routes.txt without marking them as unparseable, even if route names contain non-standard characters. The validation should complete with a full report of any detected issues across all files, as it did in version 5.0.1. If any encoding-related issues are detected in routes.txt, they should be logged as warnings rather than errors that block further rule execution.
Actual Results
When running the validation in version 6, the routes.txt file is marked as containing "UNPARSABLE_ROWS," preventing further validation of its contents. This differs from version 5.0.1, where the file was fully validated even with non-standard characters in route names. As a result, validation rules for routes are not executed, and a complete validation report is not generated.
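To narrow down which rows might be involved, a small independent check of routes.txt can help. The sketch below is not the validator's own parsing logic, and this report does not establish why 6.0.0 emits UNPARSABLE_ROWS; it only flags two common candidates, namely lines that are not valid UTF-8 and lines whose field count differs from the header. The function name `find_suspect_rows` is hypothetical, and the check assumes no quoted field spans multiple lines.

```python
import csv


def find_suspect_rows(raw: bytes):
    """Return (line_number, reason) pairs for rows that look problematic.

    Assumption: each CSV record sits on a single physical line.
    """
    suspects = []
    expected_fields = None
    # Drop a UTF-8 byte order mark if the file starts with one.
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    for line_number, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError as exc:
            suspects.append((line_number, f"not valid UTF-8 at byte {exc.start}"))
            continue
        fields = next(csv.reader([text]))
        if expected_fields is None:
            # The first decodable line is treated as the header.
            expected_fields = len(fields)
        elif len(fields) != expected_fields:
            suspects.append(
                (line_number, f"{len(fields)} fields, expected {expected_fields}")
            )
    return suspects
```

The bytes can be read straight from the feed archive with the standard library, for example `zipfile.ZipFile("feed.zip").read("routes.txt")`, assuming routes.txt sits at the root of the zip.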
Screenshots
No response
Files used
No response
Validator version
6.0.0
Operating system
Linux Ubuntu 22.04
Java version
openjdk version "17.0.12" 2024-07-16
Additional notes
No response
Metadata
Status: Requires investigation