-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add an argument for the cli (--date) that specifies the date to use during verification. #1628
Conversation
cli/src/main/java/org/mobilitydata/gtfsvalidator/cli/Arguments.java
Outdated
Show resolved
Hide resolved
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd 'at' HH:mm:ss z"); | ||
Date now = new Date(System.currentTimeMillis()); | ||
String date = formatter.format(now); | ||
ZonedDateTime now = ZonedDateTime.now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think now
will pretty much always be different from the date specified on the command line, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. There have always been two times at play: the time used for validation and the time indicated in the report. In the current code (before this PR), only the report generation time is included in the output.
Note that this part of the code is a refactor. The existing code uses System.currentTimeMillis() while the new code uses ZonedDateTime.now(). In either case, this time included in the report is the time that the report is generated (after validation steps have been done).
The time used for validation is set at the start of processing if the new --date option isn't used. In that common case the two times differ by the processing time, which could be a fraction of a second or could be many minutes for large feeds (maybe even hours? I don't know how long the biggest feeds take).
That was already true and remains true. A difference now is that both times are included in the JSON report, and both are included in the HTML report if the date differs.
I'm open to changing any of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification.
So config.zonedDateTime()
is the time specified on the command line, right?
If it's the case, my point is that it's almost impossible to specify the exact same time as now
, so is_different_date
will pretty much always be true. In that case why check at all?
It's no big deal because the net result will that the it always prints the dateForValidation, which is OK IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I misunderstood your original question. Date and time handling is confusing.
The computation of is_different_date converts both ZonedDateTime variables to LocalDate before comparing. It's looking to see if the date (not the time part) is different. In the common case that --date is not specified and the validation run completes on the same calendar date as it starts, is_different_date will be false and the HTML report will not have anything added. As you say, the date+time will always be different, but that isn't what is checked. My goal was to be conservative about changing the human-readable HTML output (although I'm open-minded).
Part of why this is confusing is that I called the argument --date, but you must specify a time as well, so it's really a date+time.
I have a suggestion to reduce the confusion: Refactor the code by replacing "currentDateTime" in ValidationContext with a LocalDate (no time). Currently the type is a ZonedDateTime, which is why I used that type in this PR and require that a time be specified along with a date. But I checked the uses of the ValidationContext, and the rules all convert to LocalDate when using currentDateTime anyway. (example:
gtfs-validator/main/src/main/java/org/mobilitydata/gtfsvalidator/validator/DateTripsValidator.java
Line 76 in 3d5ef6f
LocalDate now = currentDateTime.getNow().toLocalDate(); |
I'll create a PR with that refactor so you see what I mean. It won't change the current functionality at all.
After that change (if you agree), then I'll change this PR so that only a date is specified by --date and hopefully it makes everything simpler and clearer. That will also resolve the issue about different timestamp formats; with the change the new one will just be a date. (Though I'll still follow up separately about making the existing one machine readable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed that the comparison was based only on the date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just created PR #1636, a refactor to simplify and clarify the existing date handling. I'll wait to see if that is accepted and merged. If so, it will simplify this PR.
main/src/main/resources/report.html
Outdated
@@ -229,6 +229,7 @@ <h1>GTFS Schedule Validation Report</h1> | |||
<span th:text="${config.gtfsSource}"></span><span | |||
th:text="${config.countryCode.isUnknown()} ? '. No country code was provided.' : ', with the country code: ' + ${config.countryCode} + '.'"></span> | |||
</br> | |||
<span th:if="${is_different_date}" th:text="'The date and time used during validation was ' + ${config.zonedDateTime.toLocalDateTime()} + '.'"></span></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format for this date is different from the one when the validator was run, e.g. 2024-01-03 at 17:14:01 EST
vs 2020-01-02T12:34
.
We should consider makeing the format the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I admit to noticing this and being a bit lazy. Largely it was because the existing format is a custom format. I like your suggestion later about changing that format, in which case it will be easy for them to use the same format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the date-for-validation is just a date while the run time is a date+time. So there's no mismatch. I would still like to move away from the custom format, but not in this PR since it no longer is an issue.
main/src/main/java/org/mobilitydata/gtfsvalidator/runner/ValidationRunnerConfig.java
Outdated
Show resolved
Hide resolved
@@ -55,6 +56,7 @@ public JsonReportSummary( | |||
this.validationReportName = config.validationReportFileName(); | |||
this.htmlReportName = config.htmlReportFileName(); | |||
this.countryCode = config.countryCode().getCountryCode(); | |||
this.dateForValidation = config.zonedDateTime().toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This date and the validatedAt date have different formats. They should be the same.
And this might be outside of the scope of this PR, but since they are destined to be read by a program, they should be easily parsable. ISO 8601 with offset might be more suited for the dates in the json file. e.g. 2024-01-03T17:14:01-05:00
This could be delegated to another issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that changing the existing date format to a non-custom one would be good. I didn't want to presume to do that, but it sounds like you're comfortable with it. It would make it easy to use matching date formats.
I'll make a separate PR with just that change, with the intent of submitting it before this one. My one concern is whether there is code out there that is parsing this custom date format of validatedAt. Are you concerned about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I did not imply that you had to create another issue. We are already glad that you contributed to the project, and I would not want you to feel that you have to put more effort than what you intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. I didn't interpret your comment that way. I like fixing things up when there's agreement to do so.
Thanks for this awesome work @bradyhunsaker! I'm tagging @atvaccaro, @derhuerst, and @wklumpen who expressed interest in this in #1292 in case they have thoughts as well. |
I brought in the changes from PR #1636 and other changes in the master branch. I'll update here when I have the code ready for a re-review. |
OK, ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! Thanks
Amazing work! Thanks very much for the work. Now I don't have to just ignore the "not within 30 days" warning :) |
description = | ||
"Date to simulate when validating, in ISO_LOCAL_DATE format like " | ||
+ "'2001-01-30'. By default, the current date is used. " | ||
+ "This option enables debugging rules like feed expiration.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bradyhunsaker It's a bit late, but I was wondering about this sentence. I don't see where these debugging rules are enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear. I meant that the main purpose of this option is debugging. For example, it could be used to debug an error with a rule like feed expiration.
There are not special debugging rules associated with it.
I think the use of the word "enables" was a bad choice. Feel free to remove or edit the description.
Summary:
Resolves issue #1292.
By specifying a date and time with the --date argument, the validator will use that date instead of the current date. This enables replicating earlier runs and debugging rules that depend on time, like feed_expiration_date_30_days. Only the cli interface has this support added.
The main challenge here was how to handle this gracefully in the output.
A new field "dateForValidation" is added to the JSON report for all runs. I defer to project owners if they prefer a different name; it needs to contrast with the existing field "validatedAt".
The HTML report has a line added only if the validation date is different than the current date.
Some minor refactoring is included to make the new logic simpler, like changing from Date to ZonedDateTime in ValidationRunner.java.
The --date argument requires both a date and a time, because I could not find a way to keep the behavior and code simple when specifying a date only. Adding the minor burden of including the time for the few users of the argument seems better than the workarounds I considered.
A new argument is added to HtmlReportGenerator.generateReport, which is unfortunate given how infrequently this feature is expected to be used. I would have preferred to compare dates inside generateReport, but the run date is passed in a custom format, so it would be awkward to do a comparison inside the function.
Expected behavior:
In all interfaces and all runs, a new JSON field "dateForValidation" is added. In the common default case, this is generally slightly before the date and time in "validatedAt", because it is set earlier in execution.
In the cli, if the --date argument is given a valid date and time, that date and time will be used when checking rules that depend on time, such as checking for expired feeds.
If a date is specified and it is not today, then the HTML report will include a sentence stating the date and time used.
Please make sure these boxes are checked before submitting your pull request - thanks!
gradle test
to make sure you didn't break anything