Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add an argument for the cli (--date) that specifies the date to use during verification. #1628

Merged
merged 9 commits into from
Jan 17, 2024

Conversation

bradyhunsaker
Copy link
Contributor

Summary:

Resolves issue #1292.

By specifying a date and time with the --date argument, the validator will use that date instead of the current date. This enables replicating earlier runs and debugging rules that depend on time, like feed_expiration_date_30_days. Only the cli interface has this support added.

The main challenge here was how to handle this gracefully in the output.

A new field "dateForValidation" is added to the JSON report for all runs. I defer to project owners if they prefer a different name; it needs to contrast with the existing field "validatedAt".

The HTML report has a line added only if the validation date is different than the current date.

Some minor refactoring is included to make the new logic simpler, like changing from Date to ZonedDateTime in ValidationRunner.java.

The --date argument requires both a date and a time, because I could not find a way to keep the behavior and code simple when specifying a date only. Adding the minor burden of including the time for the few users of the argument seems better than the workarounds I considered.

A new argument is added to HtmlReportGenerator.generateReport, which is unfortunate given how infrequently this feature is expected to be used. I would have preferred to compare dates inside generateReport, but the run date is passed in a custom format, so it would be awkward to do a comparison inside the function.

Expected behavior:

In all interfaces and all runs, a new JSON field "dateForValidation" is added. In the common default case, this is generally slightly before the date and time in "validatedAt", because it is set earlier in execution.

In the cli, if the --date argument is given a valid date and time, that date and time will be used when checking rules that depend on time, such as checking for expired feeds.

If a date is specified and it is not today, then the HTML report will include a sentence stating the date and time used.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • [X ] Run the unit tests with gradle test to make sure you didn't break anything
  • [X ] Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification(https://www.conventionalcommits.org/en/v1.0.0/).
  • [X ] Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@emmambd emmambd linked an issue Jan 3, 2024 that may be closed by this pull request
@emmambd emmambd requested a review from jcpitre January 3, 2024 16:23
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd 'at' HH:mm:ss z");
Date now = new Date(System.currentTimeMillis());
String date = formatter.format(now);
ZonedDateTime now = ZonedDateTime.now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think now will pretty much always be different from the date specified on the command line, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There have always been two times at play: the time used for validation and the time indicated in the report. In the current code (before this PR), only the report generation time is included in the output.

Note that this part of the code is a refactor. The existing code uses System.currentTimeMillis() while the new code uses ZonedDateTime.now(). In either case, this time included in the report is the time that the report is generated (after validation steps have been done).

The time used for validation is set at the start of processing if the new --date option isn't used. In that common case the two times differ by the processing time, which could be a fraction of a second or could be many minutes for large feeds (maybe even hours? I don't know how long the biggest feeds take).

That was already true and remains true. A difference now is that both times are included in the JSON report, and both are included in the HTML report if the date differs.

I'm open to changing any of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification.
So config.zonedDateTime() is the time specified on the command line, right?
If it's the case, my point is that it's almost impossible to specify the exact same time as now, so is_different_date will pretty much always be true. In that case why check at all?
It's no big deal because the net result will that the it always prints the dateForValidation, which is OK IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I misunderstood your original question. Date and time handling is confusing.

The computation of is_different_date converts both ZonedDateTime variables to LocalDate before comparing. It's looking to see if the date (not the time part) is different. In the common case that --date is not specified and the validation run completes on the same calendar date as it starts, is_different_date will be false and the HTML report will not have anything added. As you say, the date+time will always be different, but that isn't what is checked. My goal was to be conservative about changing the human-readable HTML output (although I'm open-minded).

Part of why this is confusing is that I called the argument --date, but you must specify a time as well, so it's really a date+time.

I have a suggestion to reduce the confusion: Refactor the code by replacing "currentDateTime" in ValidationContext with a LocalDate (no time). Currently the type is a ZonedDateTime, which is why I used that type in this PR and require that a time be specified along with a date. But I checked the uses of the ValidationContext, and the rules all convert to LocalDate when using currentDateTime anyway. (example:

LocalDate now = currentDateTime.getNow().toLocalDate();
)

I'll create a PR with that refactor so you see what I mean. It won't change the current functionality at all.
After that change (if you agree), then I'll change this PR so that only a date is specified by --date and hopefully it makes everything simpler and clearer. That will also resolve the issue about different timestamp formats; with the change the new one will just be a date. (Though I'll still follow up separately about making the existing one machine readable.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed that the comparison was based only on the date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just created PR #1636, a refactor to simplify and clarify the existing date handling. I'll wait to see if that is accepted and merged. If so, it will simplify this PR.

@@ -229,6 +229,7 @@ <h1>GTFS Schedule Validation Report</h1>
<span th:text="${config.gtfsSource}"></span><span
th:text="${config.countryCode.isUnknown()} ? '. No country code was provided.' : ', with the country code: ' + ${config.countryCode} + '.'"></span>
</br>
<span th:if="${is_different_date}" th:text="'The date and time used during validation was ' + ${config.zonedDateTime.toLocalDateTime()} + '.'"></span></p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format for this date is different from the one when the validator was run, e.g. 2024-01-03 at 17:14:01 EST vs 2020-01-02T12:34.
We should consider makeing the format the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I admit to noticing this and being a bit lazy. Largely it was because the existing format is a custom format. I like your suggestion later about changing that format, in which case it will be easy for them to use the same format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the date-for-validation is just a date while the run time is a date+time. So there's no mismatch. I would still like to move away from the custom format, but not in this PR since it no longer is an issue.

@@ -55,6 +56,7 @@ public JsonReportSummary(
this.validationReportName = config.validationReportFileName();
this.htmlReportName = config.htmlReportFileName();
this.countryCode = config.countryCode().getCountryCode();
this.dateForValidation = config.zonedDateTime().toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This date and the validatedAt date have different formats. They should be the same.
And this might be outside of the scope of this PR, but since they are destined to be read by a program, they should be easily parsable. ISO 8601 with offset might be more suited for the dates in the json file. e.g. 2024-01-03T17:14:01-05:00
This could be delegated to another issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that changing the existing date format to a non-custom one would be good. I didn't want to presume to do that, but it sounds like you're comfortable with it. It would make it easy to use matching date formats.

I'll make a separate PR with just that change, with the intent of submitting it before this one. My one concern is whether there is code out there that is parsing this custom date format of validatedAt. Are you concerned about that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I did not imply that you had to create another issue. We are already glad that you contributed to the project, and I would not want you to feel that you have to put more effort than what you intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I didn't interpret your comment that way. I like fixing things up when there's agreement to do so.

@emmambd
Copy link
Contributor

emmambd commented Jan 9, 2024

Thanks for this awesome work @bradyhunsaker! I'm tagging @atvaccaro, @derhuerst, and @wklumpen who expressed interest in this in #1292 in case they have thoughts as well.

@bradyhunsaker
Copy link
Contributor Author

I brought in the changes from PR #1636 and other changes in the master branch. I'll update here when I have the code ready for a re-review.

@bradyhunsaker
Copy link
Contributor Author

OK, ready for review.
I committed changes so that it now specifies just a date (no time) and uses the dateForValidation name.

Copy link
Contributor

@jcpitre jcpitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thanks

@wklumpen
Copy link

Amazing work! Thanks very much for the work. Now I don't have to just ignore the "not within 30 days" warning :)

@jcpitre jcpitre merged commit 86f9484 into MobilityData:master Jan 17, 2024
333 checks passed
@bradyhunsaker bradyhunsaker deleted the override-date branch January 19, 2024 00:26
description =
"Date to simulate when validating, in ISO_LOCAL_DATE format like "
+ "'2001-01-30'. By default, the current date is used. "
+ "This option enables debugging rules like feed expiration.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradyhunsaker It's a bit late, but I was wondering about this sentence. I don't see where these debugging rules are enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear. I meant that the main purpose of this option is debugging. For example, it could be used to debug an error with a rule like feed expiration.

There are not special debugging rules associated with it.
I think the use of the word "enables" was a bad choice. Feel free to remove or edit the description.

@jcpitre jcpitre mentioned this pull request Jan 24, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow the validator to accept a "time of validation" argument
4 participants