Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs for curate format-dates #1653

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 15 additions & 8 deletions augur/curate/format_dates.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
"""
Format date fields to ISO 8601 dates (YYYY-MM-DD), where incomplete dates
are masked with 'XX' (e.g. 2023 -> 2023-XX-XX).
Format date fields to ISO 8601 dates (YYYY-MM-DD).

If the provided ``--expected-date-formats`` represent incomplete dates then
the incomplete dates are masked with 'XX'. For example, providing
``%Y`` will allow year only dates to be formatted as ``2023-XX-XX``.
"""
import re
from datetime import datetime
Expand Down Expand Up @@ -30,14 +33,14 @@ def register_parser(parent_subparsers):
required = parser.add_argument_group(title="REQUIRED")
required.add_argument("--date-fields", nargs="+", action="extend",
help="List of date field names in the record that need to be standardized.")
required.add_argument("--expected-date-formats", nargs="+", action="extend",

optional = parser.add_argument_group(title="OPTIONAL")
optional.add_argument("--expected-date-formats", nargs="+", action="extend",
default=DEFAULT_EXPECTED_DATE_FORMATS,
help="Expected date formats that are currently in the provided date fields, " +
"defined by standard format codes as listed at " +
"https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes. " +
"If a date string matches multiple formats, it will be parsed as the first matched format in the provided order.")

optional = parser.add_argument_group(title="OPTIONAL")
optional.add_argument("--failure-reporting",
type=DataErrorMethod.argtype,
choices=list(DataErrorMethod),
Expand Down Expand Up @@ -181,6 +184,10 @@ def format_date(date_string, expected_formats):
def run(args, records):
failures = []
failure_reporting = args.failure_reporting
failure_suggestion = (
f"\nCurrent expected date formats are {args.expected_date_formats!r}. " +
"This can be updated with --expected-date-formats."
)
for index, record in enumerate(records):
record = record.copy()
record_id = index
Expand All @@ -203,7 +210,7 @@ def run(args, records):

failure_message = f"Unable to format date string {date_string!r} in field {field!r} of record {record_id!r}."
if failure_reporting is DataErrorMethod.ERROR_FIRST:
raise AugurError(failure_message)
raise AugurError(failure_message + failure_suggestion)

if failure_reporting is DataErrorMethod.WARN:
print_err(f"WARNING: {failure_message}")
Expand All @@ -221,10 +228,10 @@ def run(args, records):
'\n'.join(map(repr, failures))
)
if failure_reporting is DataErrorMethod.ERROR_ALL:
raise AugurError(failure_message)
raise AugurError(failure_message + failure_suggestion)

elif failure_reporting is DataErrorMethod.WARN:
print_err(f"WARNING: {failure_message}")
print_err(f"WARNING: {failure_message}" + failure_suggestion)

else:
raise ValueError(f"Encountered unhandled failure reporting method: {failure_reporting!r}")
3 changes: 3 additions & 0 deletions tests/functional/curate/cram/format-dates/failure-reporting.t
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ This is expected to fail with an error, so redirecting stdout since we don't car
> --date-fields "date" "collectionDate" "releaseDate" "updateDate" \
> --expected-date-formats "%Y" "%Y-%m-%dT%H:%M:%SZ" 1> /dev/null
ERROR: Unable to format date string '2020-01' in field 'collectionDate' of record 0.
Current expected date formats are ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX', '%Y', '%Y-%m-%dT%H:%M:%SZ']. This can be updated with --expected-date-formats.
[2]

Test output with unmatched expected date formats with `ERROR_ALL` failure reporting.
Expand All @@ -29,6 +30,7 @@ This is expected to fail with an error, so redirecting stdout since we don't car
ERROR: Unable to format dates for the following (record, field, date string):
(0, 'collectionDate', '2020-01')
(0, 'releaseDate', '2020-01')
Current expected date formats are ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX', '%Y', '%Y-%m-%dT%H:%M:%SZ']. This can be updated with --expected-date-formats.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[aside] Oh - this test output really helped me understand the --expected-date-formats argument - it doesn't overwrite the defaults, it extends them. That wasn't really clear from the help / docs. Do you know if there is there an established pattern for conveying whether an argument extends vs replaces the defaults?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, now that you mention it, I don't think this is clear in any of the help / docs.

The default behavior for the extend action in argparse is to extend the defaults. We have a custom ExtendOverwriteDefault action that overwrites the defaults instead of extending them, but this custom action does not change the output of the CLI help docs.

Comparing --metadata-delimiters which uses ExtendOverwriteDefault and --expected-date-formats which uses the default extend action. Neither the help nor the docs differentiates the extend vs overwrite behavior:

--metadata-delimiters METADATA_DELIMITERS [METADATA_DELIMITERS ...]
                        Delimiters to accept when reading a plain text metadata file. Only one delimiter will be inferred. (default: (',', '\t'))
--expected-date-formats EXPECTED_DATE_FORMATS [EXPECTED_DATE_FORMATS ...]
                        Expected date formats that are currently in the provided date fields, defined by standard format codes as listed at
                        https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes. If a date string matches multiple formats, it will be parsed as the
                        first matched format in the provided order. (default: ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX'])

Screenshot 2024-10-17 at 12 17 10 PM
Screenshot 2024-10-17 at 12 17 21 PM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[--metadata-delimiters cf. --expected-date-formats] Neither the help nor the docs differentiates the extend vs overwrite behavior

Perfect example right there!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking separately in #1654

[2]

Test output with unmatched expected date formats while warning on failures.
Expand All @@ -44,6 +46,7 @@ This is expected to print warnings for failures and return the masked date strin
WARNING: Unable to format dates for the following (record, field, date string):
(0, 'collectionDate', '2020-01')
(0, 'releaseDate', '2020-01')
Current expected date formats are ['%Y-%m-%d', '%Y-%m-XX', '%Y-XX-XX', 'XXXX-XX-XX', '%Y', '%Y-%m-%dT%H:%M:%SZ']. This can be updated with --expected-date-formats.
{"record": 1, "date": "2020-XX-XX", "collectionDate": "XXXX-XX-XX", "releaseDate": "XXXX-XX-XX", "updateDate": "2020-07-18"}

Test output with unmatched expected date formats while silencing failures.
Expand Down
Loading