Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider exact policy around DateTimeError in datetime formatting #4336

Open
sffc opened this issue Nov 20, 2023 · 23 comments
Open

Consider exact policy around DateTimeError in datetime formatting #4336

sffc opened this issue Nov 20, 2023 · 23 comments
Assignees
Labels
C-datetime Component: datetime, calendars, time zones

Comments

@sffc
Copy link
Member

sffc commented Nov 20, 2023

Currently there are 3 places where errors can be returned in datetime formatting:

  1. Constructor: DateTimeFormatter::try_new
    • Example: locale missing
  2. Format function: DateTimeFormatter::format
    • Example: mismatched calendar
  3. Write function: FormattedDateTime::write_to
    • Example: data for pattern not loaded

Note that case 3 is a core::fmt::Error whereas cases 1 and 2 are DateTimeError. Any errors that occur during write_to are logged with log::warn! and then flattened to a core::fmt::Error.

Currently case 3 does not happen in normal public API usage of the icu::datetime crate. However, it is now possible to reach this case in DateTimePatternInterpolator (#3347).

If we moved this error to phase 2, we may end up slowing down the path for all users since we need to walk the whole pattern in order to figure out if the data is missing.

We should decide on this before shipping DateTimePatternInterpolator.

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting C-datetime Component: datetime, calendars, time zones labels Nov 20, 2023
@sffc sffc added this to the 1.5 Blocking ⟨P1⟩ milestone Nov 20, 2023
@sffc
Copy link
Member Author

sffc commented Nov 30, 2023

Want to discuss with:

@sffc sffc added the discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band label Nov 30, 2023
@sffc
Copy link
Member Author

sffc commented Dec 23, 2023

It should be noted that the ::format function has an error only in DateTimeFormatter, not in TypedDateTimeFormatter, and the only error it returns is a mismatched calendar. So, any additional pre-processing we do would result in changing the signature of the ::format functions on TypedDateTimeFormatter, which we could do in 2.0. However, it might be nice to avoid this.

I'd like to proceed with the following solution:

  1. Change the experimental DateTimeNames::with_pattern to two functions:
    • DateTimeNames::try_with_pattern, which checks that the names are loaded correctly for the pattern
    • DateTimeNames::with_pattern_unchecked, which has the current behavior (the function is unchecked but not unsafe)
    • And keep the existing load_for_pattern and include_for_pattern the way they are.
  2. If a DateTimeError occurs during writing, which shall be possible only if an "unchecked" function was called without its invariants upheld, do the following:
    • debug_assert!(false) with the error
    • Log the error using the internal logging utility (it currently does this)
    • Only then fall back to core::fmt::Error, which only happens without debug assertions, and which is silent only if logging is disabled
  3. Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.

OK?

@sffc sffc added needs-approval One or more stakeholders need to approve proposal and removed discuss Discuss at a future ICU4X-SC meeting discuss-triaged The stakeholders for this issue have been identified and it can be discussed out-of-band labels Dec 23, 2023
@Manishearth
Copy link
Member

sgtm

@robertbastian
Copy link
Member

robertbastian commented Jan 4, 2024

Using core::fmt::Error means code like formatted_date_time.to_string() panics, so I don't think we should ever use it. We should GIGO, in debug mode we debug-assert, but in release mode we return Ok and write something like <missing xy-data>.

(edit: this is actually a requirement of the trait)

Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.

This is why I don't like the error enums. We have so many methods that can only error with a subset of variants, but the client always has to check all of them. Documentation is not a strong enough guarantee imho.

@sffc
Copy link
Member Author

sffc commented Jan 4, 2024

Using core::fmt::Error means code like formatted_date_time.to_string() panics, so I don't think we should ever use it. We should GIGO, in debug mode we debug-assert, but in release mode we return Ok and write something like <missing xy-data>.

Ok how about writing the string "(icu4x error)"? I don't want to carry around the Display impl for the Error type in release mode, and you get the whole error with debug assertions enabled.

Document that the only error in DateTimeFormatter::format is a mismatched calendar error. We could even change the error type to be specific for that case, but I don't se a good reason to do that.

This is why I don't like the error enums. We have so many methods that can only error with a subset of variants, but the client always has to check all of them. Documentation is not a strong enough guarantee imho.

I mostly agree. Not proposing to change anything right now.

@robertbastian
Copy link
Member

We should use a different garbage string for each code location. (icu4x error) is not actionable.

@sffc
Copy link
Member Author

sffc commented Jan 5, 2024

The formatters share the same code path.

Neo:

Neo Pattern: https://github.com/unicode-org/icu4x/blob/978ed24845299cad5a58a928ef85b7b2e8928ed8/components/datetime/src/format/neo.rs#L1178C6-L1178C6

Old:

The best we could do without an enum would be something like (error in icu::datetime::TypedDateTimeFormatter)

@robertbastian
Copy link
Member

The GIGO doesn't have to happen at the top level that you linked. write_pattern should be infallible, and then write_field would do garbage out instead of returning errors: https://github.com/unicode-org/icu4x/blob/main/components/datetime/src/format/datetime.rs#L239. This lets us use different messages for different fields.

@sffc
Copy link
Member Author

sffc commented Jan 5, 2024

Hmm, replumbing the formating code is a fairly big change

My main concern with the Display impl is code size. We could avoid this by using a different error struct internally that carries a static str with the additional context

@sffc
Copy link
Member Author

sffc commented Jan 6, 2024

Or are you proposing that we produce outputs such as "(icu4x error: weekday names not loaded), 6 (icu4x error: month names not loaded) 2024" (instead of "Saturday, 6 January 2024")

@robertbastian
Copy link
Member

Yes, that

@zbraniecki
Copy link
Member

IMO "{?}, 6 {?} 2024" is more user friendly than "(icu4x error: weekday names not loaded), 6 (icu4x error: month names not loaded) 2024"

@robertbastian
Copy link
Member

What are your criteria for "user friendly"? {?} certainly looks nicer, but this is an error, it shouldn't look nice. We also want to communicate which names are not loaded; a bunch of {?} are going to be harder to fix, making it less user friendly.

@zbraniecki
Copy link
Member

I think the user persona in your mind is a developer, and in my mind, end user of a product that utilizes ICU4X. Both personas are important and they should receive output adapted to their needs.

but this is an error, it shouldn't look nice.

The developer seeing an erroneous output should be able to act on it, I agree. Your sample serves that well.

The end user of a web browser cannot act on the error, and should see broken output in context of their UI. I believe my sample serves that better.
The end user may also get a text error in their debug console, or some telemetry may send error data to developer for debugging, but the UI impact should be low.

@robertbastian
Copy link
Member

The use case for this message is DateTimeNames::with_pattern_unchecked. The unchecked part clearly communicates to the developer that they have to be careful with this and verify the output before any user ever sees it. This is why I'm primarily thinking about communicating to the developer instead of an end-user.

@sffc
Copy link
Member Author

sffc commented Jan 9, 2024

I realized that this is basically the same problem space as "GMT+?" (#2245)

Along those lines, we could make up pithy error strings for each field that are both user friendly and searchable. For the month field:

  • "M?"
  • "month?"
  • "ICU4X:M?"

Wdyt?

@Manishearth
Copy link
Member

Manishearth commented Jan 11, 2024

Discussed in meeting to a partial conclusion

  • @robertbastian is it guaranteed that we have a number when we don't have patterns/names/whatever?

  • @Manishearth only cases where DTF might not have a value from the date is cyclic/related iso. totally fine to ? in those cases, like time zones

  • @robertbastian - so the issue is missing symbol names, so if the data is there we can find a neutral way to display it

  • @robertbastian: original idea was ¿month 01? 6, 2024"

  • @robertbastian: shorter is "¿M01? 6, 2024". less clear it's broken

  • @zbraniecki: Do not want prod users seeing ICU4X error. it's meaningless. Ok with ¿? or brackets or whatever.

Ideas:
"(M01) 6, 2024"
"{M01} 6, 2024"
"[M01] 6, 2024"
" 6, 2024"
"|M01| 6, 2024"
"💥M01💥 6, 2024"

  • @Manishearth: parentheses are used in normal patterns so let's not use them to signal errors
  • @robertbastian: could do:
    • "|M01| 6, 2024" in prod
    • "|ICU4X MISSING MONTH PATTERN M01| 6, 2024" in debug
  • Can cause problems in tests if people only test debug or release mode.

Agreed (amongst @zbraniecki, @robertbastian, @Manishearth):

- No scary strings in release/prod mode.
- Fine to have scary string in debug mode
- Shane can pick whatever brackets he wants

@sffc is this okay with you (and if so, please remove the discuss label)

@sffc
Copy link
Member Author

sffc commented Jan 16, 2024

I'd prefer to not include the month number because the error-state code path should be as minimal as possible; formatting "M01" is more complicated than "M" (even if only a little bit more complicated) and I want to start from the position of being as stripped-down as possible.

I am happy with 💥M or 〚M〛 or ⦃M⦄ or «M» or {M}

@zbraniecki How strongly would you like the month number to be rendered?

@sffc
Copy link
Member Author

sffc commented Jan 22, 2024

Discussion with @zbraniecki:

  • MonthCode is available to the formatting function as a TinyStr so we can just echo it. Same with era code. «M01», «gregory»
  • For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets
  • For weekday: «E3»
  • For day period: «am» «pm»

@robertbastian
Copy link
Member

For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets

Why not use Writeable for u8?

Also I just realised

  • Fine to have scary string in debug mode

We're debug_assert!ing, so debug mode will panic anyway.

@sffc
Copy link
Member Author

sffc commented Feb 8, 2024

For numeric fields if the FixedDecimalFormatter is not loaded, load the FixedDecimalFormatter in no-data mode, but still indicate the error using brackets

Why not use Writeable for u8?

  1. This is internal code and we can change it at any time; it does not impact the observable behavior
  2. Fewer branches / code paths; we just change the Option<FixedDecimalFormatter> to FixedDecimalFormatter that defaults to one that performs the desired error-like output
  3. Although small, impl Writeable for u8 has some code size, and it is not an impl that is necessarily going to be present in the binary (https://github.com/unicode-org/icu4x/blob/main/utils/writeable/src/impls.rs)
  4. Why not? Less code, fewer if statements, better maintainability.

@robertbastian robertbastian self-assigned this Mar 20, 2024
robertbastian added a commit that referenced this issue Apr 29, 2024
#4336

What this PR does
- Stops converting datetime errors into fmt errors
- Use fallback values to continue writing when an error occurs
- Replaces `Writeable` by `TryWriteable` in neo
- Uses lossy mode in non-neo Writeable implementations

What this PR doesn't do
- Use correct gigo values. For now it writes `<ICU4X ERROR>` everywhere.
A domain expert should replace these by sensible values
- Use a narrower error type than `DatetimeError` for `TryWriteable`
robertbastian added a commit that referenced this issue May 7, 2024
@robertbastian
Copy link
Member

The remaining work is semver breaking, so moving this to 2.0.

@sffc sffc removed the needs-approval One or more stakeholders need to approve proposal label Jul 30, 2024
@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

Neo uses the new error types. @robertbastian to investigate if this is done.

@sffc sffc removed their assignment Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-datetime Component: datetime, calendars, time zones
Projects
Status: Small breakage (defer to end)
Development

No branches or pull requests

4 participants