Should PartialEq<str> for Locale be strict or lenient? #1709
I believe for 1.0 we should have a mental model of how our APIs communicate different strictness levels of comparison, and what the role of traits such as `PartialEq` is. We should also consider incomplete matching.
To recap - I don't think we need to implement all of them, but I think before 1.0 we should have a good direction on which API compares how, so that as the needs arise we can implement the particular methods/traits consistently, in the shape we defined and set prior to 1.0.
A few considerations:
I tend to have a fairly strong preference that same-type comparison be strict. Across types, my preference is not as strong, but I tend to prefer strict comparison where possible. I think we should offer lenient comparison, but I don't think it's what users should get by default if they don't know better. In most API design cases I've come across, "strict by default" usually ends up being the chosen option.
When you say strict, do you mean case-sensitive? Across types - in particular, comparing a locale against a string - I think we should be lenient (I diverge from your position here). On the other axis, there is the question of how to indicate the strictness of a comparison in the API.
What I mean is a comparison that satisfies the properties of a "total order", which implies case-sensitive, yes. How would you feel about not implementing PartialEq across types, and instead having two methods users can choose from, to make the choice between strict and lenient more explicit? Here are four reasons I tend to prefer strict comparison by default:
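For illustration, the "two explicit methods" shape could look roughly like this. This is a minimal sketch under stated assumptions: the toy `Locale` struct and the names `strict_eq`/`lenient_eq` are placeholders, not the actual ICU4X API, and the lenient rules shown (ASCII case-insensitivity, `_` accepted as a separator) are illustrative only.

```rust
// Toy model (not ICU4X): a locale that stores its canonical BCP-47 serialization.
struct Locale {
    canonical: String, // e.g. "en-US"
}

impl Locale {
    /// Strict: byte-for-byte against the canonical form (cheap, total-order friendly).
    fn strict_eq(&self, s: &str) -> bool {
        self.canonical == s
    }

    /// Lenient (illustrative rules): ASCII-case-insensitive, and "_" accepted
    /// as a separator alongside "-".
    fn lenient_eq(&self, s: &str) -> bool {
        let norm = |b: u8| if b == b'_' { b'-' } else { b.to_ascii_lowercase() };
        self.canonical.bytes().map(norm).eq(s.bytes().map(norm))
    }
}

fn main() {
    let loc = Locale { canonical: "en-US".to_string() };
    assert!(loc.strict_eq("en-US") && !loc.strict_eq("en_US"));
    assert!(loc.lenient_eq("en_US") && loc.lenient_eq("EN-us"));
}
```

With this shape the caller makes the tradeoff explicit at every call site: `strict_eq("en_US")` is false while `lenient_eq("en_US")` is true for a locale whose canonical form is "en-US".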
I feel that if there's no "canonical" expectation, that makes sense - we're effectively kicking developers out of their comfort zone and asking them to explicitly decide. Below, I'll argue against the assumption that we don't know what the canonical expectation is, though :)
I think I disagree.

```rust
let loc_list: Vec<Locale>; // sorted by canonical serialization
assert!(loc_list
    .binary_search_by(|loc| loc.to_string().as_str().cmp("en_US"))
    .is_err());
```

I'd argue that if someone looks up a locale with a string like "en_US", they expect it to match. If they really need strictness, they could verify that their string is a canonical locale, and reject non-canonical input prior to matching.
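To make the failure mode concrete, here is a self-contained sketch using plain strings rather than ICU4X types; the `normalize` helper and its rules (lowercase language, uppercase region, "-" separator) are illustrative assumptions, not ICU4X API:

```rust
// Illustrative-only normalization: split on "-" or "_", lowercase the
// language subtag, uppercase the region subtag, join with "-".
fn normalize(tag: &str) -> String {
    let mut parts = tag.split(|c| c == '-' || c == '_');
    let lang = parts.next().unwrap_or("").to_lowercase();
    match parts.next() {
        Some(region) => format!("{}-{}", lang, region.to_uppercase()),
        None => lang,
    }
}

fn main() {
    // Canonical, sorted BCP-47 tags: the domain a strict comparison assumes.
    let tags = ["de-DE", "en-US", "fr-FR"];

    // Strict lookup: the legacy "en_US" spelling is simply not found.
    assert!(tags.binary_search(&"en_US").is_err());

    // Lenient lookup: normalize the query first, then search strictly.
    assert!(tags.binary_search(&normalize("en_US").as_str()).is_ok());
}
```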
Agree. The strict vs. lenient matching can be seen as a performance question. I'd argue that by default we should be lenient on input, and allow for a "strict_cmp" for cases that need it.
I don't think your examples are plausible.
I think it's a subjective argument - what is surprising to one person may be expected by another. I think if we embrace "lenient on input, strict on output" then we should be lenient here. If we reject that, we should be strict and be very clear in the docs that any string used for comparison against any component of a locale has to be canonicalized beforehand for the matching to work. We could also choose from several options here, and then define a separate matrix for the meaning of missing components.
Rust's `binary_search` is defined only for `T: Ord`. Is binary_search well-defined on non-total-order impls? If so, then why does the standard library require `Ord`?
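For reference, std already offers a way to search against a different key without a cross-type `Ord` impl: `slice::binary_search_by` takes an explicit comparator, so the caller supplies the total order. A sketch with plain strings standing in for locales:

```rust
fn main() {
    // The slice must be sorted; plain binary_search requires T: Ord.
    let tags = ["de-DE", "en-US", "fr-FR"];
    assert_eq!(tags.binary_search(&"en-US"), Ok(1));

    // binary_search_by lets the caller pick the comparator instead, which
    // is one way a cross-type (locale vs. str) search could stay explicit
    // without implementing PartialOrd/Ord across types.
    let found = tags.binary_search_by(|probe| probe.cmp(&"en-US"));
    assert_eq!(found, Ok(1));
}
```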
Ok. My position, weakly held, is that the thing that is better for the ICU4X value proposition (performance, memory usage, or in this case code size) should be the default behavior. One could make the argument that lenient comparison is better for the "i18n correctness" value proposition, an argument with which I disagree, since we should be expressly encouraging people to use BCP-47 language tags, rather than being lenient by default and allowing the old format to continue to exist and be supported in the wild.
Does UTS-35 define exactly what sets of strings are considered valid but non-canonical? I'd like to draw a parallel with a related issue.
My position here is that if users require lenient behavior, they will discover fairly quickly through testing that the default impl is not lenient. However, if users require strict behavior, they may never find out that they are using the less efficient lenient method, since the lenient method works in all the cases where the strict method works. It is therefore less likely to cause surprise if strict is the default.
I see the problem in the fact that they let it be "implementation-defined". We would be more strict, allowing only for case-insensitive matching and `-`/`_` separator leniency.
I struggle to imagine a case where a user would like the behavior of matching a Locale to a stringified version of it to be strict.
If the user has a list of strings that they know are definitely BCP-47 canonical, and they don't want to carry the extra code size / performance cost of "canonicalization on the fly". I have such a case in the data provider.
If we're clear that these are the only exceptions, then this satisfies argument 3. Argument 2 is weakened, except for my side note about the negative impact on the ecosystem of allowing the old stuff to propagate. Arguments 1 and 4 still stand.
I wasn't explicit enough. I struggle to find a use case where a consumer of our library who is not very familiar with the nuances of canonicalization, and who just wants to use locale comparison, would want the strict behavior. I can imagine a special-case consumer who is very performance/code-size conscious and would want to use the strict comparison, but that is the exception rather than the rule.
My responses still stand as well. Anyhoo, I'm 0.2 on the Apache scale and would prefer not to invest much more time in bikeshedding, since we seem to have our arguments laid out and I doubt we are in a position to convince one another. Maybe we can discuss it in the ICU4X bi-weekly, vote, and move on.
Marking this as discuss-priority so we can come to a decision on this at the next meeting.
Discussion:
Proposal: Do not have `PartialEq<str>`; instead, expose two explicitly named comparison functions.
Naming for the two functions:
New names:
LGTM: @QnnOkabayashi @Manishearth @robertbastian @nordzilla @zbraniecki

In the future we could add:
Should we also remove
I believe so, yes. We should use strict comparison for data coming from the DataProvider, because we expect the data to have been pre-normalized in datagen.
Oh right, it uses TinyStr because the subtags might not all be the same length. You can get the |
Background: #1704