Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Derive Eq and Hash in datetime::options #4230

Merged
merged 2 commits into from
Oct 30, 2023

Conversation

g2p
Copy link
Contributor

@g2p g2p commented Oct 26, 2023

My goal was to use options::length::Bag as a hashtable key, to cache formatters by; and to work with intl-memoizer, which originates in fluent-rs.

The first commit adds just what I need, the second adds the same derives for the experimental formatter options.

The annotated types are all trivial, dataless enums.

These are simple dataless enums.
We just make sure to derive Hash and Eq.
These are simple dataless enums.
We just make sure to derive Hash and Eq.
@CLAassistant
Copy link

CLAassistant commented Oct 26, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself seems fine to me but I'd like @zbraniecki to approve. (Also, we can't land this until after you click through the CLA assistant.)

However I'm not sure if I'm motivated by the use case. ICU4X is designed from the ground up to reduce the need for caches of formatter objects. Unlike in ICU4C, an ICU4X DateTimeFormatter constructor is cheap (though not completely free). One use case where it might make sense to use caches is when loading data from a postcard provider, and we have a tutorial for that.

In other words, don't create a cache just because you think you need one based on experience using ICU4C; do it only if you have actual numbers. I would also be interested in seeing those numbers.

@g2p
Copy link
Contributor Author

g2p commented Oct 28, 2023

So, I've been creating the cache by following fluent-bundle pushing me in that direction (it's also by @zbraniecki by the way): I can only access a message's locale context by deriving a trait that gets called by a memoizer.
I have no benchmarks showing caching is needed at this point.

Though, to go back to icu_datetime, the DateTimeFormatter docs say this: "For that reason, one should think of the process of formatting a date in two steps - first, a computational heavy construction of DateTimeFormatter, and then fast formatting of DateTime data using the instance."

If I don't cache, I will create a DateTimeFormatter for each date I format; but I'm using the default compiled_data approach so maybe that is acceptable.

(I don't mean to push for this small change if there are good reasons not to. At the moment I've just reimplemented Hash outside using mem::discriminant.)

@sffc
Copy link
Member

sffc commented Oct 29, 2023

I filed #4237 to discuss the verbiage of the DateTimeFormat docs.

@zbraniecki
Copy link
Member

I'm also hopeful that ICU4X does away with this motivator, but the traits seem reasonable to have on those structs, so it's an r+ from me.

As for Fluent - it has been designed around ICU4C model, and ICU4X may motivate us to rethink how to construct formatters without memoization.

I'm curious if you can measure the difference between constructing new DTF in each memoization call vs actually memoizing it.

g2p added a commit to g2p/fluent-datetime that referenced this pull request Oct 30, 2023
See unicode-org/icu4x#4230 (review)

Results look like:
instanciate every time  time:   [9.5168 µs 9.5523 µs 9.5949 µs]
instanciate once        time:   [936.66 ns 942.79 ns 950.65 ns]
instanciate every time (typed)
                        time:   [9.1276 µs 9.1405 µs 9.1541 µs]
instanciate once (typed)
                        time:   [897.98 ns 916.17 ns 940.64 ns]
@g2p
Copy link
Contributor Author

g2p commented Oct 30, 2023

I'm curious if you can measure the difference between constructing new DTF in each memoization call vs actually memoizing it.

Yes, reusing is ten times faster (formatting a full datetime):
g2p/fluent-datetime@39133b9

@sffc
Copy link
Member

sffc commented Oct 30, 2023

Ok so we're currently at 0.9 vs 9 microseconds. That difference is a bit higher than expected but may be mitigated by #3865 which is currently in progress. Depending on the exact model we land on for icu_pattern, we should be able to approach near 0 on the constructor time if data is built in hybrid or preresolved mode.

@sffc sffc merged commit 582e46c into unicode-org:main Oct 30, 2023
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants