Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split DataProvider into ResourceProvider and DynProvider #1554

Merged
merged 66 commits into from
Feb 3, 2022

Conversation

sffc
Copy link
Member

@sffc sffc commented Jan 28, 2022

Fixes #570

This PR adds ResourceMarker, which extends DataMarker, adding a KEY field. There is a corresponding ResourceProvider trait which uses the ResourceMarker to auto-fill the key in requests. The DataProvider trait has been renamed to DynProvider to serve more specific use cases. Throughout components, I migrated most users over to ResourceProvider.

There does not appear to be any impact on static data slicing.

Open questions:

  1. I left the big icu_properties crate on DynProvider. I have a stash where I migrated it to ResourceProvider, which works for plain binary properties, but it does not work very well for binary enumerated properties. Should I:
    1. Leave everything in icu_properties as DynProvider, or
    2. Migrate what I can to ResourceProvider, or
    3. Leave it as-is for now and open a follow-up issue
  2. Should I add a macro to generate the ResourceMarker impls? It seems that writing them by hand isn't really too bad.

Still to-do:

  • Migrate icu_provider_cldr and icu_provider_uprops
  • Add documentation for new exported symbols
  • Fix docs tests everywhere

@sffc sffc requested a review from Manishearth January 28, 2022 05:16
Comment on lines 111 to 113
impl ResourceMarker for DecimalSymbolsV1Marker {
const KEY: ResourceKey = key::SYMBOLS_V1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that a macro just for this impl isn't much use. However I think we should aim for a 1-1 correspondence between ResourceMarkers and keys, because defining ResourceMarker on different structs with the same key seems like a footgun.

I'd say a ResourceKey and its single ResourceMarker impl should be generated by the same macro. Maybe something like the current #[data_struct] that also generates the keys and links them:

#[icu_provider::resource_struct]
#[resource_key(KEY_NAME, "key/path@1")]
#[resource_key(KEY_NAME_2, "key/path_2@1")]
pub struct FooV1<'data> { ... }

or alternatively with more prominent keys:

#[icu_provider::data_struct]
pub struct FooV1<'data> { ... }

define_resource!(KEY_NAME, "key/path@1", FooV1);
define_resource!(KEY_NAME_2, "key/path_2@1", FooV1);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this idea

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! We should do something along these lines. I like the first version better. I would also like to get rid of the separate key constants, because you will be able to access them via FooV1Marker::KEY.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another interesting proposal. We can allow only one key per struct, and for structs with multiple keys, we make them newtypes with #[serde(transparent)]. They can be cast to the inner type if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I got working:

use icu_provider::prelude::*;
use std::borrow::Cow;

// We also need `yoke` in scope: <https://github.com/unicode-org/icu4x/issues/1557>
use icu_provider::yoke;

#[icu_provider::data_struct(
    "demo/foo@1",
    BarV1Marker = "demo/bar@1",
    BazV1Marker = "demo/baz@1"
)]
pub struct FooV1<'data> {
    message: Cow<'data, str>,
};

assert_eq!(FooV1Marker::KEY.get_path(), "demo/foo@1");
assert_eq!(BarV1Marker::KEY.get_path(), "demo/bar@1");
assert_eq!(BazV1Marker::KEY.get_path(), "demo/baz@1");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Do we need custom marker names? We could derive them from the key path: "demo/foo@1" -> DemoFooV1Marker (or maybe ignore the name space as the rust module takes that role).

Not a fan of the newtype solution as that would add quite a bit of boilerplate.

Copy link
Member Author

@sffc sffc Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly dislike the practice performed by a handful of computer frameworks that auto-generate names by munging other names into a different case convention. I would much rather have the literal symbol DemoFooV1Marker be in source code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think it'd be neater if the default marker name wasn't generated from the struct name but was also a required param.

And if we don't use equivalent names, should we at least verify that the key's version matches the struct's version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I changed the macro so that the marker type is explicit.

And if we don't use equivalent names, should we at least verify that the key's version matches the struct's version?

Seems unnecessary to me, but we could do it in a follow-up.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this approach so far!

provider/core/src/data_provider.rs Show resolved Hide resolved
components/plurals/src/lib.rs Show resolved Hide resolved
Comment on lines 111 to 113
impl ResourceMarker for DecimalSymbolsV1Marker {
const KEY: ResourceKey = key::SYMBOLS_V1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this idea

provider/core/src/data_provider.rs Show resolved Hide resolved
@Manishearth
Copy link
Member

  1. I left the big icu_properties crate on DynProvider. I have a stash where I migrated it to ResourceProvider, which works for plain binary properties, but it does not work very well for binary enumerated properties. Should I:

    1. Leave everything in icu_properties as DynProvider, or
    2. Migrate what I can to ResourceProvider, or
    3. Leave it as-is for now and open a follow-up issue

(3) seems okay to me, but my personal preference would be (2)? Is there a reason why it doesn't work well for binary enumprops?

  1. Should I add a macro to generate the ResourceMarker impls? It seems that writing them by hand isn't really too bad.

I think it would be nicer yeah but I don't consider it mandatory

provider/blob/src/path_util.rs Outdated Show resolved Hide resolved
provider/core/src/data_provider.rs Outdated Show resolved Hide resolved
provider/core/src/data_provider/test.rs Show resolved Hide resolved
provider/core/src/hello_world.rs Outdated Show resolved Hide resolved
provider/core/src/resource.rs Outdated Show resolved Hide resolved
@@ -68,14 +68,17 @@ impl TryFrom<&str> for TimeZonesProvider {
impl KeyedDataProvider for TimeZonesProvider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should KeyedDataProvider be KeyedDynProvider now?

Copy link
Member Author

@sffc sffc Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. But I want to get rid of KeyedDataProvider and it is tracked in #442, and it is a private internal type to provider_cldr. So I prefer not to mess with it right now.

provider/core/src/data_provider.rs Outdated Show resolved Hide resolved
provider/core/src/data_provider.rs Outdated Show resolved Hide resolved
@@ -203,13 +204,14 @@ impl DataError {
///
/// If the "log_error_context" feature is enabled, this logs the whole request. Either way,
/// it returns an error with the resource key portion of the request as context.
pub fn with_req(self, req: &DataRequest) -> Self {
#[cfg_attr(not(feature = "log_error_context"), allow(unused_variables))]
pub fn with_req(self, key: ResourceKey, req: &DataRequest) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these functions should be adding one piece of context at a time. If the client wants to log both key and request they can do .with_key(k).with_options(o), but they should be able to only log options as well (we already have key-only). This is useful for

  • RequestOptions::try_langid, which can return DataErrorKind::NeedsLocale.with_options(self)
  • DataPayload::try_langid which wouldn't have to accept a context: ResourceKey, as that can be added by the caller with with_key

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted with_req to require both arguments because they are both important as context for the errors. This is a side-effect of splitting the key and the options into separate arguments.

Comment on lines 111 to 113
impl ResourceMarker for DecimalSymbolsV1Marker {
const KEY: ResourceKey = key::SYMBOLS_V1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think it'd be neater if the default marker name wasn't generated from the struct name but was also a required param.

And if we don't use equivalent names, should we at least verify that the key's version matches the struct's version?

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by naming. You dropped the DataProvider type for ResourceProvider and DynProvider, but kept data_provider parameter names and use "data provider" in docs.

components/plurals/src/lib.rs Show resolved Hide resolved
}
}
}

impl ResourceOptions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're changing the API, I think we should add constructors for ResourceOptions and make the fields private. Currently there is no instance that has a variant but no langid, and I think we should require that. Otherwise we cannot parse stringified ResourceOptions, as both

ResourceOptions { variant: Some(Cow::Borrowed("und")), langid: None}

ResourceOptions { variant: None, langid: Some(langid!("und")) }

are 'serialized' to "und".

We need that for #1092.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I don't want to make this change right now because I am reconsidering the whole data model of ResourceOptions as part of #243.

@sffc
Copy link
Member Author

sffc commented Feb 3, 2022

Code size:

Before:

-rwxr-xr-x 1 runner docker    41360 Feb  2 23:16 optim4.elf
-rwxr-xr-x 1 runner docker    31592 Feb  2 23:16 optim5.elf

After:

-rwxr-xr-x 1 runner docker    41360 Feb  3 00:11 optim4.elf
-rwxr-xr-x 1 runner docker    31560 Feb  3 00:11 optim5.elf

robertbastian
robertbastian previously approved these changes Feb 3, 2022

All data providers can fit into one of two classes.

1. Type 1: Those whose data originates as structured Rust objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I'm not a huge fan of "type 1" and "type 2" naming, can we just call these any providers and buffered providers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sffc sffc merged commit 20787e7 into unicode-org:main Feb 3, 2022
@sffc sffc deleted the resourcemarker branch February 3, 2022 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add mechanism for linking resource keys with data structs
3 participants