Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per-component features for datagen #3165

Open
Manishearth opened this issue Mar 3, 2023 · 9 comments
Open

Per-component features for datagen #3165

Manishearth opened this issue Mar 3, 2023 · 9 comments
Labels
C-data-infra Component: provider, datagen, fallback, adapters needs-approval One or more stakeholders need to approve proposal S-large Size: A few weeks (larger feature, major refactoring) T-enhancement Type: Nice-to-have but not required

Comments

@Manishearth
Copy link
Member

Manishearth commented Mar 3, 2023

Currently datagen pulls in all components. We should do the same thing we do for the metacrate and add per-component features, with a default_components and experimental_components feature so that people using --no-default-features are not burdened unduly.

This is going to be useful for people using the registry API as well: As we grow the registry functionality, it'll become more important to not pull in additional deps.

Personally I really want this for experimental features (I want to be able to avoid having deps on experimental stuff where possible, since experimental stuff still has breaking changes, but if we're doing this anyway might as well do it for everything)

Related: #2602

@Manishearth Manishearth added the discuss-priority Discuss at the next ICU4X meeting label Mar 3, 2023
@Manishearth Manishearth changed the title Add per-component features to datagen Per-component features for datagen Mar 3, 2023
@sffc
Copy link
Member

sffc commented Mar 3, 2023

This is going to be useful for people using the registry API as well: As we grow the registry functionality, it'll become more important to not pull in additional deps.

I really don't want to get into the practice of relying on component-level features to determine the set of keys used for anything, including the registry. Components are simply too coarse for key slicing to be effective.

@Manishearth
Copy link
Member Author

I really don't want to get into the practice of relying on component-level features to determine the set of keys used for anything, including the registry. Components are simply too coarse for key slicing to be effective.

No, I'm not suggesting that at all: I'm suggesting that people may wish to use the registry without pulling in every last dependency.

That said if we're making the registry more of a thing we probably want it to live in a separate crate anyway.

@Manishearth
Copy link
Member Author

So to make it clear, my personal motivation is that vendoring of pre-1.0 experimental crates is annoying and I'd rather not have to deal with that. They're explicitly unstable and tend to force updates to be more lockstep (or need more steps), and I would like a future where I do not have to deal with those unless we explicitly have clients who care.

This is a rather weak motivation, especially since it's datagen we're talking about here. I'd still like to see something happen here if possible.

I do think additional motivation is that people using a subset of ICU4X should be able to use the registry API without pulling in all of ICU4X, but that's also weak motivation given that the registry API is currently inside the mother of all dependencies (datagen). We should move it elsewhere and give it features but that's not what this issue is about.

I do recognize that datagen features lead to more potential footguns for users, I still maintain that we should be able to do a lot provided we keep the default reasonable. But I can accept there's disagreement on that.

cc @robertbastian since he has opinions on datagen having features, and previously removed experimental feature tagging

@sffc
Copy link
Member

sffc commented Mar 23, 2023

  • @robertbastian - If you disable experimental keys, then datagen should complain. It shouldn't just silently ignore it. And there could be a space of keys that datagen doesn't know about, which we can't distinguish from junk in the key file.

Resolutions:

  1. Add key strings to datagen for all keys, even keys whose components are disabled. Print warnings for unknown strings, and errors for strings whose components are disabled.
  2. Aspire to have component-specific datagen features. OK to start with experimental-only.

@sffc sffc removed the discuss-priority Discuss at the next ICU4X meeting label Mar 23, 2023
@sffc sffc added this to the 1.x Priority ⟨P2⟩ milestone Mar 23, 2023
@sffc sffc added T-enhancement Type: Nice-to-have but not required C-data-infra Component: provider, datagen, fallback, adapters S-large Size: A few weeks (larger feature, major refactoring) labels Mar 23, 2023
@robertbastian
Copy link
Member

Outside experimental it gets a lot messier, there are many interactions between e.g. calendar/datetime/timezone, decimal/compactdecimal, a lot of things/properties. This will require extensive restructuring of the datagen crate, which I don't think is worth the effort at this point. Let's backlog this.

@Manishearth
Copy link
Member Author

Yeah that seems fair. Experimental's the annoying one anyway since upgrading those can take effort

@robertbastian robertbastian removed their assignment Jul 13, 2023
@sffc sffc added the needs-approval One or more stakeholders need to approve proposal label Apr 14, 2024
@sffc
Copy link
Member

sffc commented Apr 14, 2024

@Manishearth Now that we have a single icu_experimental crate, do you think datagen still needs component-specific features, or would a single experimental feature suffice?

As @robertbastian noted, it's not very easy to add component features to datagen due to interaction between components at datagen time.

@Manishearth
Copy link
Member Author

I think that interaction between datagen components is about as complicated as the interaction between their base crates, yes?

It's still nice to not have to deal with the cyclic import dependency of needing to pull in new crates but being unable to pull in new crates because you need to update all the deps first. The decisions we made about icu_provider and icu_locid compatability help, here.

I'm weakly against removing the features. It seems like people have stronger reasons to remove it.

@robertbastian
Copy link
Member

I'm weakly against removing the features. It seems like people have stronger reasons to remove it.

We don't currently have these. We have a single experimental feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-data-infra Component: provider, datagen, fallback, adapters needs-approval One or more stakeholders need to approve proposal S-large Size: A few weeks (larger feature, major refactoring) T-enhancement Type: Nice-to-have but not required
Projects
None yet
Development

No branches or pull requests

3 participants