Fixes #66 Manage CLDR as peer dependency #68
@rxaviers this is definitely going in an interesting direction. However, if I'm not mistaken, @JCEmmons is post-processing the CLDR for ecma402 use (see https://github.com/ibm-js/ecma402/tree/master/cldr/config), so how do you envision this working here? I suppose this would require yet another post-install hook? |
The post-processing described in cldr/config has the following goal, as I've interpreted it: to spare users from downloading unnecessary CLDR fields. Please correct me if I'm wrong. Given the above concern, I assume we're all on the same page about avoiding data duplication, or we would end up counting coffee sugar calories but eating a whole cake later. :P Fine tuning is indeed a valid goal, not only for ibm-js/ecma402 but for all i18n projects. So why don't we extend it to all of them? We could establish a declarative way to define the field filter and have the post-install hook handle it as well. To the fine-tuning debate, I add one more question: can we fine tune based on usage? E.g., a user formatting hours (not dates). Please let me know if you have any questions. |
I'm willing to take a look at this, but to be honest, I haven't had much time to do so yet. Now that CLDR 26 is out the door, perhaps I can carve out some time to look at better ways to do this, as Rafael suggests. For the time being, I have updated all of our data here to the 26 release, and updated a few testcases accordingly, so we should be all current for the next 6 months or so. |
Right we also ignore some of the .json files entirely.
That's definitely a good idea.
I suppose that would be a second step. I guess the first step is to make the post-install hook work with all the fields required by a given library. A second step could be to offer fine-grained tuning based on exactly what the user is leveraging in the library. One concern I have, though: if each lib has a different post-install hook, won't we end up duplicating the data again on the client? Or should the various post-install hooks be able to work collaboratively and come up with a version that covers all the libraries used in a given application? (Not sure how to achieve that.) |
Exactly.
The post install hook is set on a given application. Each library only declares its CLDR needs. Exemplifying: https://gist.github.com/rxaviers/dbd2e56840c5c7a508e8 |
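As a rough sketch of that split of responsibilities: each library only declares a list of CLDR paths, and the application's single post-install hook takes the union of them. The library names, the path strings, and the `mergeCldrNeeds` helper below are hypothetical and are not taken from the gist.

```javascript
// Hypothetical: merge the CLDR needs declared by every library used by an
// application into one de-duplicated, sorted list of paths.
function mergeCldrNeeds(declarations) {
  const merged = new Set();
  for (const paths of Object.values(declarations)) {
    for (const path of paths) {
      merged.add(path);
    }
  }
  return [...merged].sort();
}

// Assumed declarations, as an application-level hook might collect them.
const declarations = {
  "lib-a": ["supplemental/likelySubtags", "main/*/numbers"],
  "lib-b": ["main/*/numbers", "main/*/dates"],
};

const appNeeds = mergeCldrNeeds(declarations);
console.log(appNeeds);
```

Because the merge happens once per application, a path needed by two libraries is fetched and stored only once.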
Thanks for the sample. I guess cldr.json vs. bower.json with a cldr key is an implementation detail, and we can sort everything out and decide on that later. The important point is making sure the filtering syntax is expressive enough to cover current use-cases like the following https://github.com/ibm-js/ecma402/blob/master/cldr/config/ecma402_cldr_ca_buddhist_config.txt as well as making sure the syntax makes the merge operation as easy as possible. |
👍 |
Considering we are all ok with the idea of "Avoid duplicating CLDR data" according to our last meeting, I will give this PR another round of changes and will let you know. |
Two supplemental files cannot be found in any CLDR v26 json zip files:
Am I missing something, are they available somewhere? Should we update http://unicode.org/cldr/trac/ticket/7968 with that request? |
I've pushed a new commit that addresses the CLDR data duplication. It implements what's described in "Avoid duplicating CLDR data everywhere" above. It's not finished though. These items need to be addressed:
I will bring an initial definition for the filtering syntax above. I need help with the two 🆘 items above. Failing tests:
|
@JCEmmons, where can I find supplemental/aliases.json or supplemental/localeAliases.json in the CLDR json zip files please? |
I've extended https://gist.github.com/rxaviers/dbd2e56840c5c7a508e8 to include a proposal for the fields filtering (see it at the end of the gist). Is there any currently-in-use-configuration not covered? |
See #73 (comment). |
Hi Rafael, After my testing of your fork, I am pretty sure the problem with the tests is that I have no idea of why |
Thank you @clmath. |
The supplemental/aliases.json and supplemental/localeAliases.json are alias files that aren't in the official CLDR distribution. But they probably need to be added to it. For the time being, you can pull them from https://github.com/JCEmmons/cldr-ecma402 until they are added to CLDR's configuration. |
In response to @clmath's comment "I have no idea of why ecma402 is not using the official cldr data and I don't have enough knowledge in locales fallback to correct the tests.": the reason is that the "official cldr data" has a lot of fields that ECMA-402 doesn't need. So when I generate the customized data for ECMA-402, I strip out the stuff that we don't use. Note that the program that generates the official data and the one that generates the custom data are the same program, so the JSON keys and fields should be compatible (although, to be honest, I haven't fully tested this yet). I'm in the process of seeing what it would take to make ECMA-402 work properly with Rafael's "cldr-data". We give up some performance by loading more data, but the amount of additional data may be small enough that we don't care (or it can be filtered in some other fashion). |
@JCEmmons can you handle that? Do you need help from anyone in order to get that done? |
I think there is more than that as the list of available locales is different between |
It seems to me like a different content. If this is the case, I wouldn't encourage cldr-data to change what's offered by Unicode. Let's have it fixed upstream instead (i.e., having Related question, json-full.zip brings duplicate identical data, e.g., |
Is this change needed to get this working or is it an improvement that could be tackled in a separate issue? |
@rxaviers I think it should be tracked in this issue as it is related to the list of locales available in |
Ok, I added this item to the description as well then. |
About the failing tests, @clmath has already pointed out some customizations made by ecma402 that are not present (or are different) in the official CLDR, which may be causing the issue. We need to list the differences that are causing problems. Then we can either: (a) have CLDR changed accordingly, or (b) have the ecma402 code changed to conform with the official CLDR. |
I think that is @JCEmmons's call. On a side note, I updated the description to keep the link to the locale list. |
👍 |
Yes, I can handle it. See http://unicode.org/cldr/trac/ticket/8040 Regards, John C. Emmons |
👍 |
In answer to your question about en-US vs. en, pt-BR vs. pt, etc. - yes they can and should be identical. And if your software is smart enough to do removal of likely subtags, you don't need en-US, fr-FR, etc. When we put together the original zip files, we tried not to make any assumptions about what people should and shouldn't do with the data. We didn't want people to have to implement the inheritance mechanism as defined by tr35 just to use the locale data. |
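To illustrate why en-US and en can be treated as the same locale, here is a simplified sketch of the "remove likely subtags" idea from tr35. It only handles plain language, script, and region subtags separated by hyphens, and it uses a three-entry excerpt of the likelySubtags data; the helper names are made up for this example, and a real implementation should follow the full algorithm in the specification.

```javascript
// Tiny excerpt of CLDR supplemental likelySubtags data.
const likelySubtags = { en: "en-Latn-US", pt: "pt-Latn-BR", fr: "fr-Latn-FR" };

// Split a tag into language, script (4 letters) and region (2 letters or 3 digits).
function parseTag(tag) {
  const [language, ...others] = tag.split("-");
  let script;
  let region;
  for (const part of others) {
    if (/^[A-Za-z]{4}$/.test(part)) script = part;
    else if (/^([A-Za-z]{2}|[0-9]{3})$/.test(part)) region = part;
  }
  return { language, script, region };
}

function formatTag({ language, script, region }) {
  return [language, script, region].filter(Boolean).join("-");
}

// Fill in the missing script and region using the most specific table match.
function addLikelySubtags(tag, table) {
  const { language, script, region } = parseTag(tag);
  const lookups = [
    formatTag({ language, script, region }),
    formatTag({ language, region }),
    formatTag({ language, script }),
    language,
  ];
  for (const key of lookups) {
    if (table[key]) {
      const likely = parseTag(table[key]);
      return formatTag({
        language,
        script: script || likely.script,
        region: region || likely.region,
      });
    }
  }
  return formatTag({ language, script, region });
}

// Return the shortest tag that maximizes to the same full tag.
function removeLikelySubtags(tag, table) {
  const max = addLikelySubtags(tag, table);
  const { language, script, region } = parseTag(max);
  const trials = [
    language,
    formatTag({ language, region }),
    formatTag({ language, script }),
  ];
  for (const trial of trials) {
    if (addLikelySubtags(trial, table) === max) return trial;
  }
  return max;
}

console.log(removeLikelySubtags("en-US", likelySubtags)); // "en"
console.log(removeLikelySubtags("pt-PT", likelySubtags)); // "pt-PT"
```

Software that performs this reduction before looking up data can resolve a request for en-US to the en files, which is why the region-specific copies are redundant for it.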
Ongoing summary:
supplemental/aliases.json or supplemental/localeAliases.json via http://unicode.org/cldr/trac/ticket/8040 -- @JCEmmons
cldr/config now lives in cldr-extra/config)
Follow up:
en and en-US. They are identical given likelySubtags. Also, removing en-US likely subtags gives en. @JCEmmons, please correct me if I'm wrong. Why are they included? Shouldn't en suffice? This also applies to pt vs. pt-BR and probably a bunch of others.
Manage CLDR data as peer dependency
Since ibm-js/ecma402 already suggests bower usage to manage and install dependencies, the first change this PR makes is to also manage cldr-data using the same tool.
At this point, the user is able to spot a conflicting cldr-data peer dependency between this and other i18n libraries that leverage cldr-data (for example jquery/globalize, which has parsing methods that complement ecma402).
Avoid duplicating CLDR data everywhere
The second step this PR could take, which is optional, is to rely on the above bower setup to avoid having CLDR data embedded in this library.
The benefits of this change are:
be used for that. By you (developing it), or by users (using it).
The changes involved here are:
Changes of the form:
Avoid duplicating CLDR logic (and save runtime memory)
This goes a step further: it's about both of our projects using the same low-level CLDR traverser, cldrjs, as a foundation. But I'll leave this for a next iteration, after we talk about the above.
Obviously, my PR isn't complete. I want to discuss before implementing.
Fixes #66