First pass for adding North American indigenous locales #596

Open
wants to merge 2 commits into
base: master

@patcon patcon commented Jun 20, 2020

Gitter chat context: https://gitter.im/box/mojito?at=5eed1a307ba3965373b936c1

Hey @aurambaj! No pressure to work with this PR if it's stepping on your toes, but I was eager to give it a shot!

Any thoughts on reviewing this? (I'm sure there are things wrong with this approach, but thought it better to have something to talk over 🙂 )

CLAassistant commented Jun 20, 2020

CLA assistant check
All committers have signed the CLA.

Author

patcon commented Jun 20, 2020

And FYI, I'm attempting to use UN M49 Standard Country or Area Codes "representing geographical (continental and sub-continental) supranational regions" (e.g., nv-003) as opposed to nation-state boundaries that may be disputed by the nations speaking the language (e.g., nv-US). This is consistent with es-419 (Latin American Spanish), which also uses one of these codes.

I tried to choose the best of either 003 (North America, including Mexico and Central America) or 021 (Northern America, which excludes Mexico and below), based on where traditional territories seemed to be. A native speaker may be able to offer more input on how best to handle this.

Other options would be to use 000 for "missing data", or 019 for "Americas", or perhaps something odd like 778 for "Transition countries" (though I think that has a specific meaning). Anyhow, the point is that I would defer to whatever native speakers want to advocate for.
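To make the distinction above concrete, here is a minimal sketch of telling a UN M49 region subtag (three digits, as in nv-003 or es-419) apart from an ISO 3166-1 alpha-2 one (two letters, as in nv-US). The pattern only covers simple `language-region` tags like the ones in this PR, and `classify_region` is a hypothetical helper name, not something from Mojito.

```python
import re

# Language subtag (2-3 letters) followed by a region subtag that is either
# an ISO 3166-1 alpha-2 code (two letters) or a UN M49 area code (three digits).
# Assumption: only simple "language-region" tags are handled, not full BCP 47.
TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})-(?P<region>[A-Za-z]{2}|[0-9]{3})$"
)


def classify_region(tag: str) -> tuple[str, str, str]:
    """Return (language, region, region_kind) for a simple language-region tag."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a simple language-region tag: {tag!r}")
    region = match.group("region")
    # All-digit region subtags are UN M49 codes; two-letter ones are ISO 3166-1.
    kind = "un_m49" if region.isdigit() else "iso_3166"
    return match.group("language").lower(), region.upper(), kind


if __name__ == "__main__":
    for tag in ("nv-003", "nv-US", "es-419", "chr-021"):
        print(tag, classify_region(tag))
```

The sketch only checks the shape of the region subtag; it does not check that a given number is an assigned M49 code.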

insert into plural_form_for_locale (locale_id, plural_form_id) values (804, 0);
insert into plural_form_for_locale (locale_id, plural_form_id) values (805, 0);
insert into plural_form_for_locale (locale_id, plural_form_id) values (806, 0);
Collaborator

Should we have a standard plural form like English (singular, plural) instead of a single one? Any clue how that works in those languages?

Author

(I just left the "zeros" as placeholders until I had time to figure out what these were.)

I can work on this! Might need some time. Can we add this later, and use a sane default for now, or does merging something incorrect lead to hassle?
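For context on the plural-form question in this thread, a rough sketch of the difference between an English-style rule (singular vs. plural) and a single-form rule, using the CLDR category names "one" and "other". The rule labels `one_other` and `other_only` are hypothetical, and nothing here claims which rule applies to the locales in this PR or how it maps onto `plural_form_id`.

```python
def plural_category(count: int, rule: str) -> str:
    """Pick a plural category name for an integer count.

    `rule` is a hypothetical label, not a Mojito or CLDR identifier:
    - "one_other" models English-style singular/plural,
    - "other_only" models a language with a single form.
    """
    if rule == "one_other":
        return "one" if count == 1 else "other"
    if rule == "other_only":
        return "other"
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(n, plural_category(n, "one_other"), plural_category(n, "other_only"))
```

With a single-form placeholder, every count resolves to the same category, so translators would only ever be asked for one string per message.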

@@ -56,7 +59,7 @@ permalink: /docs/refs/mojito-locales/
| en-US | English (United States) |
| en-ZA | English (South Africa) |
| en-ZW | English (Zimbabwe) |
-| en-419 | Spanish (Latin America) |
+| es-419 | Spanish (Latin America) |
Collaborator

good catch! can be a first commit just fixing that :)

@@ -32,9 +32,12 @@ permalink: /docs/refs/mojito-locales/
| bn-IN | Bengali (India) |
| bs-BA | Bosnian (Bosnia and Herzegovina) |
| ca-ES | Catalan (Spain) |
| chr-021 | Cherokee (Northern America) |
| cr-021 | Cree (Northern America) |
Collaborator

My current thinking is to not have a region/territory unless it is a locale that is available in CLDR. I know Mojito currently always has a region, but I'm thinking of moving away from that pattern.

The new locale list would be any locale defined in: https://github.com/unicode-cldr/cldr-core/blob/master/availableLocales.json + any languages: https://github.com/unicode-cldr/cldr-localenames-full/blob/master/main/en/languages.json.

So the few I checked would be languages only.

thoughts?

Author

I think that's fair. I'd love to support someone in getting the most appropriate language codes into CLDR itself through the formal registration process, rather than doing something opinionated here :)
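The locale list proposed in this thread (every locale in availableLocales.json plus every language code in languages.json) could be sketched roughly as below. The JSON key layout is an assumption about those two CLDR files, the inline strings are tiny stand-ins rather than real data, and `build_locale_list` is a hypothetical name.

```python
import json

# Tiny inline stand-ins for the two CLDR JSON files linked above.
# Assumption: the real files use this key layout; verify before relying on it.
AVAILABLE_LOCALES_JSON = """
{"availableLocales": {"full": ["en", "en-US", "es-419"]}}
"""
LANGUAGES_JSON = """
{"main": {"en": {"localeDisplayNames": {"languages": {
    "chr": "Cherokee", "en": "English", "nv": "Navajo"
}}}}}
"""


def build_locale_list(available_json: str, languages_json: str) -> list[str]:
    """Union of CLDR available locales and bare language codes, sorted."""
    available = json.loads(available_json)["availableLocales"]["full"]
    languages = json.loads(languages_json)["main"]["en"]["localeDisplayNames"]["languages"]
    # Language-only codes (e.g. "nv") sit alongside full locales (e.g. "es-419").
    return sorted(set(available) | set(languages))


if __name__ == "__main__":
    print(build_locale_list(AVAILABLE_LOCALES_JSON, LANGUAGES_JSON))
```

Under this scheme the entries from this PR would appear as bare language codes (such as `nv`) rather than with an M49 region attached.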

@@ -157,11 +164,11 @@ permalink: /docs/refs/mojito-locales/
| uz-UZ | Uzbek (Uzbekistan) |
| vi-VN | Vietnamese (Viet Nam) |
| xh-ZA | Xhosa (South Africa) |
| ypk-021 | Yupik (Northern America) |
Collaborator

Yupik seems to be a group of languages: https://en.wikipedia.org/wiki/Yupik_languages. The group mapping is "ypk": "ems ess esu ynk", and from this list only "esu" has a display name, so it is the only one generated by the script I wrote. All the other entries you had are covered.

Wondering if you really need this entry, or if "esu" would work in your case.
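A minimal sketch of the filtering described in this comment: expand a language group code into its member codes and keep only the members that have a display name. The group mapping is the one quoted in the comment; the display-name table is an illustrative stand-in for the CLDR data, and `members_with_display_names` is a hypothetical helper, not the script mentioned here.

```python
# Group mapping as quoted in the comment above.
LANGUAGE_GROUPS = {"ypk": "ems ess esu ynk"}

# Illustrative stand-in for CLDR display names; only "esu" is present,
# matching what the comment reports.
DISPLAY_NAMES = {"esu": "Central Yupik"}


def members_with_display_names(
    group_code: str,
    groups: dict[str, str],
    display_names: dict[str, str],
) -> list[str]:
    """Return the member codes of a language group that have a display name."""
    members = groups.get(group_code, "").split()
    return [code for code in members if code in display_names]


if __name__ == "__main__":
    print(members_with_display_names("ypk", LANGUAGE_GROUPS, DISPLAY_NAMES))
```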

@@ -115,12 +119,15 @@ permalink: /docs/refs/mojito-locales/
| ms-BN | Malay (Brunei Darussalam) |
| ms-MY | Malay (Malaysia) |
| mt-MT | Maltese (Malta) |
| mus-021 | Creek (Northern America) |

Muscogee (Creek). Creek is the historical name, Muscogee (Creek) is the nation

I think Muscogee works fine here

Author

Honestly, this is the small tedious stuff that I'd be happy to wade through the bureaucracy on -- "Creek" is what's in the CLDR dataset that's maintained internationally, so it'll keep showing up even if it's wrong, since much of the field of localization seems programmatic or at least highly inclined to follow the standards:
https://github.com/unicode-cldr/cldr-localenames-full/blob/993632df2f5d6a2d33cbbf40d922474c2482eaca/main/en-001/languages.json#L390

If you can confirm that "Muscogee" is universally the more appropriate term, then I could wrangle with the bureaucracy to have that reflected in the standards. We could review the standards and collect a list of stuff like this to push through en masse.

Collaborator

@arecvlohe this is great info, thanks for sharing

Author

Oh hey, good news: seems someone else chimed in on this in Aug 2009, and it's in the upcoming release: https://unicode-org.atlassian.net/browse/CLDR-13193?jql=text%20~%20%22muskogee%22

Oh nice! I guess I don't have to do anything then.

| nb-NO | Norwegian (Norway) |
| nl-BE | Dutch (Belgium) |
| nl-NL | Dutch (Netherlands) |
| nn-NO | Norwegian (Nynorsk) (Norway) |
| ns-ZA | Northern Sotho (South Africa) |
| nv-003 | Navajo (North America) |

Navajo is a name given by the Spanish. Better to go with Diné

Author

@patcon patcon Jun 23, 2020

YES. This is great insight. To clarify, this downstream tool isn't the place to fix this, but I'd like to help with it.

Term originates from this dataset: https://github.com/unicode-cldr/cldr-localenames-full/blob/993632df2f5d6a2d33cbbf40d922474c2482eaca/main/en-001/languages.json#L426

Current CLDR is v36 (release notes), and nv was added in March 2018 for v33 (release notes) (or some time in the year prior to its official version release)

I can try to dig up the mailing lists to see the conversation when this was added. It may be that this conversation happened already. But even so, receptivity to this sort of feedback may have changed.

Author

Sorry, requires login to UNICODE ticket tracker: https://unicode-org.atlassian.net/browse/CLDR-13814?jql=text%20~%20%22navajo%22

  • CLDR-13814 Addition of core data and new locale: Navajo (nv)
    • lots of missing information in the registration request -- looks like they could use some support
    • Was submitted by Google on May 24, but they just withdrew on June 8th, as they were not able to "get vetters" in time.

I'm not sure I understand, since it seems it's in there already, but perhaps it's not an official locale yet, as it's lacking some deeper level of detail.

Author

@patcon patcon Jun 23, 2020

I can imagine they might naively lean on decisions like this:
https://www.indianz.com/News/2017/04/19/navajo-nation-council-rejects-bill-to-ch.asp

but if there are alternative perspectives that have been legitimized through community channels, it could perhaps be raised... maybe there's precedent for some other approach (e.g., multiple entries)

Thanks for all the information. Let's talk about this in a chat so I can get the proper perspective on it. What I would like to do is send this out through our social media and gather feedback that way. It would also be nice if someone could point me to something more official.

| ny-MW | Nyanja; Chewa; Chichewa (Malawi) |
| oj-021 | Ojibwa (Northern America) |

@arecvlohe arecvlohe Jun 23, 2020

Ojibway is a French term I think. Anishinaabe is what they call themselves

Author

ditto above

Quick research says this is much more entrenched than other terms. It seems to have originated in 2009, from the days when changes happened via Internet Engineering Task Force (IETF) RFCs:

Interesting. I will have to read up more on this. Bureaucracy but without engaging Native peoples it seems. Is that the norm?

@arecvlohe arecvlohe left a comment

Kind of getting into the weeds in terms of naming, but I guess you just have to decide what convention you will follow: language vs authority. For example: Diné (common usage) vs Navajo (from the name Navajo Nation).

Author

patcon commented Jun 23, 2020

No hey, not the weeds at all. At some level, this is important. As I understand, the changes are larger than this specific project, which is just parroting the terms used from the international language standards.

This drove me down a rabbit hole into the CLDR issue queue. Seems they're about to cut their yearly release. The process seems to involve changes having "vetters", of which "guest vetters" have a single vote count. From what I gather, a proposal for changes/additions requires 8 votes.

https://unicode-org.atlassian.net/

So any changes that you feel should be merged into the standard can perhaps get into the queue with enough non-insiders registering for their system and vouching for any changes.

arecvlohe commented Jun 27, 2020

> No hey, not the weeds at all. At some level, this is important. As I understand, the changes are larger than this specific project, which is just parroting the terms used from the international language standards.
>
> This drove me down a rabbit hole into the CLDR issue queue. Seems they're about to cut their yearly release. The process seems to involve changes having "vetters", of which "guest vetters" have a single vote count. From what I gather, a proposal for changes/additions requires 8 votes.
>
> https://unicode-org.atlassian.net/
>
> So any changes that you feel should be merged into the standard can perhaps get into the queue with enough non-insiders registering for their system and vouching for any changes.

We can take this offline, but I don't understand how to become an outside vetter. Let me know on Slack and we can take it from there.
