Criteria idea: Accessible to those who understand English #230

Closed
david-a-wheeler opened this issue Feb 26, 2016 · 21 comments

Comments

@david-a-wheeler
Collaborator

I had a discussion with others today about the need to make sure that people can participate from a variety of places and cultures. A criterion that just said, "don't create place/culture barriers" isn't very actionable or measurable.

However, it would be possible to include, "The project MUST be accessible to those who understand the English language."

It's difficult for people to work together without at least one common natural language, and in FLOSS projects today that common language is English. Indeed, in technology circles English is the lingua franca. This doesn't require that anyone be a native speaker, just that if you know English you should be able to participate. For an example, witness all the effort that the LibreOffice people have put in to translate the comments from German into English. Supporting English enables more people worldwide to participate.

This potential criterion may be controversial. That's okay, my goal is to get ideas out there for discussion. Perhaps this is a bad idea, or one that should only be at higher levels. Please discuss!

@altonius
Contributor

wow, this could be very controversial, especially if developing a project
that is specific to a particular country or ethnic group. the first
theoretical problem that comes to mind is a project that is aiming to help
with indigenous languages having this requirement forced on them - ouch!

Here's my take:

  • maybe rather than focus on the languages creating barriers, we should
    focus on the project having some type of statement on how they create an
    inclusive culture.
    • this could be provided as a URL
    • Maybe included in something like the CONTRIBUTING file
    • or maybe something like a CULTURE file at the root of each project
      becomes the norm.
    • wouldn't that be lovely: an MIT CULTURE standard, a GPLv2 CULTURE
      standard......
  • if language is chosen as de-facto for culture then you're likely to
    automatically exclude projects in non-English countries, maybe you'd want
    to include "the project SHOULD define a primary
    (spoken/written/non-programming) language for communicating about the
    project".
  • That said, I'm not keen on defining language as the way to create an
    inclusive culture

On a language-related note, I learned earlier about https://www.transifex.com which
can be used to help translate projects to other languages, maybe it's
something that could be considered to internationalise our project in the
future - FLOSS projects don't have to pay.

Alton(ius)

@david-a-wheeler
Collaborator Author

The goal would be to absolutely not prevent participation or support for a particular language group - it would be to enable others to participate. But it's a fair point that some projects only apply to a particular language group, and thus, imposing such a criterion might inhibit appropriate contributions. You noted that you're "not keen on defining language...", got it.

I did note that this "may be controversial". However, I'd rather have that controversy out in the open. I'm a big believer in brainstorming - put out ideas, even if there are problems. Maybe the problems can be solved, and maybe the problems make the idea unworkable. But we're more likely to get a better list if we can more openly discuss ideas.

You mentioned some alternatives, but I don't think they help. First, there's no need to ask organizations to "define a primary (spoken/written/non-programming) language for communicating about the project". One look at a project's website will tell you that :-).

I'm hesitant to try to make some sort of general statement about how a project creates "an inclusive culture". There are projects and organizations that formally document how they do this, but I don't think there's consensus about it, what it means, or even that it's necessarily a good thing (whatever "it" is). In general the "best practices" criteria try to identify generally-accepted criteria, and I'm skeptical we can get truly generally-accepted general criteria that would be worth it. Pretty much no one today has a CULTURE file, so it's hard to argue that this is a generally-accepted practice today. Many projects simply focus on the code and documentation, and how to contribute to them, and in many respects that kind of focus solves a lot. If you focus on the problem, then it's not hard to be grateful for help (when it's actually help). But it's not clear that can be put into a useful criterion.

I'd really like to focus on specific criteria we can measure relatively unambiguously and that are currently generally accepted as good things. "Presence of English" is relatively easy to measure unambiguously. Of course, if it's not generally accepted as a good thing, then let's drop it. There are lots of ideas that get dropped after discussion :-).

I guess I'm currently leaning against this criterion now, but I can be swayed...!

@dankohn
Contributor

dankohn commented Feb 26, 2016

I will register in support of the criterion. We could modify it to say,
"Whatever the primary language of the core developers may be, the project
should include documentation and be able to accept bug reports and code
comments in English, since English is currently the lingua franca of
technology." I would also make lingua franca a link:
https://en.wikipedia.org/wiki/Lingua_franca

Dan Kohn mailto:dankohn@linux.com
Senior Advisor, Core Infrastructure Initiative
tel:+1-415-233-1000


@david-a-wheeler
Collaborator Author

Ok, dropping it down to SHOULD makes sense. The MUST and SHOULD criteria are normally worded as "The project MUST|SHOULD...", so let's use that consistent pattern unless there's a reason we shouldn't. Also, we should put details and rationale in later sentences, so that people can hide them. Finally, instead of "code comments" how about "comments about code"; while I think it's wisest to have comments embedded in code use English, I think the bigger point is that comments about code should be accepted if they're in English.

How about this?:

The project SHOULD include documentation in English and be able to accept bug reports and comments about code in English. English is currently the lingua franca of computer technology; supporting English increases the number of different potential developers and reviewers. A project can meet this criterion even if its core developers' primary language is not English.

@dankohn
Contributor

dankohn commented Feb 26, 2016

I like this but I think we should move it back to MUST. I'm not aware of
projects that would even be close to qualifying for the badge but wouldn't
because of this requirement.

Dan Kohn mailto:dankohn@linux.com
Senior Advisor, Core Infrastructure Initiative
tel:+1-415-233-1000


@altonius
Contributor

I was agreeing with you that it could be controversial and was playing
devil's advocate. Like you, I'm a fan of having the discussion openly
(though we possibly have some self-selection bias with this criterion as
anyone who can't communicate in English is already excluded from this
discussion :-) )

Back to reality now (instead of my own overly lofty hypotheticals) - I like
the proposed criterion and the rationale behind it, and I'm voting
for it being a SHOULD criterion. It gives current and future projects that
we may not be aware of the opportunity to provide the rationale and still
meet best-practices.

I'd much rather a project fail to achieve a badge due to bad coding or
security practices instead of the language(s) it uses for communicating.

Alton(ius)


@david-a-wheeler
Collaborator Author

Let's add this as a SHOULD to start. We can change the category later, and putting it in the main text will get more visibility. In general I've tried to emphasize best practices that an individual developer could do with some non-Herculean effort. Learning an entire natural language is a big step higher. For the latter, we could accept at least English with a restricted vocabulary. See Special English or the xkcd stuff with the top ten hundred words.

@david-a-wheeler
Collaborator Author

Here's a tweaked version, currently SHOULD, but mentioning use of Simple English and adding the word "worldwide" (which hopefully helps readers realize the issue is simply that English is known around the world).

The project SHOULD include documentation in English and be able to accept bug reports and comments about code in English (at least some form of Simple English). English is currently the lingua franca of computer technology; supporting English increases the number of different potential developers and reviewers worldwide. A project can meet this criterion even if its core developers' primary language is not English.

@dankohn
Contributor

dankohn commented Feb 27, 2016

I would remove the Simple English parenthetical, since it doesn't lead to a
clear Wikipedia page.

Dan Kohn mailto:dankohn@linux.com
Senior Advisor, Core Infrastructure Initiative
tel:+1-415-233-1000


@david-a-wheeler
Collaborator Author

Ok, will remove parenthetical about Simple English.

@david-a-wheeler
Collaborator Author

This is resolved by commit b7dd4f2

@nemobis

nemobis commented Jun 11, 2016

Requiring English may work in practice, but is discriminatory and leaves a bad taste. English is not the only vehicular language used in practice by online communities (for instance there are groups of speakers of Italian, Portuguese and Spanish who interact each using their own language) and English monolinguals should not be favoured over other monolinguals.

If non-native speakers of English are forced to make an effort to speak English, an equal effort should be required from native speakers, e.g. by mandating the existence of (open) processes to translate software and documentation.

@david-a-wheeler
Collaborator Author

This is very much worth discussing, thanks for your comments. I do think the current criterion makes sense, but dialogue is always an excellent idea. Please allow me to expand further (beyond the text above).

English is no better nor worse than any other natural language. I'm all for people learning and using many natural languages. I studied French (and can still read technical French), when I was young I spoke German (I lived there for years), and I've also studied Greek, Portuguese, and American Sign Language. I think it's sad that so many languages around the world are disappearing.

However, if people across the world are going to work together, they need to have some way to communicate. It's not reasonable to require that all people learn all languages. If there's a group of developers that can only communicate in Spanish, they will exclude the larger community of people who cannot speak Spanish. Any language decision will exclude someone - so how can we minimize those who are excluded?

Today, it's typically a given that the language for international communication is English. For example, the International Civil Aviation Organisation (ICAO) has established English language proficiency requirements (LPRs) for all pilots operating on international routes, and all air traffic controllers who communicate with foreign pilots: "These standards require pilots and air traffic controllers to be able to communicate proficiently using both ICAO phraseology and plain English."

Wikipedia's list of languages by total number of speakers makes it clear why this is the case. The languages with the largest number of speakers (at either L1 or L2 levels) are:

| Language                | Total speakers |
|-------------------------|----------------|
| English                 | 1,500 million  |
| Mandarin Chinese        | 1,090 million  |
| Spanish                 | 560 million    |
| Hindustani (Hindi-Urdu) | 541 million    |
| Arabic                  | 395 million    |
| Russian                 | 260 million    |
| Malay                   | 250 million    |
| Portuguese              | 250 million    |
| French                  | 220 million    |
| German                  | 210 million    |

The EU officially supports 24 languages, but even with hundreds of millions of Euros spent on translation the EU can't keep up. In practice, most EU institutions use English as their working language. If a group only speaks Spanish, they can only reach 1/3 as many people as English (and in practice even less, since it's typically easy to find at least some English speakers in nearly every city in the world).
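As a rough check of the "1/3 as many people" figure, here is a minimal sketch that divides each total from the table above by the English total. The dictionary and the helper name `reach_relative_to` are made up for illustration; the numbers are only the ones quoted in this comment, so treat the result as a back-of-the-envelope estimate rather than a measurement.

```python
# Total speakers (L1 + L2) in millions, copied from the table in this comment.
SPEAKERS_MILLIONS = {
    "English": 1500,
    "Mandarin Chinese": 1090,
    "Spanish": 560,
    "Hindustani (Hindi-Urdu)": 541,
    "Arabic": 395,
    "Russian": 260,
    "Malay": 250,
    "Portuguese": 250,
    "French": 220,
    "German": 210,
}


def reach_relative_to(language, baseline="English"):
    """Return speakers of `language` as a fraction of speakers of `baseline`.

    Hypothetical helper: it ignores overlap between speaker populations,
    so it is only a crude upper-level comparison.
    """
    return SPEAKERS_MILLIONS[language] / SPEAKERS_MILLIONS[baseline]


if __name__ == "__main__":
    # Print each language's reach as a fraction of the English total.
    ranked = sorted(SPEAKERS_MILLIONS, key=SPEAKERS_MILLIONS.get, reverse=True)
    for language in ranked:
        print(f"{language:<24} {reach_relative_to(language):.2f}")
```

With these figures Spanish comes out a little above one third of the English total, which is consistent with the estimate above.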

OSS projects typically don't have the EU's translation budget, so English is an even more common feature in the OSS community. LibreOffice, for example, has been working hard to translate its comments from German to English:
https://wiki.documentfoundation.org/Development/Easy_Hacks/Translation_Of_Comments
https://bugs.documentfoundation.org/show_bug.cgi?id=39468
Linus Torvalds' native language is Swedish, and he also knows Finnish and English - but kernel development has always been in English.

> an equal effort should be required from native speakers, e.g. by mandating the existence of (open) processes to translate software and documentation.

Do you have specific criteria text to propose? One challenge is that projects handle this in different ways; in some cases the documentation is actually a separate project, so that might be hard to mandate.

I certainly agree that software intended primarily for end-users should be internationalized and support localization. However, if the software's primary purpose is to be a library for other developers, many library developers would expect that the software developers will learn English. Then nobody has to translate billions of pages into thousands of languages. After all, anyone can learn English; it's not proprietary. Clearly that is not an outcome that would make monolingual speakers of other languages happy, but the economics of trying to translate everything for everyone are difficult to justify in many cases. Indeed, for a number of projects, development velocity is the most important factor to optimize for; anything that slows development (like requiring translations or copyright transfers) is dangerous to the project since it would make the project uncompetitive.

There is a potential radical change: Continued improvements in freely-available machine translation. I'm quite aware that current machine translations leave a lot to be desired. That said, if you limit yourself to simple grammatical structures, and avoid slang and idiomatic phrases, they are already good enough for very simple constructs. If machine translation continues to improve, to the point where people are happy to turn on "auto translate" (or whatever) in their browser and text editor, then there may be no need for a common language. The goal isn't 1 language, it's simply to enable widespread communication.

So, what do you suggest? Adding something? Dropping it?

@Nikerabbit

If we were talking about a company, it makes total sense to agree on a common language to increase productivity, because that is what earns them money and that's what companies do.

But open source is not a company and I highly contest producing code at highest possible rate is what should be optimized.

English does not need any more help to stay as the lingua franca of technology. The other languages do need support so that software development does not become more of a monoculture. What I think we should be looking for is an inclusive culture. This partially overlaps with Code of Conduct and similar constructs that disallow bad behavior and discrimination.

In practice this would mean, for example, that if someone posts a bug report in Russian, the developers would try to understand it via machine translation before telling the reporter just to go away; or not ignoring the message of a person who writes in really bad English. Some larger projects can have volunteers that can help in translating, but of course smaller projects do not have this luxury.

Having code comments in English or at least in one language only would in my opinion fall in the category of having a coding style guideline and enforcing it.

I believe we are not actually disagreeing on the goal of widespread communication. But there seems to be an unmentioned assumption about whether the focus should be on making it easier for those already inside the ecosystem, or whether the focus should be to accommodate the people who we are currently inadvertently excluding.

The (possibility of) localisation of open source projects has been their strength, and can be considered a best practice. It would be weird then if we had the opposite best practice for developmental activity. In conclusion, I think enforcing English is not the way we should take and I propose dropping this criterion until we find something better.

@nemobis

nemobis commented Jul 7, 2016

Here is another way to look at what I said earlier, i.e. that being a native or near-native English speaker doesn't mean one speaks effective English: https://medium.com/@mollyclare/taming-the-steamroller-how-to-communicate-compassionately-with-non-native-english-speakers-d95d8d1845a0

A multilingual mindset, if not actual multilingualism, is required from anyone to communicate effectively in an online community. A monolingual English speaker can make online written conversation harder than someone with very bad English.

@dandv

dandv commented Aug 9, 2018

I LOVE the idea of requiring English for any OSS project that doesn't pertain to a specific language, and it boggles my mind how people continue to post in Russian or German or whatever the language of some of the contributors happens to be 👎

> The other languages do need support so that software development does not become more of a monoculture.

A culture comprises much more than language. Language, here, is just a means of communicating ideas. A standard like TCP or HTTP. Let's be specific. Having all sorts of weirdo networking standards (anyone remember IPX/SPX?) has not helped progress in sharing knowledge. Using more than one language doesn't help either:

  • having multiple languages in one project draws artificial language barriers
  • it encourages those who believe[1] they don't speak English well, to continue to stay silo-ed in their language
  • it massively increases the risk of duplicating questions (e.g. on support forums) because those who search for the question likely won't find answers written in other languages
  • it reduces the pool of people who could help the question asker to those who understand their language, or who bother to machine translate it

> if someone posts a bug report in Russian, the developers would try to understand it via machine translation before telling the reporter just to go away

How about suggesting to the Russian speaker to use machine translation in the first place? That way, they spend time translating once, instead of each person who reads the bug report translating it.

For more on this topic, please refer to this extensive essay I wrote on sticking to one language as a standard for communication. Jeff Atwood also wrote an excellent essay about English and programming specifically.


[1] It is quite well-known that the vast majority of Germans speak English very well, but many are afraid to do so for fear of committing some mistake. They thus favor German, which leads to the consequences I've mentioned above.

@nemobis

nemobis commented Aug 9, 2018 via email

@dandv

dandv commented Aug 12, 2018

> I don't understand, was this sarcastic?

It was not. I'm genuinely puzzled by programmers who just seem to not notice what the prevalent language used in an online programming community (a repo, forum etc.) is.

That's like showing up to a meeting at someone's house, ignoring the fact that all attendees have left their shoes at the door, and walking in wearing your favorite shoes. Might be a form of dyslexia? I have no idea. Today I saw a Chinese user posting a question in an English repo, just like that. Not even with an apology that they don't speak English, or weren't able to use Google Translate (which is available in China). 😕

@simontseng

I do suspect maybe this guy entered the Chinese title while browsing under some kind of page translation program, thinking everything is in Chinese

I wouldn’t be surprised

@nemobis

nemobis commented Aug 12, 2018 via email

@dandv

dandv commented Aug 12, 2018

@simontseng

> I do suspect maybe this guy entered the Chinese title while browsing under some kind of page translation program, thinking everything is in Chinese
> I wouldn’t be surprised

I really doubt that, for two reasons:

  1. Every web page translation software I've seen is pretty obvious, through a UI affordance (translation bar, pop-up prompt etc), and through the slightly broken translation it provides.

  2. The repo where I saw that user post a question in Chinese is a private one. It requires registration involving sending a signed PDF form, in English. Everything about the repo, its site, and its wiki (which is required reading before posting an issue) is in English.

@nemobis

> It's extremely discriminatory to think that everyone must talk English
> and that if they don't they must have problems grasping reality.

Please re-read my post, because that's not what I said. Let me further clarify what I'm saying:

If you want help from an online community that predominantly uses a particular language, then it is in your best interest to use that language. It will not only show respect, but will maximize the chances of getting help.

Note that this is widely followed in practice. Moreover, many of the most prolific contributors on GitHub are not native English speakers, yet they publish in English, I guess in order to get the widest exposure and PRs, but also to not discriminate by using their native tongue. Whether you like it or not, English is the lingua franca of software development.

Anyway, let's have a look at https://git.io/top:

  • out of the top 10 contributors, half are likely non-native speakers
  • every single pinned repo of the top 10 contributors is in English
  • @egoist is Chinese and @hugogiraudel is a diversity advocate. Both publish in English.

I'm all for being inclusive. But when I'm on a mostly-English software forum or repo and I see threads in a language I don't understand, I feel excluded (if not discriminated against). It may be cute to post a question in the language of one of the contributors, but it's immature. I know I've been tempted to do the same, but I've limited myself to replacing "Hi" with the Romanian "Salut" as a head nod to a compadre, because posting the entire issue in Romanian wouldn't help anyone else. We're on GitHub to share code and knowledge.

I also see posting the question in the language of the repo as a sign of respect. FLOSS authors put considerable effort into providing free software. If you want help, the least you can do is to ask your question in the language of that community. @nemobis, would you go post an issue in Italian or Finnish in one of @IonicaBizau's repos? If not, why not?

Also, there's a finer note on this language/respect thing: in my 9+ years of using and contributing to OSS projects on GitHub, I have never, not once, seen a PR to an English repo authored in a non-English language. I have only seen questions asked in other languages. That says something.

We want to be inclusive. Upon whom should the burden of translating such questions fall? Do we wait for contributors who speak that language to answer the question? What if none of them speak it? Do we use machine translation and answer back in the original language? Why exactly shouldn't the asker spend that bit of effort and use the translation software? Is it too much for us providers of free software and free consultations to ask for one minute of the asker's time to make their question (and our answer) intelligible to all? We want to be inclusive after all. Shouldn't that "we" include the asker?

Keep in mind that time is money. Apples to apples, support for, say, a charting library can be valued at $175/hour, or can be provided for free, as is the case with Highcharts, whose Norway-based team has been doing so since 2009. If you think about the costs involved, demanding that a FLOSS developer answer your question in your mother tongue, is the equivalent of both begging for money, and hoarding, because your question isn't searchable by speakers of any other language, so the developer will have to provide the answer again when the same question gets asked.

I hope this makes my point clearer. And just to be extra clear, I'm not advocating for any particular language to be the lingua franca of all repos. Just as we use one prettifying standard or another, the point is to stick to it. If you want help with some Chinese-only widget, by all means, use Chinese. But pick the one most appropriate language for the repo, and have that be as standard as the coding language(s) used. Would you accept a PR in Python to a C repo?

> Someone can be good at doing what they do even without having a good knowledge of English.

That's true for many fields, but less often the case for programming. StackOverflow founder Jeff Atwood wrote an excellent essay about this topic.

7 participants