This repository was archived by the owner on Jun 24, 2022. It is now read-only.

NEW [WIP] Add localization support with jekyll-simple-i18n #1509

Draft: wants to merge 76 commits into master


@jonaharagon (Contributor) commented Nov 22, 2019

Supersedes #1458. This brings changes from #1503 into the main repo for development.


@djoate:

Preview of index.html translation: https://deploy-preview-1509--privacytools-io.netlify.com/es/
English site: https://deploy-preview-1509--privacytools-io.netlify.com

Sample of partially translated pages:

No other pages have been translated yet (pages without translate: true in their front matter are not generated).

This is not meant to be a full-fledged PR but rather a proof of concept for a better solution for localizing the site. If this is an acceptable solution, please make this into a branch in this repository so that we can start localizing using this plugin.

The jekyll-simple-i18n plugin (MIT licensed) makes translating the site easier to manage. You should visit the plugin's GitHub repository and read up on it, but here are its features as listed in its README:

  • No external dependencies. Plugins utilize existing Jekyll features.
  • Source strings and page titles can be placed directly in templates for seamless editing and readability.
  • A ready-to-translate YAML file that includes all of the canonical source strings is generated every time the site is built.
  • It's easy to add new languages. Just create a single file that contains the translated strings. Everything else happens automatically.
  • Custom front matter is added to translated pages that can be used within your Liquid templates.
  • Optional Transifex integration.
  • Built-in support for hreflang tags.

It's based on Transifex, but it can be used with a different service such as Weblate. I've made some modifications to the plugin (e.g. renaming transifex to weblate and handling null source text).

This PR includes an example of index.html and part of card.html (the "Learn More" button text) being translated. The plugin did not seem to work with the github-pages gem, so the github-pages gem was replaced with the jekyll gem (which is what the current i18n branch does anyway). You can go to https://deploy-preview-1509--privacytools-io.netlify.com/es/ (or build locally) to see the following:

[Screenshot: the translated Spanish index page]

https://deploy-preview-1509--privacytools-io.netlify.com gives the original English site.

Here is a snippet of the source code for index.html:

<h1 id="sponsors" class="anchor"><a href="#sponsors"><i class="fas fa-link anchor-icon"></i></a> {% t Sponsors%}</h1>

<div class="alert alert-success" role="alert">
  <strong>{% t New!%}</strong> {% t Showcase your brand as a sponsor of PrivacyTools here and support our mission of creating a world free of mass surveillance!%} <a href="/{% if page.language %}{{ page.language }}/{% endif %}sponsors/" class="alert-link">{% t Learn more...%}</a>
</div>

A snippet of resources.html:

<p><a href="/{% if page.language %}{{ page.language }}/{% endif %}classic/"><i class="fas fa-info-circle"></i> {% t Prefer the classic site? View a single-page layout.%}</a></p>

<div class="row">

  {% capture providers_title %}{% t Providers %}{% endcapture %}
  {% capture providers_page %}/{% if page.language %}{{ page.language }}/{% endif %}providers/{% endcapture %}
  {% capture providers_description %}{% t Discover privacy-centric online services, including email providers, VPN operators, DNS administrators, and more!%}{% endcapture %}

  {% include card.html color="success"
  title=providers_title
  icon="fas fa-server"
  iconcolor="dark"
  page=providers_page
  description=providers_description
  %}

Instead of using keys and two different files, you just wrap the original text in {% t ... %} tags, and the plugin automatically assigns that string its own ID and writes it to weblate-source-file.yml. To translate text that is passed into a card, you still have to capture it into a variable first, as before.

The source YAML file is generated in the root folder of the repo on every build. To set up a translation, copy this file into _data/languages/ and rename it to one of the languages in the language map. This seems much easier to maintain than cross-referencing between two different files.
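The copy-and-rename step could be scripted. Below is a minimal Ruby sketch (Ruby because Jekyll and its plugins are written in it); the helper name start_translation is hypothetical, and it assumes the generated file is named weblate-source-file.yml and sits in the site root, as described above.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: start a translation by copying the generated source
# file to _data/languages/<lang>.yml. File names follow the PR description.
def start_translation(site_root, lang)
  source = File.join(site_root, "weblate-source-file.yml")
  target_dir = File.join(site_root, "_data", "languages")
  target = File.join(target_dir, "#{lang}.yml")
  # Never overwrite a language file that may already contain translations.
  raise ArgumentError, "#{target} already exists" if File.exist?(target)

  FileUtils.mkdir_p(target_dir) # create _data/languages/ if it is missing
  FileUtils.cp(source, target)  # translators then replace the values in place
  target
end

# Usage against a throwaway directory standing in for the repo root.
Dir.mktmpdir do |root|
  File.write(File.join(root, "weblate-source-file.yml"), "---\nProviders: |\n  Providers\n")
  puts start_translation(root, "es") # prints the path of the new es.yml
end
```

Refusing to overwrite an existing file is a deliberate choice, so that re-running the helper cannot clobber a translation already in progress.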

The plugin also does not create multiple keys for duplicates of the exact same string. For example, {% t Worth Mentioning %} has one key, so there is only one entry to translate, and every other page that uses {% t Worth Mentioning %} shares that key. (However, I've modified the plugin so that, for instance, {% t Worth mentioning %} and {% t Worth Mentioning! %} get distinct keys.)
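The keying and de-duplication described above can be approximated in Ruby as follows. This is a sketch, not the plugin's code: approx_key and build_source are hypothetical names, and the character rules (drop everything except letters, digits, underscores, whitespace, and . ! ?; join words with underscores; cap at 100 characters) are inferred from the sample keys and the Known issues notes in this PR.

```ruby
# Hypothetical approximation of how the modified plugin derives a key from a
# {% t ... %} string, inferred from the sample keys; not the plugin's code.
def approx_key(text, max_length = 100)
  kept = text.gsub(/[^A-Za-z0-9_\s.!?]/, "") # drop hyphens, commas, etc.; keep . ! ?
  kept.split.join("_").slice(0, max_length) # words joined by "_", case preserved, capped
end

# Collect strings in the order they appear; an identical string reuses the
# key created for its first occurrence instead of adding a new entry.
def build_source(strings_in_page_order)
  strings_in_page_order.each_with_object({}) do |text, source|
    source[approx_key(text)] ||= text
  end
end

source = build_source(["Worth Mentioning", "Providers", "Worth Mentioning",
                       "Worth mentioning", "Worth Mentioning!"])
p source.keys
# => ["Worth_Mentioning", "Providers", "Worth_mentioning", "Worth_Mentioning!"]
puts approx_key("Prefer the classic site? View a single-page layout.")
# => Prefer_the_classic_site?_View_a_singlepage_layout.
```

The sample key for the long "Discover privacy-centric online services..." string stops at "administrator" (97 characters), so the real plugin's truncation rule evidently differs slightly from this plain 100-character slice.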

We would have to replace local links with something like this in order to get the right pages (and I believe external links can be wrapped in translate tags without a problem):

<a href="/{% if page.language %}{{ page.language }}/{% endif %}sponsors/" class="alert-link">{% t Learn more...%}</a>
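The Liquid condition adds a language prefix only when page.language is set. A hypothetical Ruby mirror of that logic (the method name localized_path is made up for illustration) shows which URLs it produces; like Liquid's {% if %}, it treats only nil and false as "no language".

```ruby
# Hypothetical Ruby mirror of the Liquid link pattern
#   /{% if page.language %}{{ page.language }}/{% endif %}sponsors/
# Like Liquid's {% if %}, only nil and false count as "no language".
def localized_path(language, path)
  prefix = language ? "#{language}/" : ""
  "/#{prefix}#{path}"
end

puts localized_path("es", "sponsors/") # => /es/sponsors/
puts localized_path(nil, "sponsors/")  # => /sponsors/
```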

A portion of the source file, weblate-source-file.yml, looks like this:

---
Prefer_the_classic_site?_View_a_singlepage_layout.: |
  Prefer the classic site? View a single-page layout.

Providers: |
  Providers

Discover_privacycentric_online_services_including_email_providers_VPN_operators_DNS_administrator: |
  Discover privacy-centric online services, including email providers, VPN operators, DNS administrators, and more!

Learn_More: |
  Learn More

It's a different format from the one currently used in the i18n branch, i.e. it has the format

string_key_id: |
  This is a source string from the site

rather than

"string_key_id": "This is a source string from the site"

If this format doesn't work with Weblate, we can change the plugin so that it generates the latter format.
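If Weblate does need the quoted format, a converter is straightforward. This is a hypothetical Ruby sketch (the method name to_flat_quoted is made up) that loads the block-scalar YAML with Ruby's standard yaml library and re-emits each entry as a double-quoted pair, stripping the trailing newline that | adds. It relies on JSON string literals also being valid YAML double-quoted scalars.

```ruby
require "yaml"
require "json"

# Hypothetical converter from the plugin's block-scalar format
#   key: |
#     text
# to the flat quoted format used in the i18n branch:
#   "key": "text"
def to_flat_quoted(block_yaml)
  YAML.safe_load(block_yaml).map do |key, value|
    # strip removes the trailing newline that the | block scalar keeps
    "#{key.to_s.to_json}: #{value.to_s.strip.to_json}"
  end.join("\n") + "\n"
end

puts to_flat_quoted("---\nProviders: |\n  Providers\n\nLearn_More: |\n  Learn More\n")
```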

A sample translation into Spanish (using deepl.com) can be found in _data/languages/es.yml:

---
Prefer_the_classic_site?_View_a_singlepage_layout.: |
  ¿Prefieres el sitio clásico? Ver un diseño de una sola página.

Providers: |
  Proveedores

Discover_privacycentric_online_services_including_email_providers_VPN_operators_DNS_administrator: |
  Descubra servicios en línea centrados en la privacidad, incluyendo proveedores de correo electrónico, operadores de VPN, administradores de DNS y mucho más!

Learn_More: |
  Aprenda Más
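To make the relationship between the two files concrete, here is a hedged Ruby sketch of a lookup against the Spanish sample above. The method name translate is hypothetical, and falling back to the English source text for a missing key is an assumption about desired behaviour, not something the plugin is confirmed to do.

```ruby
require "yaml"

# Excerpt of _data/languages/es.yml from the sample above.
ES_YAML = <<~YAML
  ---
  Providers: |
    Proveedores

  Learn_More: |
    Aprenda Más
YAML

# Hypothetical lookup: use the translation when the key exists,
# otherwise fall back to the English source text (assumed behaviour).
def translate(key, source_text, translations)
  value = translations[key]
  value.nil? ? source_text : value.strip # strip the newline kept by |
end

es = YAML.safe_load(ES_YAML)
puts translate("Providers", "Providers", es) # => Proveedores
puts translate("Sponsors", "Sponsors", es)   # => Sponsors (missing key, falls back)
```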

Known issues

  • I don't know of a way to make the plugin translate the strings of permalinks. This means that pages such as https://privacytools.io/es/donate will have to stay as /es/donate for now. Update: See comment below since this is actually not an issue.
  • Because the plugin uses the actual string as the translation ID/key, there may be collisions (e.g. "Learn more..." and "Learn more!" will have the same ID). To help remedy this, I've modified the plugin so that the IDs preserve capitalization and can also contain periods, exclamation marks, and question marks. I've also modified the plugin so that the max length for an ID is 100 characters.
  • I've had to modify the Gemfile to not use the github-pages gem, so github-pages was replaced with the jekyll gem (which is what the current i18n branch does anyway). The jekyll-sitemap plugin also had to be added explicitly for the site to compile.
  • Breadcrumbs: I'm not a Ruby programmer, so I'm not going to try to make a solution for the breadcrumbs.
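Key collisions could be caught before a build with a small check. The following Ruby sketch is hypothetical: approx_key approximates the modified key rules described in the second bullet (case preserved, . ! ? kept, 100-character cap), inferred from the sample keys rather than taken from the plugin. It reports groups of distinct strings that would share a key, such as two long strings that differ only after the cap, or strings that differ only by a hyphen.

```ruby
# Hypothetical approximation of the modified key rules; not the plugin's code.
def approx_key(text, max_length = 100)
  text.gsub(/[^A-Za-z0-9_\s.!?]/, "").split.join("_").slice(0, max_length)
end

# Group distinct source strings by the key they would get and keep only
# the keys shared by more than one string.
def key_collisions(strings)
  strings.uniq
         .group_by { |s| approx_key(s) }
         .select { |_key, group| group.size > 1 }
end

long_prefix = "word " * 30 # 150 characters, longer than the key cap
p key_collisions([long_prefix + "ending one", long_prefix + "ending two"])
p key_collisions(["Learn more...", "Learn more!"]) # => {} with the relaxed rules
```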

To reiterate, this is a proof of concept for a better i18n solution. Feel free to add this as a branch if this seems like an acceptable solution.

Adds in support for https://github.com/signalapp/jekyll-simple-i18n.

The plugin did not work with the github-pages gem, so github-pages was
switched with jekyll (which is what the current i18n branch does anyway).
* Translate more of index.html, with translations sourced from
deepl.com

* Relax the plugin to allow capitalization, periods, exclamation 
marks, and question marks in the Weblate ID for YAML src. 
This helps differentiate between similar but different strings.
@jonaharagon jonaharagon added ✨ enhancement ℹ️ help wanted WIP active work in progress, do not merge or PR (yet)! 🇦🇶 translations Anything covering a translated version of the site labels Nov 22, 2019
netlify bot commented Nov 22, 2019

Deploy preview for privacytools-io ready!

Built with commit e5cfd44

https://deploy-preview-1509--privacytools-io.netlify.com

Comment on lines 43 to 52
+{% capture os_title %}{% t Operating Systems%}{% endcapture %}
+{% capture os_page %}/{% if page.language %}{{ page.language }}/{% endif %}operating-systems/{% endcapture %}
+{% capture os_description %}{% t Find out how your operating system is compromising your privacy, and what simple alternatives exist.%}{% endcapture %}
+
 {% include card.html color="info"
-title="Operating Systems"
+title=os_title
 icon="fas fa-desktop"
 iconcolor="dark"
-page="/operating-systems/"
-description="Find out how your operating system is compromising your privacy, and what simple alternatives exist."
+page=os_page
+description=os_description
Contributor Author

I wonder if there is a way to incorporate this into the card.html code instead of using captures on every page. Will have to investigate that.


@djoate (Contributor) commented Nov 23, 2019

@djoate (Contributor) commented Nov 24, 2019

List of some issues:

  • 404.html won't generate for a specific language, even if you key 404.html and set translate: true
  • The plugin refuses to translate "Yes" and "No" (workaround in privacytoolsIO/privacy-tools@8b5226d)
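One plausible cause of the "Yes"/"No" problem, not confirmed in this thread, is YAML 1.1 boolean parsing: Ruby's built-in YAML parser (Psych) loads a plain Yes or No as true or false, so a key written as Yes: would no longer match the string "Yes". The snippet below only demonstrates that parser behaviour; whether the plugin's lookup fails for exactly this reason is an assumption, and the contents of the 8b5226d workaround are not reproduced here.

```ruby
require "yaml"

# Psych follows YAML 1.1, where a plain (unquoted) Yes or No is a boolean.
plain  = YAML.safe_load("Yes: |\n  Sí\nNo: |\n  No\n")
quoted = YAML.safe_load("\"Yes\": |\n  Sí\n\"No\": |\n  No\n")

p plain.keys  # => [true, false]
p quoted.keys # => ["Yes", "No"]
```

Quoting the key keeps it a string; block-scalar values such as the translated "No" are unaffected because they are never treated as plain scalars.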


@jonaharagon (Contributor Author) commented Nov 26, 2019

I'm not sure what's going on with Weblate, it might be a known issue (WeblateOrg/weblate#3231) or I might just need to look at it tomorrow with fresh eyes. In either case, it allows for anonymous suggestions and it works otherwise, so I'll get registrations figured out at a later time.

jonaharagon and others added 2 commits November 26, 2019 04:26
Updated by "Cleanup translation files" hook in Weblate.

Translation: PrivacyTools/Website
Translate-URL: https://weblate.nablahost.com/projects/privacytoolsio/website/
@djoate (Contributor) commented Nov 26, 2019

Seems like Weblate thinks that there's a newline at the end of every string

@djoate (Contributor) commented Nov 26, 2019

I've noticed that for regenerating the source file, it seems the best thing to do is to make sure Jekyll isn't serving the site when the file is regenerated.

e.g.,

  • if you delete a string and save the file while the site is still being served, that string won't be removed from the source file until you stop serving the site and re-serve/rebuild it.
  • if you add a new {% t string %} and save it while the site is being served, the new string is only appended to the end of the source file rather than placed next to the surrounding strings (which affects Weblate's nearby strings view) until you stop serving the site and re-serve/rebuild it.

@jonaharagon (Contributor Author) commented Nov 26, 2019

Quoting @djoate: "I've noticed that for regenerating the source file, it seems the best thing to do is to make sure Jekyll isn't serving the site when the file is regenerated."

I have noticed that as well. Maybe it would be better to add the file to .gitignore and have a bot of some kind build the source file and push it to the repo when changes are made. But for now, at least, I think we can easily handle it manually.

Quoting @djoate: "Seems like Weblate thinks that there's a newline at the end of every string"

I saw that but didn't get a chance to look further. The issue, I believe, is this space between each key:

About_PrivacyTools_18_KEY: |
  About PrivacyTools

About_the_PrivacyTools_organization_and_contributors_to_the_PrivacyTools_website_communities_and_servicesP_109_KEY: |
...

In YAML, the | means "all text until the next key" I believe, so technically the source file also has newlines at the end of each string, it just isn't as noticeable in this format, but becomes noticeable when Weblate converts it into a single-line format.

What I don't know is whether or not the \n actually affects anything. I didn't see it change anything on the site itself (and can't imagine when it would make a difference, since newlines are generally ignored in HTML). But if it does affect things, removing that space between keys in the source would probably fix it.

@djoate (Contributor) commented Nov 26, 2019

Quoting @jonaharagon: "What I don't know is whether or not the \n actually affects anything. I didn't see it change anything on the site itself (and can't imagine when it would make a difference, since newlines are generally ignored in HTML)"

The plugin itself also calls .strip on all rendered tag text. Regardless, I think leaving the newline on every string may confuse translators, so it would be better to try to get rid of it.

@jonaharagon (Contributor Author):

On a second look at Weblate, I'd agree:

[Screenshot of Weblate]

@jonaharagon (Contributor Author):
I've "solved" the email problem temporarily by enabling Sign in with GitHub and GitLab. So feel free to register an account, just don't sign up with email :)

@djoate (Contributor) commented Nov 26, 2019

@jonaharagon I looked it up (https://yaml.org/YAML_for_ruby.html#three_trailing_newlines_in_literals). Apparently | keeps a single final newline, while |- strips all trailing newlines.
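The difference between the two chomping indicators is easy to check with Ruby's standard yaml library. The snippet below also shows that, with plain |, removing the blank line between keys would not remove the trailing newline; only |- does. The key name is taken from the sample above, and "Next: x" is a placeholder entry.

```ruby
require "yaml"

clip   = YAML.safe_load("About_PrivacyTools_18_KEY: |\n  About PrivacyTools\n\nNext: x\n")
no_gap = YAML.safe_load("About_PrivacyTools_18_KEY: |\n  About PrivacyTools\nNext: x\n")
strip  = YAML.safe_load("About_PrivacyTools_18_KEY: |-\n  About PrivacyTools\n\nNext: x\n")

p clip["About_PrivacyTools_18_KEY"]   # => "About PrivacyTools\n" (| keeps one final newline)
p no_gap["About_PrivacyTools_18_KEY"] # => "About PrivacyTools\n" (removing the blank line does not help)
p strip["About_PrivacyTools_18_KEY"]  # => "About PrivacyTools"   (|- strips it)
```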

* Key classic page

* Key missed strings

* Update source file
@djoate (Contributor) commented Jan 26, 2020

@jonaharagon What's the status of this PR? Some people were recently looking to localize the site (https://www.reddit.com/r/privacytoolsIO/comments/enui17/arabic_version_of_privacytoolsio/).

Things to do before this can be considered ready:

  • Localize breadcrumbs
  • There's still pull request https://github.com/privacytoolsIO/privacytools.io/pull/1535 open that needs to be merged
  • I think a policy should be made for pull requests that add content (for example, are contributors expected to tag the new strings they add in their pull requests?)
  • Maybe look into hosting the self-hosted Weblate instance on a subdomain of privacytools.io

@Booteille (Contributor):

Hi.

Would it be possible to reopen discussions around translations?
Could the community make it a main priority to make all the changes needed for translations with Weblate to become a reality?

This issue has been stalled for a long time but is, IMHO, one of the most important ones.
Currently, most translations of the website are outdated and people should not rely on them completely.

@djoate (Contributor) commented Dec 12, 2020

@Booteille My thoughts on the state of translation progress are given in #1106 (comment)


6 participants