Translations #898
base: master
code/templates/property.html
<p>
<aside class="aside-note">
<mark>TRANSLATION</mark>
<div>
To do: styling similar to entity.html.
<h1>{{ number }} {{ entity}}</h1>
<p>
To do: styling similar to entity.html.
Please do so using a partial template / include, e.g. partials/_translation.html, and then:
<p>
{% include "partials/_translations_aside.html" %}
</p>
Some changes during review
c&p from Ghesselink#1
I'm sorry, I didn't consider that.
Does this mean we'll still translate the entire HTML and load only the 'active' translation, or do we just create .mo files and do the translation on request (when loading the page)? In that case, we can still cache these .mo files based on newly incoming translations in the
I'm having another look at it, but I've had some issues with loading these from the .md locally. Furthermore, this way we're completely sure that the translated text is the same as the original. However, as you mentioned, once we move away from caching based on translations and also track changes in the .md file, this will be redundant.
The terminology gets a bit mixed up sometimes. 'Translations' can already mean a couple of things.
Perhaps we should rename this to cached_translations or similar, and store the HTTP responses in Redis?
I would say: create all .mo files at startup and whenever the translations change, most likely in poller.py. I still don't really understand load_original (and how it interacts with caching). Could you give some example values and explain why it's not possible to get these from the compiled .mo files?
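For reference, a compiled .mo file can be queried by msgid directly with the standard library's gettext module. A minimal sketch, assuming msgids like IfcBeam_DEFINITION as in the .pot excerpt in this thread. build_mo is a hypothetical hand-rolled writer of the GNU .mo format, used only to produce input (the PR compiles with polib), and load_catalog is likewise a hypothetical name:

```python
import gettext
import io
import struct
from typing import Dict

MO_MAGIC = 0x950412DE  # little-endian GNU .mo magic number


def build_mo(messages: Dict[str, str]) -> bytes:
    """Serialize msgid -> msgstr pairs into minimal GNU .mo bytes (no hash table).

    Hypothetical helper for illustration only; in the PR, polib does the compiling.
    """
    # The empty msgid carries the header; declaring UTF-8 lets gettext decode non-ASCII text.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **messages}
    keys = sorted(entries)  # .mo files store msgids sorted; "" comes first, so the charset is known early
    ids = [k.encode("utf-8") for k in keys]
    strs = [entries[k].encode("utf-8") for k in keys]

    n = len(keys)
    orig_table = 7 * 4               # the header is seven 32-bit fields
    trans_table = orig_table + n * 8  # each table entry is (length, offset)
    data_start = trans_table + n * 8

    descriptors, data = [], b""
    for blob in ids + strs:           # all msgids first, then all msgstrs
        descriptors.append((len(blob), data_start + len(data)))
        data += blob + b"\0"          # each string is NUL-terminated

    header = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    tables = b"".join(struct.pack("<2I", length, offset) for length, offset in descriptors)
    return header + tables + data


def load_catalog(mo_bytes: bytes) -> gettext.GNUTranslations:
    """Parse compiled .mo bytes with the stdlib gettext parser."""
    return gettext.GNUTranslations(io.BytesIO(mo_bytes))


if __name__ == "__main__":
    catalog = load_catalog(build_mo({"IfcBeam_DEFINITION": "(translated definition)"}))
    print(catalog.gettext("IfcBeam_DEFINITION"))  # the stored msgstr
    print(catalog.gettext("IfcWall_DEFINITION"))  # missing key: the msgid itself comes back
```

Note the fallback on the last line: for a key that isn't in the catalog, gettext returns the msgid unchanged rather than raising.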
You mean the HTML cache? FYI, caching is easy to do, e.g. in nginx: https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/ How do you think we can move forward with finalizing this?
translate.build_cache(clean=True, use_hash=True)
# First time. Spider the site to build indices in Redis. Then terminate.
subprocess.call([sys.executable, "translate.py", "build-cache", "--clean"])
Seems redundant with the Python call above?
except subprocess.CalledProcessError:
    return b""

def update_repo(repo, branch):
Should we also change this to use GitPython (pip install GitPython)?
else:
    translate.build_cache(use_hash=True)

if trans_changed:
    translate.build_cache()
Many calls to build_cache?
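One way to address this is to decide on a single build_cache call per poll cycle. A minimal sketch, assuming the first_time and trans_changed flags and the clean / use_hash keyword arguments seen in the diff. cache_action and run_cycle are hypothetical names, and rebuilding only when translations changed (rather than on every non-first cycle) is an assumption about the intended behaviour:

```python
from typing import Callable, Dict, Optional


def cache_action(first_time: bool, trans_changed: bool) -> Optional[Dict[str, bool]]:
    """Return kwargs for the one build_cache() call this cycle, or None to skip it.

    Hypothetical consolidation of the first_time / trans_changed branches.
    """
    if first_time:
        return {"clean": True, "use_hash": True}  # full rebuild on startup
    if trans_changed:
        return {"use_hash": True}  # incremental: unchanged .po files are skipped
    return None  # nothing new: leave the compiled cache alone


def run_cycle(first_time: bool, trans_changed: bool,
              build_cache: Callable[..., None]) -> None:
    """Call build_cache at most once per poll cycle (hypothetical helper)."""
    kwargs = cache_action(first_time, trans_changed)
    if kwargs is not None:
        build_cache(**kwargs)
```

Passing build_cache in as a parameter keeps the decision logic testable without touching the file system.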
code/poller.py
if trans_changed:
    translate.build_cache()

if not (main_changed or trans_changed or first_time):
Why this change? I think even if things have changed, it's ok to sleep?
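The reviewer's suggestion amounts to a loop that does its work and then always sleeps, so a burst of changes never turns the poller into a tight loop. A minimal sketch; poll_loop, check and interval are hypothetical names, and sleep is injectable only so the loop can be tested:

```python
import time
from typing import Callable, Optional


def poll_loop(check: Callable[[], None], interval: float = 60.0,
              iterations: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Run check() repeatedly, sleeping after every cycle whether or not anything changed.

    Hypothetical helper. iterations=None loops forever; a number limits the cycles (handy for tests).
    """
    done = 0
    while iterations is None or done < iterations:
        check()          # e.g. pull repos, rebuild caches if needed
        sleep(interval)  # unconditional: no busy loop when changes keep arriving
        done += 1
```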
code/translate.py
print(f"[ERR] {po}: {e}", file=sys.stderr)

def _compile_one(po, mo):
    os.makedirs(os.path.dirname(mo), exist_ok=True)
Not sure if it makes a difference, but maybe do this outside of the function call so that you group all the IO operations and don't repeat the same call for files sharing their directories:
for path in set(map(os.path.dirname, map(operator.itemgetter(1), tasks))):
    os.makedirs(path, exist_ok=True)
Note that you also do the makedirs in compile_po_to_mo(), so it can be removed there as well.
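The suggestion, sketched end to end: create every output directory once, up front, and only then hand the (po, mo) pairs to the workers. tasks as a list of (po_path, mo_path) pairs matches the itemgetter(1) in the snippet; prepare_output_dirs is a hypothetical name:

```python
import operator
import os
from typing import Iterable, Set, Tuple


def prepare_output_dirs(tasks: Iterable[Tuple[str, str]]) -> Set[str]:
    """Create each distinct parent directory of the .mo targets exactly once.

    Hypothetical helper. tasks holds (po_path, mo_path) pairs; returns the directories ensured.
    """
    dirs = set(map(os.path.dirname, map(operator.itemgetter(1), tasks)))
    dirs.discard("")  # a bare filename has no directory to create
    for path in dirs:
        os.makedirs(path, exist_ok=True)  # no error if it already exists
    return dirs
```

After this runs, the per-file compile function only needs to read and write files, which also makes it simpler to run in a worker process.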
I've changed it; doing this in one place is indeed clearer. There isn't much difference in performance: in both cases, the initial compilation of all 621 .po files to .mo takes roughly 8.5 seconds. When everything is skipped (i.e. there are no new translations), it takes just 0.1 s.
89b4218
Initial build:
translate.py build-cache --clean
[...]
Done. compiled=621, skipped=0, pruned=0, errors=0, TRANSLATIONS_BUILD_DIR=/home/geert/Documents/translations/IFC4.3.x-development/code/compiled_translations in 8.55 seconds
Skipped:
debugpy/launcher 51103 -- /home/geert/Documents/translations/IFC4.3.x-development/code/translate.py build-cache
Done. compiled=0, skipped=621, pruned=0, errors=0, TRANSLATIONS_BUILD_DIR=/home/geert/Documents/translations/IFC4.3.x-development/code/compiled_translations in 0.10 seconds
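The second run skips all 621 files because nothing changed. One way to make that decision, given the use_hash=True option seen in poller.py, is to compare a content hash of each .po file with the one recorded at the last compile. A minimal sketch; the .sha256 sidecar file next to each .mo and the helper names are assumptions, not necessarily what translate.py does:

```python
import hashlib
import os


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_compile(po: str, mo: str) -> bool:
    """True if mo is missing or po changed since the digest stored beside mo."""
    stamp = mo + ".sha256"  # hypothetical sidecar holding the last compiled .po digest
    if not (os.path.exists(mo) and os.path.exists(stamp)):
        return True
    with open(stamp, encoding="ascii") as fh:
        return fh.read().strip() != file_digest(po)


def record_compile(po: str, mo: str) -> None:
    """Store the .po digest after a successful compile so the next run can skip it."""
    with open(mo + ".sha256", "w", encoding="ascii") as fh:
        fh.write(file_digest(po))
```

Compared with an mtime check, a hash also survives a fresh git checkout, where every file gets a new timestamp but unchanged content.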
code/translate.py
print(f"[ERR] {po}: {e}", file=sys.stderr)

else:
    with ThreadPoolExecutor(max_workers=jobs) as ex:
Did you try both ThreadPoolExecutor as well as ProcessPoolExecutor?
I initially went for ProcessPoolExecutor but switched to ThreadPoolExecutor because I couldn't get it working. Looking at it again, I understand why: the directories must be created in one central place (as you pointed out in another comment), and the compile function must be defined outside of build_cache (i.e. it must be picklable).
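The picklability point can be shown directly: ProcessPoolExecutor sends the callable to its workers by pickling it, and pickle stores functions by module and name, so a module-level function works while one defined inside build_cache does not. A minimal sketch; _compile_one here is a stand-in that just copies bytes, and compile_all and build_cache_nested are hypothetical names (the pool switch only mirrors the --pool option in spirit):

```python
import pickle
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Tuple


def _compile_one(task: Tuple[str, str]) -> str:
    """Stand-in for the real compile step: copy po bytes to mo, return the mo path.

    Defined at module level so ProcessPoolExecutor can pickle it by reference.
    """
    po, mo = task
    with open(po, "rb") as src, open(mo, "wb") as dst:
        dst.write(src.read())
    return mo


def is_picklable(fn: Callable) -> bool:
    """True if pickle can serialize fn (i.e. find it again by module and name)."""
    try:
        pickle.dumps(fn)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def build_cache_nested() -> Callable:
    """Hypothetical: returns a function defined inside another function."""
    def compile_nested(task):  # local function: pickle cannot look it up by name
        return task
    return compile_nested


def compile_all(tasks: List[Tuple[str, str]], pool: str = "process", jobs: int = 8) -> List[str]:
    """Run _compile_one over tasks in a process or thread pool (hypothetical helper)."""
    executor_cls = ProcessPoolExecutor if pool == "process" else ThreadPoolExecutor
    with executor_cls(max_workers=jobs) as ex:
        return list(ex.map(_compile_one, tasks))
```

With pool="process", the entry point should run under an if __name__ == "__main__": guard, because on platforms that start workers by spawning (Windows, macOS) each worker re-imports the main module.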
I've tried both and tested it by creating a clean cache build.
ThreadPoolExecutor
python translate.py bench --pool thread -j 8 --repeat 3
bench: pool=thread jobs=8 runs=[3.2557999299970106, 3.3374314489992685, 3.3936321290020715] avg=3.33s
ProcessPoolExecutor
python translate.py bench --pool process -j 8 --repeat 3
bench: pool=process jobs=8 runs=[0.9578737699994235, 1.102173382001638, 1.0440669560011884] avg=1.03s
And an extra double-check:
python3 translate.py build-cache -j 8 --clean --pool process
Done. compiled=621, skipped=0, pruned=0, errors=0, TRANSLATIONS_BUILD_DIR=/home/geert/Documents/translations/IFC4.3.x-development/code/compiled_translations in 1.11 seconds
ProcessPoolExecutor handled the compiles about three times as fast as the thread pool (1.03 s vs 3.33 s on average). I guess that's because polib is pure Python and the task is mainly CPU-bound, so the threads are held back by the GIL. I've left both options in so we can switch if we'd like, but defaulted to process.
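A timing helper with the same output shape as the bench runs above (a list of per-run times plus their average) can be written with time.perf_counter. A minimal sketch; the bench name and signature here are hypothetical, not the actual translate.py subcommand:

```python
import statistics
import time
from typing import Callable, List, Tuple


def bench(fn: Callable[[], object], repeat: int = 3) -> Tuple[List[float], float]:
    """Time fn() `repeat` times; return the individual wall-clock runs and their mean."""
    runs = []
    for _ in range(repeat):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()  # e.g. a clean cache build with a given pool type
        runs.append(time.perf_counter() - start)
    return runs, statistics.fmean(runs)
```

For a fair thread-vs-process comparison, fn should wipe the compiled cache first. Otherwise every run after the first measures the fast skip path (0.1 s) instead of a full compile.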
0b9043d
if val:
    out[suffix] = val

# remove wikileaks (e.g. '[[IfcBeam]]' --> 'IfcBeam')
Typo I guess ;)
No, that's how it's represented in IFC.json, e.g.:
"Definition": "An [[IfcBeam]] is typically a horizontal, or nearly horizontal, structural member that is capable of withstanding load primarily by resisting bending. It may also represent such a member from an architectural point of view. It is not required to be load bearing.",
This formatting is also used in the .pot files:
msgid "IfcBeam_DEFINITION"
msgstr "An [[IfcBeam]] is typically a horizontal, or nearly horizontal, structural member that is capable of withstanding load primarily by resisting bending. It may also represent such a member from an architectural point of view. It is not required to be load bearing."
Because we're showing the translation at the top of the file, I opted to remove the brackets, for now at least. In the near future it's probably better to put the translation inside the semantic definition section, where we can keep the brackets to include the link.
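Both behaviours discussed here (turning [[IfcBeam]] into IfcBeam now, and possibly into a link later) fit in a single regular expression. A minimal sketch; strip_wiki_links and link_wiki_links are hypothetical names, and href_for stands in for whatever URL scheme the templates actually use:

```python
import html
import re
from typing import Callable

WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")  # [[Name]] with no nested brackets


def strip_wiki_links(text: str) -> str:
    """'An [[IfcBeam]] is ...' -> 'An IfcBeam is ...'"""
    return WIKI_LINK.sub(r"\1", text)


def link_wiki_links(text: str, href_for: Callable[[str], str]) -> str:
    """Replace [[Name]] with an HTML anchor; href_for (hypothetical) maps a name to its URL."""
    def to_anchor(m: "re.Match[str]") -> str:
        name = m.group(1)
        return f'<a href="{html.escape(href_for(name))}">{html.escape(name)}</a>'
    return WIKI_LINK.sub(to_anchor, text)
```

Only the link target and text are escaped; the surrounding definition text is assumed to be escaped (or marked safe) by the template.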
I was referring to wikileaks instead of links :P
Oops... I didn't even notice that :p
def list_languages():
    # get a list of available languages
    langs = build_language_file_map()
build_language_file_map() seems to get called an awful lot, but it does quite an elaborate directory search. Maybe cache it and recompute it once per minute. I think that's better than creating a JSON file from it in poller.py, because that JSON file would also need to be read.
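The once-per-minute idea is a small time-to-live cache around the expensive function. A minimal sketch, assuming a zero-argument function like build_language_file_map(). The ttl_cache decorator and its injectable clock are hypothetical, and the PR's own implementation may differ:

```python
import functools
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def ttl_cache(ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
    """Cache a zero-argument function's result and recompute it after `ttl` seconds.

    Hypothetical decorator; clock is injectable so expiry can be tested without sleeping.
    """
    def decorator(fn: Callable[[], T]) -> Callable[[], T]:
        lock = threading.Lock()
        state = {"value": None, "expires": float("-inf")}

        @functools.wraps(fn)
        def wrapper() -> T:
            with lock:  # concurrent requests wait for one scan instead of all scanning
                now = clock()
                if now >= state["expires"]:
                    state["value"] = fn()  # the expensive directory search
                    state["expires"] = now + ttl
                return state["value"]

        # force a recompute on the next call, e.g. right after poller.py pulls new translations
        wrapper.cache_clear = lambda: state.update(expires=float("-inf"))
        return wrapper
    return decorator
```

Usage would be @ttl_cache(ttl=60) directly above def build_language_file_map():. time.monotonic is used instead of time.time so that system clock adjustments cannot extend or shorten the TTL.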
I also added two command-line arguments to test it, e.g.:
(translations) geert@PcGeert:~/Documents/translations/IFC4.3.x-development/code$ python3 translate.py debug-ttl
TTL=60s
after first run: {'lang_map': 1, 'flag_map': 1, 'list_langs': 1}
after second run: {'lang_map': 2, 'flag_map': 2, 'list_langs': 2} # after time.sleep(60.5s)
Average times per call (very small); each map function was called 2,000 times to measure this:
(translations) geert@PcGeert:~/Documents/translations/IFC4.3.x-development/code$ python3 translate.py bench-ttl
cold: map=4.886 ms flag=0.081 ms list=0.022 ms
hot (cached avg): map=0.70 µs flag=0.72 µs list=0.97 µs
speedup×: map≈7014 flag≈112 list≈23
Brief summary of the current structure:
The script compiles .po files into .mo files and stores them in the cache, but only if the cache is not already populated. Otherwise, the translations are retrieved directly from the cache and made available on the server. The language preference is stored in a cookie on the frontend, and the translations are passed to the HTML via render_template (in server.py).
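The request-time half of that flow (read the language from the cookie, fall back to a default, pick the matching compiled catalog) can be sketched with the standard library alone. A minimal sketch; the cookie name lang, the default en, and the helper names pick_language and translator_for are assumptions. In server.py this logic would sit in the Flask view before render_template:

```python
import gettext
from http.cookies import SimpleCookie
from typing import Dict, Iterable


def pick_language(cookie_header: str, available: Iterable[str],
                  default: str = "en", cookie_name: str = "lang") -> str:
    """Return the cookie's language if a catalog exists for it, else the default.

    Hypothetical helper; cookie_name="lang" and default="en" are assumptions.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")  # parse a raw Cookie: header value
    morsel = cookie.get(cookie_name)
    lang = morsel.value if morsel is not None else None
    return lang if lang in set(available) else default


def translator_for(lang: str,
                   catalogs: Dict[str, gettext.NullTranslations]) -> gettext.NullTranslations:
    """Catalog for lang; NullTranslations returns msgids unchanged when none is loaded."""
    return catalogs.get(lang, gettext.NullTranslations())
```

Validating the cookie value against the available languages means a stale or tampered cookie just falls back to the default instead of breaking the lookup.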
Todo