-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always use Bicleaner AI #367
Conversation
taskcluster/translations_taskgraph/transforms/find_upstreams.py
Outdated
Show resolved
Hide resolved
@@ -12,18 +12,10 @@ transforms: | |||
- translations_taskgraph.transforms.cached_tasks:transforms | |||
- taskgraph.transforms.task:transforms | |||
|
|||
# todo: why bicleaner and clean-corpus are required here? it doesn't work without them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the issue?
I bet @bhearsum or @gabrielBusta can help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to know more about what doesn't work to say anything with certainty, but in general you need to have kind-dependencies
set if any tasks in this kind are generated from another kind.
(I am a little bit surprised that clean-corpus
is required here....that looks like it may be an historical artifact from when all of the merge
tasks were in the same kind...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merging devest does not depend on cleaning or bicleaner. We merge raw datasets here. When I remove those two and validate the taskgraph it says:
2024-01-22 15:52:17,682 - ERROR - Error loading tasks for kind merge-devset:
Traceback (most recent call last):
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 311, in _run
new_tasks = kind.load_tasks(
^^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 76, in load_tasks
tasks = [
^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 76, in <listcomp>
tasks = [
^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1347, in check_run_task_caches
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1276, in check_task_dependencies
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1262, in check_task_identifiers
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1243, in chain_of_trust
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1236, in add_github_checks
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1061, in build_task
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1029, in add_index_routes
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 957, in process_treeherder_metadata
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 919, in validate
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 907, in task_name_from_label
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 863, in set_defaults
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 842, in set_implementation
for task in tasks:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/cached_tasks.py", line 117, in cache_task
for task in order_tasks(config, tasks):
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/cached_tasks.py", line 22, in order_tasks
pending = deque(tasks)
^^^^^^^^^^^^
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/cached_tasks.py", line 69, in add_cache
for job in jobs:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/cached_tasks.py", line 55, in resolved_keyed_by_fields
for job in jobs:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/base.py", line 144, in __call__
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 361, in make_task_description
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 236, in use_fetches
for task in order_tasks(config, tasks):
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/cached_tasks.py", line 22, in order_tasks
pending = deque(tasks)
^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 164, in add_resource_monitor
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 152, in set_label
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 136, in set_implementation
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 116, in rewrite_when_to_optimization
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/base.py", line 144, in __call__
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task_context.py", line 85, in render_task
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/base.py", line 144, in __call__
for task in tasks:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/find_upstreams.py", line 97, in upstreams_for_locales
for job in jobs:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/find_upstreams.py", line 82, in resolve_keyed_by_fields
cleaning_type = get_cleaning_type(config.kind_dependencies_tasks.values())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/find_upstreams.py", line 75, in get_cleaning_type
raise Exception("Unable to find cleaning type!")
Exception: Unable to find cleaning type!
Traceback (most recent call last):
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/main.py", line 907, in main
return args.command(vars(args))
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/main.py", line 430, in show_taskgraph
ret = generate_taskgraph(options, parameters, logdir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/main.py", line 192, in generate_taskgraph
out = format_taskgraph(options, spec, logfile(spec))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/main.py", line 148, in format_taskgraph
tg = getattr(tgg, options["graph_attr"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 170, in full_task_graph
return self._run_until("full_task_graph")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 425, in _run_until
k, v = next(self._run)
^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 311, in _run
new_tasks = kind.load_tasks(
^^^^^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 76, in load_tasks
tasks = [
^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/generator.py", line 76, in <listcomp>
tasks = [
^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1347, in check_run_task_caches
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1276, in check_task_dependencies
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1262, in check_task_identifiers
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1243, in chain_of_trust
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1236, in add_github_checks
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1061, in build_task
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 1029, in add_index_routes
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 957, in process_treeherder_metadata
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 919, in validate
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 907, in task_name_from_label
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 863, in set_defaults
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task.py", line 842, in set_implementation
for task in tasks:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/cached_tasks.py", line 117, in cache_task
for task in order_tasks(config, tasks):
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/cached_tasks.py", line 22, in order_tasks
pending = deque(tasks)
^^^^^^^^^^^^
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/cached_tasks.py", line 69, in add_cache
for job in jobs:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/cached_tasks.py", line 55, in resolved_keyed_by_fields
for job in jobs:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/base.py", line 144, in __call__
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 361, in make_task_description
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 236, in use_fetches
for task in order_tasks(config, tasks):
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/cached_tasks.py", line 22, in order_tasks
pending = deque(tasks)
^^^^^^^^^^^^
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 164, in add_resource_monitor
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 152, in set_label
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 136, in set_implementation
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/run/__init__.py", line 116, in rewrite_when_to_optimization
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/base.py", line 144, in __call__
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/task_context.py", line 85, in render_task
for task in tasks:
File "/Users/epavlov/Library/Caches/pypoetry/virtualenvs/firefox-translations-training-U0HVfMmK-py3.11/lib/python3.11/site-packages/taskgraph/transforms/base.py", line 144, in __call__
for task in tasks:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/find_upstreams.py", line 97, in upstreams_for_locales
for job in jobs:
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/find_upstreams.py", line 82, in resolve_keyed_by_fields
cleaning_type = get_cleaning_type(config.kind_dependencies_tasks.values())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/epavlov/mozilla/source/origin/firefox-translations-training/taskcluster/translations_taskgraph/transforms/find_upstreams.py", line 75, in get_cleaning_type
raise Exception("Unable to find cleaning type!")
Exception: Unable to find cleaning type!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this issue is not related to this PR though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Yep - that's an oversight from when merge-devset
moved to its own kind. #385 fixes it.
taskcluster/translations_taskgraph/transforms/find_upstreams.py
Outdated
Show resolved
Hide resolved
I see a downside in the approach. Previously we had all models listed. It was a bit painful when you wanted to add a new one, but it allowed us to specify exactly which model version we wanted to download and so it allowed us to make use of caching effectively (because we knew when to invalidate things). If we want to accept the downside in exchange for avoiding to list all models, we should probably document the decision somewhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm marking this as Request changes due to the note below about the bicleaner model cache. Hopefully I'm understanding the intent there - please feel free to ping me to discuss.
todo: With non-ai bicleaner going away, the bicleaner
in https://github.com/mozilla/firefox-translations-training/blob/71013bcea0e4647d04d508daf45fe2a96c27ef0d/taskcluster/translations_taskgraph/transforms/find_upstreams.py#L71 is now dead code and can be removed.
@@ -12,18 +12,10 @@ transforms: | |||
- translations_taskgraph.transforms.cached_tasks:transforms | |||
- taskgraph.transforms.task:transforms | |||
|
|||
# todo: why bicleaner and clean-corpus are required here? it doesn't work without them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to know more about what doesn't work to say anything with certainty, but in general you need to have kind-dependencies
set if any tasks in this kind are generated from another kind.
(I am a little bit surprised that clean-corpus
is required here....that looks like it may be an historical artifact from when all of the merge
tasks were in the same kind...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a few small pieces of feedback looking through this, but others have done a more thorough review, so I'm just leaving a comment.
All good points! The idea was to delegate downloading to bicleaner and easily switch between the full and lite models + hide multilingual one to reduce complexity on Taskcluster side. I looked at the code and you're right, it does download latest which is not ideal, but there was just one release of the models so I don't think this will become a problem for us. Paracrawl project is over and I don't know whether there are any plans to retrain the models or train new languages. Let's keep an eye on this. As for caching, we can just set expiration for the task (currently 90 days). And anyway we're likely always interested in the latest models as soon as they are compatible with the code. When/if they release something new they'll likely bump bicleaner-ai version and when we update it the caches will get invalidated. I assume they'll change the URLs and HF repo names if they ever train new incompatible models. I think the language set was like this for two years and it's unlikely it will change. We download fastText model in the same way, now with OpusCleaner. We also download tons of datasets every time which also might get updated and we don't cache them well in the same way. We can check but I don't think downloading is a huge cost for us compared to GPUs. I agree in general that it would be nice to make all the data/model downloading more transparent but doing downloading dynamically we increase flexibility and reduce complexity. I'll add docs about this. |
My concerns are more with reproducibility than cost in this case, but if we don't expect the models to change often and we document this well, it seems OK. |
I see. I documented all this and how to invalidate the caches manually in the code and in the doc under Bicleaner -> ## Models and caching. It is a bit of a crutch but let's get back to it if we start having problems. I don't like the alternative of keeping all this logic on Takscluster since it's really hard to maintain, so sacrificing transparency here seems like the lesser of two evils. |
Ok, I think I addressed most of the issues now. Caching is not ideal but let's see how it goes. The developers might never release a new model for this tool. We can always manually invalidate caches if needed and rethink this approach if it becomes a problem. |
Ci for train-teacher:
it looks like some temporary failure of git clone. The bicleaner steps has passed. I don't want to delay merging, so I'll rerun full CI on the next PR |
Fixes #55 #360