layout	title	nav_order	parent
default	Taskcluster	1	Orchestrators

Taskcluster

Taskcluster is Mozilla's task execution framework. It powers Firefox CI and provides access to hybrid cloud workers (GCP or on-prem), which improves scalability and observability compared to Snakemake.

We use Taskcluster taskgraph to define the DAG (Directed Acyclic Graph) of the pipeline steps.
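
To illustrate what a DAG of pipeline steps means in practice, here is a minimal Python sketch using the standard-library graphlib module. The step names and their dependencies are hypothetical and only loosely modeled on the stages mentioned in this document; the real graph is generated by taskgraph from the kind.yml files.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "download-corpus": set(),
    "clean-corpus": {"download-corpus"},
    "merge-corpus": {"clean-corpus"},
    "train": {"merge-corpus"},
}


def execution_order(graph):
    """Return one valid order in which the steps can run.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE))
```

Because every step depends only on earlier steps, a scheduler can always find an order in which each task starts after its inputs are ready.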

Running training

Create a new branch in the git repo and push it. A separate branch is useful for experimenting with code, and it keeps the caches from being invalidated if you need to restart training after new changes have landed in the main branch.
Go to the GitHub CI for the commit you want to run training for and find the Decision Task.

Go to CI and press "View task in Taskcluster". Make sure you are authenticated in the TC interface: authentication is required to run tasks, although tasks that are already running can be viewed without it.

In the TC interface, navigate to the parent Task Group.

Press "Train" in the 3-dot menu for actions

Copy a config prepared in advance and press "train". See the example TC config here. You can find directions on how to configure training in the Model training guide.

Checking the status of training

Look at the scheduled tasks. They should be visible under the Train action.

Press any task. Here you can look at the logs and artifacts produced by the task.

Navigate to a parent Task Group again (it is a different one than for the Train Action). Here you can see all the scheduled tasks in a more convenient interface with filtering.

Rerunning

Quite often you need to rerun the pipeline after making fixes or when a task fails.

It is possible to manually cancel a task with the Cancel task action.

Once the fixes are implemented, push again and restart the pipeline following the procedure described in the "Running training" section.

Caching

Depending on the fixes, some steps might already be cached from the previous run. For example, if only a config setting that affects the last task was changed, or if nothing changed at all, the pipeline might restart from the failed or cancelled step.

Warning: even a slight refactoring of the upstream steps can invalidate the caches for the whole pipeline, so be careful with such changes when experimenting with the later stages of the pipeline.
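
Why an upstream change invalidates everything downstream can be sketched with a toy cache key. This is an illustrative model only, not the actual digest algorithm used by Taskcluster or taskgraph: it assumes that each task's cache key is a hash of its own definition combined with the cache keys of its dependencies. All task names and definitions are hypothetical.

```python
import hashlib


def cache_key(task, definitions, dependencies, memo):
    """Hash a task's own definition together with its upstream cache keys."""
    if task not in memo:
        digest = hashlib.sha256(definitions[task].encode("utf-8"))
        for dep in sorted(dependencies.get(task, ())):
            upstream = cache_key(dep, definitions, dependencies, memo)
            digest.update(upstream.encode("ascii"))
        memo[task] = digest.hexdigest()
    return memo[task]


def all_keys(definitions, dependencies):
    """Compute the cache key of every task in the pipeline."""
    memo = {}
    return {t: cache_key(t, definitions, dependencies, memo) for t in definitions}


def invalidated(before, after, dependencies):
    """List tasks whose cache key differs between two sets of definitions."""
    old = all_keys(before, dependencies)
    new = all_keys(after, dependencies)
    return sorted(t for t in old if old[t] != new[t])


# Hypothetical linear pipeline: download -> clean -> merge -> train.
DEPENDENCIES = {"clean": ["download"], "merge": ["clean"], "train": ["merge"]}
BASE = {"download": "v1", "clean": "v1", "merge": "v1", "train": "lr=0.001"}

if __name__ == "__main__":
    # Changing only the last task leaves the upstream caches valid.
    print(invalidated(BASE, {**BASE, "train": "lr=0.002"}, DEPENDENCIES))
    # prints ['train']
    # Changing an early task invalidates it and everything after it.
    print(invalidated(BASE, {**BASE, "clean": "v2"}, DEPENDENCIES))
    # prints ['clean', 'merge', 'train']
```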

Running up to a specific step

Change target-stage: all in the training config to a stage that corresponds to another TC step. For example, to download, clean and merge the training corpus use:

target-stage: merge-corpus

This value corresponds to stage: merge-corpus in /taskcluster/ci/merge-corpus/kind.yml:

tasks:
    merge-corpus:
        label: merge-corpus-{src_locale}-{trg_locale}
        description: merge corpus for {src_locale}-{trg_locale}
        attributes:
            dataset-category: train
            stage: merge-corpus
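
One way to picture how target-stage narrows the pipeline is to find the tasks whose stage attribute equals the target and keep them together with everything they depend on. The sketch below is a simplified model under that assumption, not the project's actual target selection code. The task labels, the locale pair, the dependencies field and the tasks_for_target helper are all hypothetical, and treating "all" as "keep every task" is likewise an assumption.

```python
def tasks_for_target(tasks, target_stage):
    """Return the sorted labels of tasks needed to reach target_stage."""
    if target_stage == "all":
        # Assumed meaning of the default value: run the whole pipeline.
        return sorted(tasks)
    needed = set()
    # Start from the tasks tagged with the requested stage...
    stack = [
        label
        for label, task in tasks.items()
        if task["attributes"].get("stage") == target_stage
    ]
    # ...and walk backwards through their dependencies.
    while stack:
        label = stack.pop()
        if label in needed:
            continue
        needed.add(label)
        stack.extend(tasks[label].get("dependencies", []))
    return sorted(needed)


# Hypothetical task set for an en-ru language pair.
TASKS = {
    "dataset-en-ru": {
        "attributes": {"stage": "dataset"},
        "dependencies": [],
    },
    "clean-corpus-en-ru": {
        "attributes": {"stage": "clean-corpus"},
        "dependencies": ["dataset-en-ru"],
    },
    "merge-corpus-en-ru": {
        "attributes": {"stage": "merge-corpus"},
        "dependencies": ["clean-corpus-en-ru"],
    },
    "train-en-ru": {
        "attributes": {"stage": "train"},
        "dependencies": ["merge-corpus-en-ru"],
    },
}

if __name__ == "__main__":
    # The train task is left out because it comes after the target stage.
    print(tasks_for_target(TASKS, "merge-corpus"))
```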

Interactive Tasks

Taskcluster allows authorized users to run so-called interactive tasks. These tasks allow users to gain a shell in the same environment that a pipeline step runs in. This can often be useful for quicker debugging or testing of ideas.

To start an interactive task, follow these steps:

Go to the task you want an interactive version of, e.g. https://firefox-ci-tc.services.mozilla.com/tasks/DZvVQ-VUTPSyPBBS13Bwfg
Click the "Edit" button in the three dots menu
Click "Edit" on the modal that pops up
Click the "Interactive" toggle in the top left
Reduce the maxRunTime to a best guess at how long you'll need the task and worker running. We pay for every minute a worker runs, so workers should not be left running longer than needed (e.g. overnight).
Adjust the payload to simply run bash and sleep (instead of a full pipeline step). For docker-worker tasks use something like:

    command:
        - bash
        - '-c'
        - 'sleep 7200'

For generic-worker tasks (those needing a GPU), use:

    command:
        - - bash
          - '-c'
          - 'sleep 7200'

To tell the two worker types apart, check the payload: docker-worker tasks have an image section in it.

Click "Create Task"

After a few minutes you should be able to get a shell (a link will show up in the tab when it's ready).
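
The payload edits from the steps above can be summarized in a small Python sketch. It only mirrors the manual edits described here and does not call any Taskcluster API. The make_interactive_payload helper and the sample payloads are hypothetical. It assumes that a docker-worker payload can be recognized by its image section and that maxRunTime is expressed in seconds.

```python
import copy


def make_interactive_payload(payload, seconds=7200):
    """Return a copy of payload that just sleeps instead of running a step."""
    new_payload = copy.deepcopy(payload)
    shell_command = ["bash", "-c", f"sleep {seconds}"]
    if "image" in new_payload:
        # docker-worker: command is a flat list of arguments.
        new_payload["command"] = shell_command
    else:
        # generic-worker: command is a list of argument lists.
        new_payload["command"] = [shell_command]
    # Only ever reduce maxRunTime, never raise it above the original value.
    current = new_payload.get("maxRunTime", seconds)
    new_payload["maxRunTime"] = min(current, seconds)
    return new_payload


if __name__ == "__main__":
    # Hypothetical docker-worker payload with a long-running pipeline step.
    docker_payload = {
        "image": {"type": "task-image"},
        "command": ["run-step.sh"],
        "maxRunTime": 86400,
    }
    print(make_interactive_payload(docker_payload))
```

The "Interactive" toggle itself still has to be set in the task editor; the sketch covers only the command and maxRunTime fields.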
