Use a run_task abstraction for eval-tests.py #393

Merged: 4 commits from eval-tests-run-task into mozilla:main on Jan 30, 2024

Conversation

@gregtatum (Member) commented on Jan 24, 2024

Instead of testing at the script level, this tests at the run_task level, which I feel is the right level for an integration test.

    run_task(
        task_name="evaluate-quantized-sacrebleu-wmt09-en-ru",
        script="pipeline/eval/eval-quantized.sh",
        work_dir=test_data_dir.path,
        fetches_dir=test_data_dir.path,
        env={
            # This is where the marian_decoder_args will get stored.
            "TEST_ARTIFACTS": test_data_dir.path,
            # Replace marian with the one in the fixtures path.
            "BMT_MARIAN": fixtures_path,
            # This is included via the poetry install
            "COMPRESSION_CMD": "zstd",
        },
    )
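
For readers unfamiliar with the helper, here is a minimal sketch of what a run_task-style abstraction could look like. This is an illustration only, not the repo's actual implementation: the real helper resolves the task from the Taskcluster task graph, and the MOZ_FETCHES_DIR handling and bash invocation shown here are assumptions.

import os
import subprocess


def run_task(task_name: str, script: str, work_dir: str, fetches_dir: str, env: dict) -> None:
    """Hypothetical sketch: run a pipeline script the way its Taskcluster task would.

    The real helper looks the task up by task_name in the task graph and reuses
    its command line and environment; this only shows the general shape.
    """
    task_env = {
        **os.environ,
        **env,
        # Assumption: tasks locate their inputs via a fetches directory.
        "MOZ_FETCHES_DIR": fetches_dir,
    }
    print(f"Running {task_name} via {script}")
    subprocess.run(["bash", script], cwd=work_dir, env=task_env, check=True)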

I'm keeping this as a draft because it contains work from #364 that I would like to land first. I also wanted to get a CI run in. The most relevant commit here is ecd2793.

Edit: This is ready for review.

(Resolved review thread on tests/test_eval.py, now outdated)
@gregtatum force-pushed the eval-tests-run-task branch 2 times, most recently from d8cf127 to 15adb9a on January 25, 2024 at 15:53
@gregtatum marked this pull request as ready for review on January 25, 2024 at 16:01
@gregtatum requested a review from eu9ene on January 25, 2024 at 16:01
@gregtatum (Member, Author) commented:

This test was really useful in #401, where I rewrote eval.sh as eval.py and simplified the task graph, since I could re-run the test to make sure I wasn't breaking things.

@gregtatum (Member, Author) commented on Jan 29, 2024

Here is an example of an existing test using run_task that I'll put up for review later:

import os

import pytest

# COMPRESSION_CMD, ARTIFACT_EXT, read_lines, and the data_dir fixture are assumed
# to come from the repo's shared test helpers.


@pytest.mark.parametrize(
    "params",
    [
        ("mtdata", "Neulab-tedtalks_test-1-eng-rus"),
        ("opus", "ELRC-3075-wikipedia_health_v1"),
        ("flores", "dev"),
        ("sacrebleu", "wmt19"),
    ],
)
def test_basic_corpus_import(params, data_dir):
    importer, dataset = params

    data_dir.run_task(
        f"dataset-{importer}-{dataset}-en-ru",
        env={"COMPRESSION_CMD": COMPRESSION_CMD, "ARTIFACT_EXT": ARTIFACT_EXT},
    )

    prefix = data_dir.join(f"artifacts/{dataset}")
    output_src = f"{prefix}.ru.gz"
    output_trg = f"{prefix}.en.gz"

    assert os.path.exists(output_src)
    assert os.path.exists(output_trg)
    assert len(read_lines(output_src)) > 0
    assert len(read_lines(output_trg)) > 0
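
A note on read_lines: it is one of the shared test helpers. A minimal sketch of what such a helper might look like, assuming gzip-compressed text artifacts (not necessarily the repo's implementation):

import gzip


def read_lines(path: str) -> list[str]:
    # Hypothetical sketch of the helper used above: decompress a .gz artifact
    # and return its lines so the test can assert the output is non-empty.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.readlines()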

This fixes an issue where modifying tc.prod.yml would break the taskcluster tests. It also allows tests to share the same config between runs and avoids leaving a dirty artifacts folder.
@gregtatum force-pushed the eval-tests-run-task branch from 15adb9a to d337e1f on January 29, 2024 at 20:15
@eu9ene (Collaborator) left a comment


Looks good overall! I have a minor remark about asserts. Running tasks is a great way to integrate a bit deeper and improve coverage.

(Resolved review thread on tests/test_eval.py, now outdated)
@gregtatum merged commit cb4231e into mozilla:main on Jan 30, 2024; 4 checks passed.