Use a run_task abstraction for eval-tests.py #393

Merged: 4 commits from eval-tests-run-task into mozilla:main on Jan 30, 2024

Conversation

@gregtatum (Member) commented on Jan 24, 2024

Instead of testing at the script level, this tests at the run_task level, which I feel is the right level for an integration test.

    run_task(
        task_name="evaluate-quantized-sacrebleu-wmt09-en-ru",
        script="pipeline/eval/eval-quantized.sh",
        work_dir=test_data_dir.path,
        fetches_dir=test_data_dir.path,
        env={
            # This is where the marian_decoder_args will get stored.
            "TEST_ARTIFACTS": test_data_dir.path,
            # Replace marian with the one in the fixtures path.
            "BMT_MARIAN": fixtures_path,
            # This is included via the poetry install
            "COMPRESSION_CMD": "zstd",
        },
    )
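
For readers unfamiliar with the helper, here is a minimal sketch of what a run_task-style abstraction could look like. This is an illustration only, not the repo's actual implementation: the real helper resolves the task from the Taskcluster task graph, and the MOZ_FETCHES_DIR handling and bash invocation shown here are assumptions.

import os
import subprocess


def run_task(task_name: str, script: str, work_dir: str, fetches_dir: str, env: dict) -> None:
    """Hypothetical sketch: run a pipeline script the way its Taskcluster task would.

    The real helper looks the task up by task_name in the task graph and reuses
    its command line and environment; this only shows the general shape.
    """
    task_env = {
        **os.environ,
        **env,
        # Assumption: tasks locate their inputs via a fetches directory.
        "MOZ_FETCHES_DIR": fetches_dir,
    }
    print(f"Running {task_name} via {script}")
    subprocess.run(["bash", script], cwd=work_dir, env=task_env, check=True)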

I'm keeping this as a draft because it contains work from #364 that I would like to land first. I also wanted to get a CI run in. The most relevant commit here is ecd2793.

Edit: This is ready for review.

(Resolved review thread on tests/test_eval.py, now outdated)
@gregtatum force-pushed the eval-tests-run-task branch 2 times, most recently from d8cf127 to 15adb9a on January 25, 2024 at 15:53
@gregtatum marked this pull request as ready for review on January 25, 2024 at 16:01
@gregtatum requested a review from eu9ene on January 25, 2024 at 16:01
@gregtatum (Member, Author) commented:

This test was really useful in #401, where I rewrote eval.sh as eval.py and simplified the task graph, since I could re-run the test to make sure I wasn't breaking things.

@gregtatum (Member, Author) commented on Jan 29, 2024

Here is an example of an existing test using run_task that I'll put up for review later:

import os

import pytest

# COMPRESSION_CMD, ARTIFACT_EXT, read_lines, and the data_dir fixture are assumed
# to come from the repo's shared test helpers.


@pytest.mark.parametrize(
    "params",
    [
        ("mtdata", "Neulab-tedtalks_test-1-eng-rus"),
        ("opus", "ELRC-3075-wikipedia_health_v1"),
        ("flores", "dev"),
        ("sacrebleu", "wmt19"),
    ],
)
def test_basic_corpus_import(params, data_dir):
    importer, dataset = params

    data_dir.run_task(
        f"dataset-{importer}-{dataset}-en-ru",
        env={"COMPRESSION_CMD": COMPRESSION_CMD, "ARTIFACT_EXT": ARTIFACT_EXT},
    )

    prefix = data_dir.join(f"artifacts/{dataset}")
    output_src = f"{prefix}.ru.gz"
    output_trg = f"{prefix}.en.gz"

    assert os.path.exists(output_src)
    assert os.path.exists(output_trg)
    assert len(read_lines(output_src)) > 0
    assert len(read_lines(output_trg)) > 0
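
A note on read_lines: it is one of the shared test helpers. A minimal sketch of what such a helper might look like, assuming gzip-compressed text artifacts (not necessarily the repo's implementation):

import gzip


def read_lines(path: str) -> list[str]:
    # Hypothetical sketch of the helper used above: decompress a .gz artifact
    # and return its lines so the test can assert the output is non-empty.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.readlines()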

This fixes an issue where modifying tc.prod.yml would break the taskcluster tests. It also allows tests to share the same config between runs and avoids leaving a dirty artifacts folder.
@gregtatum force-pushed the eval-tests-run-task branch from 15adb9a to d337e1f on January 29, 2024 at 20:15
@eu9ene (Collaborator) left a comment


Looks good overall! I have a minor remark about asserts. Running tasks is a great way to integrate a bit deeper and improve coverage.

(Resolved review thread on tests/test_eval.py, now outdated)
@gregtatum merged commit cb4231e into mozilla:main on Jan 30, 2024; 4 checks passed.