`[ci]` Simplify tests, add CI, patch `paraphrase_mining_embeddings` #2350

tomaarsen · 2023-11-07T14:23:09Z

Hello!

Pull Request overview

Simplify tests: only train with 100-500 samples, only test with 100 samples.
Add CI, automated tests for:
- os: Linux, Windows
- Python version: 3.7, 3.8, 3.9, 3.10, 3.11
Patch paraphrase_mining_embeddings such that i < j is always true with i and j as paraphrase indices.

Details

Tests

Before CI can be applied, we need to simplify the tests. In essence, this involved first verifying that the there is no regression on master by running the current (large) tests. Then, I decreased the number of testing samples to 100 across all tests, and the number of training samples down to 100-500, depending on the test setup.

I then updated the expected scores across all pretrained models. Ideally, I think it would be best to test less of the pretrained models, as it allows us to actually cache the downloaded models, but alas, I think this should be fine too: the GitHub CI environment can usually download at ~250MB/s.

In a later PR I intend to improve the test suite in various ways. For example:

No longer load the STSB/NLI files in numerous locations, but only once.
Heavily improve the test coverage. Currently SentenceTransformers has a dreadful 46% test coverage, whereas 95% is commonly the target for large OS projects.

Test time seems to be between 6 and 15 minutes. The test time might improve further when #2345 is merged, in case we currently download unnecessary model files (e.g. model.safetensors and pytorch_model.bin).

CI

I've added a simple continuous integration setup. CI is extremely important for large OS projects, and it should help simplify PRs etc. a ton. The CI caches Python dependencies, but not HF models. The reason is that the test suite simply uses too many large models that they cannot all be cached across all OS-es.

After v2.3.0 I intend to deprecate Python 3.6 and 3.7. I would have added Python 3.6 to the CI, but it did not work: The Ubuntu 22.04 GitHub CI image doesn't have Python 3.6 anymore, and for Windows it isn't possible to install a recent enough tokenizers version from scratch.
Although 3.12 has been out for quite some time, the torch support is still lacking, so that's why it was not included in the CI.

Patch `paraphrase_mining_embeddings`

When running the CI, I learned that for Python 3.10 and 3.11, on Linux only, the paraphrase_mining_embeddings function could return (score, i, j) triples where i > j. This does not occur in other versions. I applied a simple patch to ensure that i < j always holds, allowing the matching test to pass.

Run the (fast) tests with pytest, and the slow tests with pytest -m slow.

Tom Aarsen

I'll fully deprecate Python 3.7 soon too

Ensure that i < j always holds true

They're simply too big when combined.

But now based on 100 test samples instead

These kinds of tests can be a bit flimsy, this should help

tests/test_cross_encoder.py

tomaarsen added 11 commits November 7, 2023 12:54

Simplify tests, cheaper to run

ea7ea51

Add simple CI

bfdcf20

Also install pytest

c1473cf

Remove Python 3.12, not supported with torch yet

5a28e81

No CI for Python 3.6 - too deprecated

ef6db32

I'll fully deprecate Python 3.7 soon too

Apply patch to paraphrase_mining_embeddings

8293b5c

Ensure that i < j always holds true

Stop trying to cache loaded Models

d6f8c34

They're simply too big when combined.

Increase lower bound of pretrained scores

2db0b9f

But now based on 100 test samples instead

Prevent duplicate ST install

44a23f5

Install requirements.txt within the cached section

b49c772

Relax lower bound on training test

41ad475

These kinds of tests can be a bit flimsy, this should help

nikitajz reviewed Nov 30, 2023

View reviewed changes

tests/test_cross_encoder.py Outdated Show resolved Hide resolved

Reintroduce slow tests, can be ran with 'pytest -m slow'

b6553e8

tomaarsen merged commit 5da2188 into UKPLab:master Dec 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`[ci]` Simplify tests, add CI, patch `paraphrase_mining_embeddings` #2350

`[ci]` Simplify tests, add CI, patch `paraphrase_mining_embeddings` #2350

tomaarsen commented Nov 7, 2023 •

edited

Loading

[ci] Simplify tests, add CI, patch paraphrase_mining_embeddings #2350

[ci] Simplify tests, add CI, patch paraphrase_mining_embeddings #2350

Conversation

tomaarsen commented Nov 7, 2023 • edited Loading

Pull Request overview

Details

Tests

CI

Patch paraphrase_mining_embeddings

`[ci]` Simplify tests, add CI, patch `paraphrase_mining_embeddings` #2350

`[ci]` Simplify tests, add CI, patch `paraphrase_mining_embeddings` #2350

tomaarsen commented Nov 7, 2023 •

edited

Loading

Patch `paraphrase_mining_embeddings`