
[ci] Simplify tests, add CI, patch paraphrase_mining_embeddings #2350

Merged (12 commits) on Dec 12, 2023

Conversation

@tomaarsen (Collaborator) commented on Nov 7, 2023

Hello!

Pull Request overview

  • Simplify tests: only train with 100-500 samples, only test with 100 samples.
  • Add CI, automated tests for:
    • os: Linux, Windows
    • Python version: 3.7, 3.8, 3.9, 3.10, 3.11
  • Patch paraphrase_mining_embeddings so that i < j always holds, where i and j are the paraphrase indices.

Details

Tests

Before CI can be applied, we need to simplify the tests. In essence, this involved first verifying that there is no regression on master by running the current (large) tests. Then, I decreased the number of testing samples to 100 across all tests, and the number of training samples to 100-500, depending on the test setup.
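As a rough illustration of the kind of change (the helper name, file path, and column names below are illustrative, not the actual test utilities), capping the sample counts looks like this:

```python
import csv
import gzip

# Hypothetical helper: read an STS benchmark TSV and keep only a small slice,
# so that training/evaluation stays fast enough for CI.
def load_stsb_samples(path: str, split: str, max_samples: int):
    samples = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row["split"] != split:
                continue
            # Scores in the file are 0-5; normalize to 0-1 as the evaluators expect.
            samples.append((row["sentence1"], row["sentence2"], float(row["score"]) / 5.0))
            if len(samples) >= max_samples:
                break
    return samples

# e.g. 500 training samples and 100 evaluation samples instead of the full dataset
train_samples = load_stsb_samples("stsbenchmark.tsv.gz", split="train", max_samples=500)
test_samples = load_stsb_samples("stsbenchmark.tsv.gz", split="test", max_samples=100)
```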

I then updated the expected scores across all pretrained models. Ideally, I think it would be best to test fewer of the pretrained models, as that would allow us to actually cache the downloaded models, but alas, I think this should be fine too: the GitHub CI environment can usually download at ~250 MB/s.

In a later PR I intend to improve the test suite in various ways. For example:

  1. No longer load the STSB/NLI files in numerous locations, but only once (see the fixture sketch after this list).
  2. Heavily improve the test coverage. Currently, SentenceTransformers has a dreadful 46% test coverage, whereas 95% is commonly the target for large open-source projects.
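For point 1, one option is a session-scoped pytest fixture, so the file is read once and shared across tests. The sketch below is only illustrative and reuses the hypothetical load_stsb_samples helper from above:

```python
import pytest

# Sketch of a shared, session-scoped fixture: the STSB slice is loaded once per
# pytest session and handed to every test that requests it.
@pytest.fixture(scope="session")
def stsb_test_samples():
    return load_stsb_samples("stsbenchmark.tsv.gz", split="test", max_samples=100)


def test_sample_count(stsb_test_samples):
    # Any test can depend on the fixture without re-reading the file.
    assert len(stsb_test_samples) == 100
```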

Test time seems to be between 6 and 15 minutes. The test time might improve further when #2345 is merged, in case we currently download unnecessary model files (e.g. model.safetensors and pytorch_model.bin).

CI

I've added a simple continuous integration setup. CI is extremely important for large open-source projects, and it should help simplify PRs etc. a ton. The CI caches Python dependencies, but not Hugging Face models: the test suite uses too many large models for them all to be cached across every OS.

After v2.3.0 I intend to deprecate Python 3.6 and 3.7. I would have added Python 3.6 to the CI, but it did not work: the Ubuntu 22.04 GitHub CI image no longer ships Python 3.6, and on Windows it isn't possible to install a recent enough tokenizers version from scratch.
Although 3.12 has been out for a little while, torch support for it is still lacking, which is why it was not included in the CI either.

Patch paraphrase_mining_embeddings

When running the CI, I learned that for Python 3.10 and 3.11, on Linux only, the paraphrase_mining_embeddings function could return (score, i, j) triples where i > j. This does not occur in other versions. I applied a simple patch to ensure that i < j always holds, allowing the matching test to pass.
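The fix amounts to normalizing the pair order before deduplication. A simplified sketch (not the full paraphrase_mining_embeddings implementation, and with toy input data):

```python
# Simplified sketch of the patch: make sure the smaller corpus index always comes
# first, so every emitted (score, i, j) triple satisfies i < j regardless of
# platform or Python version.
candidate_pairs = [(0.97, 3, 1), (0.97, 1, 3), (0.80, 0, 2), (0.55, 2, 2)]  # toy input

added_pairs = set()
pairs = []
for score, i, j in candidate_pairs:
    if i == j:
        continue  # skip self-matches
    i, j = min(i, j), max(i, j)  # normalize so that i < j
    if (i, j) not in added_pairs:
        added_pairs.add((i, j))
        pairs.append((score, i, j))

pairs.sort(key=lambda pair: pair[0], reverse=True)  # highest-scoring pairs first
print(pairs)  # [(0.97, 1, 3), (0.8, 0, 2)]
```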


Run the (fast) tests with pytest, and the slow tests with pytest -m slow.
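One common way to get that behavior (the actual conftest/pytest configuration in this PR may differ) is:

```python
# conftest.py sketch: plain `pytest` skips tests marked as slow, while
# `pytest -m slow` still selects them.
import pytest


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "slow: long-running tests (training, large models)")


def pytest_collection_modifyitems(config, items):
    if "slow" in config.getoption("markexpr", default=""):
        return  # slow tests were explicitly requested via `-m slow`
    skip_slow = pytest.mark.skip(reason="run with `pytest -m slow` to include slow tests")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

A long-running test then opts in with the @pytest.mark.slow decorator.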

  • Tom Aarsen

@tomaarsen merged commit 5da2188 into UKPLab:master on Dec 12, 2023