
Conversation

@mart-r mart-r (Collaborator) commented Oct 14, 2025

Currently, most of our workflows run based on the dependencies defined in uv.lock. This is great for reproducibility, stability, and speed of installation. However, it means that we're not checking the library against any of the new versions of our dependencies that come out over time. So even if our workflows run successfully, we have no confidence that a user installing the project today will actually be able to use it with the dependencies they get.

So this PR introduces a new nightly workflow that ignores the lock file, installs the latest available dependencies, and runs some typical tests against them. The idea is to use this to catch any dependencies that release updates and become incompatible for any reason.

As an example, transformers==4.57.0 was released on the 3rd of October, and it inadvertently dropped support for Python 3.9. Because our workflows were not testing against this version, they were all passing. Yet a user installing after that date would have received that version of transformers and discovered the incompatibility only at run time.

The idea is that when this new workflow fails, we can take action to identify the incompatible dependency and either patch the compatibility issue or release a patch version that explicitly disallows the incompatible version.

The reason we don't always (i.e. in all workflows) test against the latest dependencies is so that we can separate the compatibility of a change in a PR (what the existing workflows handle) from the stability of the overall library (what this new workflow handles). After all, we don't want to force a contributor to make drastic changes to dependencies and / or patch incompatibility issues unrelated to their change just to make the workflows pass.

What the workflow does overall:

  • Runs every night at 3am
  • Runs over all our supported Python versions (3.9, 3.10, 3.11, 3.12, 3.13)
  • Runs over a number of different OS targets (Ubuntu, macOS, Windows)
  • Runs linting and typing
  • Runs tests
  • Runs the regression test suite

There is a small caveat for the macOS runner:

  • It ignores some DeID tests
    • The macOS runner only has 7GB of RAM, and these tests would normally tip it over the edge
    • So currently they're just ignored on that (and only that) platform, and only in CI
    • There are an extra 8 tests (i.e. parts of a TestCase) that are ignored on macOS

What it omits compared to the regular workflow is the backwards compatibility check. We may want to include that as well; I just felt like not doing all of it, since this is already going to be running on 12 different runners (4 versions and 3 OSs).


@alhendrickson alhendrickson (Collaborator) left a comment


lgtm


- name: Install with latest deps
  run: |
    uv run --python ${{ matrix.python-version }} python -m ensurepip
Collaborator:

Hey, a wildcard one for another time (if ever):

This one reads the pyproject file and the Python in this folder, right? So it kind of says "the GitHub project is still working based on what is in main".

Are we able to go a layer higher and do something extra to say "the PyPI library is working"? Or in other words, really test for "can users probably install and use MedCAT today?"

uv pip install medcat # So actually bring in the latest pypi version
uv run some-easy-test

With some super easy test like:

x = CAT.load_model_pack(" ")
result = x.get_entities("test")
assert whatever

Collaborator Author:

Yes, this reads the dependencies from pyproject.toml and installs things based on that.

We could add something to test a PyPI-based install as well, but I'm not sure it adds a lot. It sounds like we'd be testing PyPI's infrastructure and/or wheel mechanics at that point, and I feel like that's someone else's responsibility.

Perhaps a better option for this would be an isolated test in the production / release workflow? Something that tests that installation of the wheel does in fact work as expected.
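
For illustration, a minimal smoke script along those lines (a sketch, not part of this PR; the filename and the bare-import check are assumptions):

    # smoke_test.py (hypothetical): run after installing the built wheel in the
    # release workflow to verify the wheel actually installs and imports.
    from importlib.metadata import version

    import medcat  # fails loudly here if the wheel is broken

    print(f"medcat {version('medcat')} imported OK")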

Collaborator:

I think I'm coming more from the perspective of what is actually being verified:

Right now there's nothing that checks the case of "I followed your readme instructions today and it worked". This test as it stands starts from "I checked out your main branch... and ran unit tests", which isn't quite the same thing.

Interestingly, back when it was pinned versions, it did actually test for this in a way - at least the tutorial/service tests verified it, I think. Maybe this change is really where the gap was introduced, and the uv.lock part has confused it.

As a suggestion: if alongside your test here we also made the tutorials run nightly, but from the pinned version, would that fully address the uv.lock concern as well as serve as the full user-facing test?


@alhendrickson alhendrickson (Collaborator) left a comment


Looks great, would be good to see what happens at 3am tonight



contents: read
on:
  schedule:
    - cron: "0 3 * * *" # every day at 3am UTC
Collaborator:

Happy to try 3am for now to see what it's like. Personally I might suggest doing it during working hours? Or just before 9am anyway. There's a trade-off between when having the email come in is useful vs slowing down other builds...

Collaborator Author:

The reason I wanted it overnight is so that it doesn't disturb other work. This spawns 4x3=12 workflows, all of which take quite a while (20-30 minutes). If this were to happen during work hours, it might cause workflows for active work to be queued. So I figured I'd run it at night - when no other work is happening - and deal with it in the morning if I have to.

strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
Collaborator:

From a design point of view:

I'm thinking in general you can make this as close to the _main one as possible? Keeping it simple and easy to maintain.

(It's great to try all the other OSs just in this one and not main, though - so that is one diff that is good.)

But taking this example of the Test step, can we make them completely identical? Either change main or this one - as I don't want to have to think about whether timeout-minutes is better than timeout (as an example of keeping it simple).

In here:

      - name: Test
        run: |
          uv run --python ${{ matrix.python-version }} python -m unittest discover
        timeout-minutes: 45

Vs in main

      - name: Test
        run: |
          timeout 30m uv run python -m unittest discover

Collaborator Author:

The only reason that specific example was changed was that it didn't work on the Windows runner, which doesn't have the timeout command available. And I didn't think it was within the scope here to change the main workflow to be in line with this change.

From a brief look, we should be able to fully modularise this. I.e. have the normal workflow steps (linting, typing, tests) in a separate workflow file (one that never runs on its own) and use it for both the main workflow and this one, but with slightly different inputs. I think it's probably worth doing, but I'd leave it for another task.



@unittest.skipIf(not should_do_test_ci(),
                 "MacOS on workflow doesn't have enough memory")
Collaborator:

So, minor - could you alternatively check whether the system has enough available memory? As I'm guessing this change will stop the test running on your Mac locally as well.
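
For illustration, a sketch of what such a memory-based skip could look like, assuming psutil were added as a test dependency; the 8 GiB threshold is an arbitrary placeholder, not a number from the PR (which ties into the reply below):

    import unittest

    import psutil  # hypothetical extra test dependency, not used in the PR

    # Placeholder threshold - the actual memory the DeID tests need is unknown.
    REQUIRED_BYTES = 8 * 1024**3  # 8 GiB

    def has_enough_memory() -> bool:
        # Check currently available (not total) memory against the threshold.
        return psutil.virtual_memory().available >= REQUIRED_BYTES

    @unittest.skipIf(not has_enough_memory(),
                     "Not enough available memory for the DeID tests")
    class DeIDTests(unittest.TestCase):  # hypothetical test case name
        ...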

Collaborator Author:

Yes, I could check for available memory. But I don't know exactly how much memory is necessary, and as such, I didn't want to put in a number I didn't trust.

With that said, because the method checks the RUNNER_OS environment variable (rather than just the OS) and that isn't (normally) set on a regular system (it certainly isn't on mine), it'll still allow the tests to run locally.

cnf.general.nlp.provider = 'spacy'


def should_do_test_ci() -> bool:
Collaborator:

Very minor - I would rename this func to "is_mac_os" or something more direct, so it documents itself a bit more; e.g. @unittest.skipIf(not is_mac_os()) is clear.

Collaborator Author:

I think that makes sense. Though notably this checks both that it is macOS AND that it's on CI, because otherwise the environment variable wouldn't be set.
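
For illustration, a minimal sketch of how such a check could work (an assumption based on the description above, not necessarily the PR's exact implementation):

    import os

    def should_do_test_ci() -> bool:
        # GitHub Actions sets RUNNER_OS (e.g. "Linux", "macOS", "Windows");
        # it is normally absent locally, so local runs are never skipped -
        # only the macOS CI runners are.
        return os.environ.get("RUNNER_OS") != "macOS"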

@mart-r mart-r merged commit a5b2bfa into main Oct 16, 2025
32 checks passed
@mart-r mart-r deleted the build/medcat/CU-869aujr7h-add-library-stability-workflow branch October 16, 2025 13:00