
Conversation

@mart-r mart-r (Collaborator) commented Oct 14, 2025

Currently, most of our workflows run based on the dependencies defined in uv.lock. This is great for reproducibility, stability, and speed of installation. However, it means that we're not checking the library against any of the new versions of our dependencies that come out over time. So even if our workflows run successfully, we have no confidence that a user installing the project today will actually be able to use it with the dependencies they get.

So this PR introduces a new nightly workflow that ignores the lock file, installs the latest available dependencies, and runs some typical tests against them. The idea is to use this to catch any dependencies that release updates and become incompatible for any reason.

As an example, transformers==4.57.0 was released on the 3rd of October, and it inadvertently dropped support for Python 3.9. Because our workflows were not testing against this version, they were all passing. Yet a user installing after that date would have received that version of transformers and discovered the incompatibility only at run time.

The idea is that when this new workflow fails, we can take action to identify the incompatible dependency and either patch the compatibility issue or release a patch version that explicitly disallows the incompatible version.

The reason we don't always (i.e. in all workflows) test against the latest dependencies is so that we can separate the compatibility of a change in a PR (what the existing workflows handle) from the stability of the overall library (what this new workflow handles). After all, we don't want to force a contributor to make drastic changes to dependencies and / or patch incompatibility issues unrelated to their change just to make the workflows pass.

What the workflow does overall:

  • Runs every night at 3am
  • Runs over all our supported Python versions (3.9, 3.10, 3.11, 3.12, 3.13)
  • Runs over a number of different OS targets (Ubuntu, macOS, Windows)
  • Runs linting and typing
  • Runs tests
  • Runs the regression test suite

There is a small caveat for the macOS runner:

  • It ignores some DeID tests
    • The macOS runner only has 7GB of RAM, and these tests would normally tip it over the edge
    • So currently they're just ignored on that (and only that) platform, and only in CI
    • There are an extra 8 tests (i.e. parts of a TestCase) that are ignored on macOS

What it omits compared to the regular workflow is the backwards compatibility check. We may want to include that as well; I just felt like not doing all of it, since this is already going to be running on 12 different runners (4 versions and 3 OSs).


@alhendrickson alhendrickson (Collaborator) left a comment


lgtm


- name: Install with latest deps
  run: |
    uv run --python ${{ matrix.python-version }} python -m ensurepip
Collaborator:

Hey, a wildcard one for another time (if ever):

This one reads the pyproject file and the Python in this folder, right? So it kind of says "the GitHub project is still working based on what is in main".

Are we able to go a layer higher and do something extra to say "the PyPI library is working"? Or in other words, really test for "can users probably install and use MedCAT today?"

uv pip install medcat # So actually bring in the latest pypi version
uv run some-easy-test

With some super easy test like:

x = CAT.load_model_pack(" ")
result = x.get_entities("test")
assert whatever

Collaborator Author:

Yes, this reads the dependencies from pyproject.toml and installs things based on that.

We could add something to test a PyPI-based install as well, but I'm not sure it adds a lot. It sounds like we'd be testing PyPI's infrastructure and/or wheel mechanics at that point, and I feel like that's someone else's responsibility.

Perhaps a better option for this would be an isolated test in the production / release workflow? Something that tests that installation of the wheel does in fact work as expected.
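
For illustration, a minimal smoke script along those lines (a sketch, not part of this PR; the filename and the bare-import check are assumptions):

    # smoke_test.py (hypothetical): run after installing the built wheel in the
    # release workflow to verify the wheel actually installs and imports.
    from importlib.metadata import version

    import medcat  # fails loudly here if the wheel is broken

    print(f"medcat {version('medcat')} imported OK")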

Collaborator:

I think I'm coming more from the perspective of what is actually being verified:

Right now there's nothing that checks the case of "I followed your readme instructions today and it worked". This test as it stands starts from "I checked out your main branch... and ran unit tests", which isn't quite the same thing.

Interestingly, back when it was pinned versions, it did actually test for this in a way - at least the tutorial/service tests verified it, I think. Maybe this change is really where the gap was introduced, and the uv.lock part has confused it.

As a suggestion: if alongside your test here we also made the tutorials run nightly, but from the pinned version, would that fully address the uv.lock concern as well as serve as the full user-facing test?


@alhendrickson alhendrickson (Collaborator) left a comment


Looks great, would be good to see what happens at 3am tonight



contents: read
on:
  schedule:
    - cron: "0 3 * * *" # every day at 3am UTC
Collaborator:

Happy to try 3am for now to see what it's like. Personally I might suggest doing it during working hours? Or just before 9am anyway. There's a trade-off between when having the email come in is useful vs slowing down other builds...

Collaborator Author:

The reason I wanted it overnight is so that it doesn't disturb other work. This spawns 4x3=12 workflows, all of which take quite a while (20-30 minutes). If this were to happen during work hours, it might cause workflows for active work to be queued. So I figured I'd run it at night - when no other work is happening - and deal with it in the morning if I have to.

strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
Collaborator:

From a design point of view:

I'm thinking in general you can make this as close to the _main one as possible? Keeping it simple and easy to maintain.

(It's great to try all the other OSs just in this one and not main, though - so that is one diff that is good.)

But taking this example of the Test step, can we make them completely identical? Either change main or this one - as I don't want to have to think about whether timeout-minutes is better than timeout (as an example of keeping it simple).

In here:

      - name: Test
        run: |
          uv run --python ${{ matrix.python-version }} python -m unittest discover
        timeout-minutes: 45

Vs in main

      - name: Test
        run: |
          timeout 30m uv run python -m unittest discover

Collaborator Author:

The only reason that specific example was changed was that it didn't work on the Windows runner, which doesn't have the timeout command available. And I didn't think it was within the scope here to change the main workflow to be in line with this change.

From a brief look, we should be able to fully modularise this. I.e. have the normal workflow steps (linting, typing, tests) in a separate workflow file (one that never runs on its own) and use it for both the main workflow and this one, but with slightly different inputs. I think it's probably worth doing, but I'd leave it for another task.



@unittest.skipIf(not should_do_test_ci(),
                 "MacOS on workflow doesn't have enough memory")
Collaborator:

So, minor - could you alternatively check whether the system has enough available memory? As I'm guessing this change will stop the test running on your Mac locally as well.
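
For illustration, a sketch of what such a memory-based skip could look like, assuming psutil were added as a test dependency; the 8 GiB threshold is an arbitrary placeholder, not a number from the PR (which ties into the reply below):

    import unittest

    import psutil  # hypothetical extra test dependency, not used in the PR

    # Placeholder threshold - the actual memory the DeID tests need is unknown.
    REQUIRED_BYTES = 8 * 1024**3  # 8 GiB

    def has_enough_memory() -> bool:
        # Check currently available (not total) memory against the threshold.
        return psutil.virtual_memory().available >= REQUIRED_BYTES

    @unittest.skipIf(not has_enough_memory(),
                     "Not enough available memory for the DeID tests")
    class DeIDTests(unittest.TestCase):  # hypothetical test case name
        ...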

Collaborator Author:

Yes, I could check for available memory. But I don't know exactly how much memory is necessary, and as such, I didn't want to put in a number I didn't trust.

With that said, because the method checks the RUNNER_OS environment variable (rather than just the OS) and that isn't (normally) set on a regular system (it certainly isn't on mine), it'll still allow the tests to run locally.

cnf.general.nlp.provider = 'spacy'


def should_do_test_ci() -> bool:
Collaborator:

Very minor - I would rename this func to "is_mac_os" or something more direct, so it documents itself a bit more; e.g. @unittest.skipIf(not is_mac_os()) is clear.

Collaborator Author:

I think that makes sense. Though notably this checks both that it is macOS AND that it's on CI, because otherwise the environment variable wouldn't be set.
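
For illustration, a minimal sketch of how such a check could work (an assumption based on the description above, not necessarily the PR's exact implementation):

    import os

    def should_do_test_ci() -> bool:
        # GitHub Actions sets RUNNER_OS (e.g. "Linux", "macOS", "Windows");
        # it is normally absent locally, so local runs are never skipped -
        # only the macOS CI runners are.
        return os.environ.get("RUNNER_OS") != "macOS"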

@mart-r mart-r merged commit a5b2bfa into main Oct 16, 2025
32 checks passed
@mart-r mart-r deleted the build/medcat/CU-869aujr7h-add-library-stability-workflow branch October 16, 2025 13:00