build(medcat): CU-869aujr7h Add nightly workflow to check library stability #171
Conversation
lgtm
- name: Install with latest deps
  run: |
    uv run --python ${{ matrix.python-version }} python -m ensurepip
Hey, as a wildcard one for another time (if ever):
This one reads the pyproject file and the python in this folder, right? So it kind of says "the GitHub project is still working based on what is in main".
Are we able to go a layer higher and do something extra to say "the PyPI library is working"? Or in other words, really test for "Can users probably install and use MedCAT today?"
uv pip install medcat # So actually bring in the latest pypi version
uv run some-easy-test
With some super easy test like
x = CAT.load_model_pack(" ")
result = x.get_entities("test")
assert whatever
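A minimal sketch of what that smoke test could look like, assuming the published medcat package still exposes CAT.load_model_pack and get_entities as the docs describe; the model pack path is just a placeholder for whatever small pack we'd point it at:

# smoke_test.py - hypothetical end-to-end check against the PyPI install
from medcat.cat import CAT

def main() -> None:
    # Placeholder path: substitute a real (small) model pack for the check.
    cat = CAT.load_model_pack("/path/to/model_pack.zip")
    result = cat.get_entities("The patient has diabetes.")
    # Deliberately loose assertion: we only care that install + basic usage work.
    assert result is not None

if __name__ == "__main__":
    main()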
Yes, this reads the dependencies from pyproject.toml and installs stuff based on that.
We could add something to test a PyPI-based install as well. But I'm not sure it adds a lot. It sounds like testing PyPI's infrastructure and/or wheel mechanics at that point. And I feel like that's someone else's responsibility.
Perhaps a better option for this would be an isolated test in the production / release workflow? Something that tests that installation of the wheel does in fact work as expected.
I think I'm more coming from the perspective of what is actually being verified:
Right now there's nothing that checks for the case of "I followed your readme instructions today and it worked". This test, as it stands here, starts with "I checked out your main branch... and ran unit tests", which isn't quite the same thing.
Interestingly, back when we had pinned versions it did actually test for this in a way - at least the tutorial/service tests verified it, I think. Maybe this change is really where the gap was introduced, and the uv.lock part has confused it.
As a suggestion - if alongside your test here we also made the tutorials run nightly, but from the pinned version, would that fully address the uv.lock concern as well as serve as the full user-facing test?
… 3.10. This commit TEMPORARILY (while the workflows are failing) makes them only run on Windows and MacOS (which are the workflows that are failing) and on python 3.10 so as to lower the overall number of workflow runners.
…Windows compatibility
…ndows on 3.10." This reverts commit f201270.
Looks great, would be good to see what happens at 3am tonight
contents: read
on:
  schedule:
    - cron: "0 3 * * *" # every day at 3am UTC
Happy to try this for now at 3am to see what it is like. Personally I might suggest doing it during working hours? Or just before 9am anyway. It's a trade-off between when having the email come in is useful vs slowing down other builds...
The reason I wanted it overnight is so that it doesn't disturb other work. This spawns 4x3=12 workflows, all of which take quite a while (20-30 minutes). And if this were to happen during work hours, it might cause workflows for active work to be queued. So I figured I'd run it at night - when no other work is happening - and deal with it in the morning if I have to.
strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
From a design point of view:
I'm thinking in general you can make this as close to the main one as possible? Keeping it simple and easy to maintain.
(It's great to try all the other OSs just in this one and not in main though - so this is one diff that is good.)
But taking this example for the Test step, can we make them completely identical? Either change main or this one - as I don't want to have to think about whether timeout-minutes is better than timeout (as an example of keeping it simple).
In here:
- name: Test
  run: |
    uv run --python ${{ matrix.python-version }} python -m unittest discover
  timeout-minutes: 45
Vs in main:
- name: Test
  run: |
    timeout 30m uv run python -m unittest discover
The only reason the specific example was changed was because that didn't work for the Windows runner, since it didn't have the timeout command available. And I didn't think it was within the scope here to change the main workflow to be in line with this change.
From a brief look, we should be able to fully modularise this. I.e. have the normal workflow steps (linting, typing, tests) in a separate workflow file (that never runs on its own) and use it for both the main workflow and this one, but with slightly different inputs. I think it's probably worth doing. But I'd leave it for another task.
@unittest.skipIf(not should_do_test_ci(),
                 "MacOS on workflow doesn't have enough memory")
So minor - could you alternatively check whether there is enough available memory on the system? As I'm guessing this change will stop the test running on your Mac locally as well.
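For the memory-based alternative, a sketch could look like the following; psutil and the 16 GiB threshold are assumptions on my part, not anything in this PR:

import psutil  # assumed to be available in the test environment

def has_enough_memory(min_bytes: int = 16 * 1024**3) -> bool:
    # Skip the heavy tests when less than ~16 GiB is actually available.
    return psutil.virtual_memory().available >= min_bytes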
Yes, I could check for available memory. But I don't know exactly how much memory is necessary, and as such I didn't want to put in a number that I didn't trust.
With that said, because the method checks the RUNNER_OS environment variable (rather than just the OS), and that isn't (normally) set on a regular system (it certainly isn't on mine), it'll still allow running the tests locally.
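In other words, the check looks roughly like this - a sketch of the approach described above, not necessarily the exact code in the PR, and the test class name is a placeholder:

import os
import unittest

def should_do_test_ci() -> bool:
    # GitHub Actions sets RUNNER_OS to "Linux", "Windows" or "macOS";
    # locally it's normally unset, so the tests still run there.
    return os.environ.get("RUNNER_OS") != "macOS"

@unittest.skipIf(not should_do_test_ci(),
                 "MacOS on workflow doesn't have enough memory")
class MemoryHungryTests(unittest.TestCase):
    ...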
cnf.general.nlp.provider = 'spacy'


def should_do_test_ci() -> bool:
Very minor - I would rename this func to "is_mac_os" or something more direct, so it documents itself a bit more; e.g. @unittest.skipIf(not is_mac_os()) is clear.
I think that makes sense. Though notably this checks both that it is MacOS AND that it's on CI, because otherwise the environment variable wouldn't be set.
Currently, most of our workflows run based on the dependencies defined in uv.lock. This is great for reproducibility, stability, and speed of installation. However, it means that we're not checking the library against any of the new versions of our dependencies that come up over time. What this means is that even if our workflows run successfully, we don't have any confidence that a user installing the project will be able to actually use it with the installed dependencies.
So this PR introduces a new - nightly - workflow that ignores the lock file, installs the latest available dependencies, and runs some typical tests against those. The idea is to use this to catch any dependencies that release updates and become incompatible for any reason.
As an example, transformers==4.57.0 was released on the 3rd of October. And it - inadvertently - dropped support for python 3.9. And because our workflows were not testing against this version, they were all passing. Yet a user installing after that date would have installed that version of transformers and found it to be incompatible only at run time.
The idea is that when this new workflow fails, we can take action to identify the incompatible dependency and either patch the compatibility issue or release a patch version that explicitly disallows the incompatible version.
The reason we don't always (i.e. in all workflows) test against the latest dependencies is so that we can separate the compatibility of a change from a PR (what the existing workflows handle) from the stability of the overall library (what this new workflow handles). After all, we don't want to force a contributor to make drastic changes to dependencies and / or patch incompatibility issues unrelated to their change just to make the workflows run.
What the workflow does overall:
- Ignores the lock file and installs the latest available dependencies
- Runs the usual tests across the OS / python version matrix (3.9, 3.10, 3.11, 3.12, 3.13)

There is a small caveat for the MacOS runner: some tests (whole TestCase classes) are ignored on MacOS because the runner doesn't have enough memory.

What it omits from the regular workflow is the backwards compatibility check. Though we may want to include that as well. I just felt like not doing all of it since this is going to be running on 12 different runners already (4 versions and 3 OSs).