Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✅ Reuse models and datasets in tests #641

Merged
merged 5 commits into from
Jul 19, 2023
Merged

Conversation

blaginin
Copy link
Collaborator

@blaginin blaginin commented Jul 17, 2023

This PR addresses part of issue #603 by improving model weights and datasets in tests. Previously, weights for each model were downloaded to f"{tmp_path}/weights.pth" and replaced each other. The only available dataset, Kather, was downloaded into TIATOOLBOX_HOME, but the entire home directory was deleted during test execution. This approach has several drawbacks:

  • Tests are extremely slow. On every test run, it downloads ~8 GB of weight files which never change.
  • Tests cannot be run in parallel as they depend on the same f"{tmp_path}/weights.pth" file.
  • Tests heavily load TIA's infrastructure. For each test run, the server has to transfer almost 10 GB of static assets. This might work while the number of contributors is small, but what happens when it increases?
  • Tests were inconsistent with their models caching. Why, for example, PatchPredictor cached their pretrained_models, but UNetModel did not?

Now, all weights are downloaded to TIATOOLBOX_HOME and cached between runs. Storing more data in memory (+8 GB) makes tests much faster, especially if you're not in Coventry or have slow internet. In rare cases where you specifically need to test behaviour without cache, you can set TIATOOLBOX_HOME to a temporary directory.

# Conflicts:
#	tests/models/test_arch_mapde.py
#	tests/models/test_arch_micronet.py
#	tests/models/test_patch_predictor.py
#	tiatoolbox/models/architecture/__init__.py
@blaginin blaginin self-assigned this Jul 17, 2023
@codecov
Copy link

codecov bot commented Jul 17, 2023

Codecov Report

Merging #641 (4297c99) into develop (e8b3d91) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 4297c99 differs from pull request most recent head 096bc31. Consider uploading reports for the commit 096bc31 to get more accurate results

@@           Coverage Diff            @@
##           develop     #641   +/-   ##
========================================
  Coverage    99.78%   99.78%           
========================================
  Files           65       65           
  Lines         7072     7076    +4     
  Branches      1395     1397    +2     
========================================
+ Hits          7057     7061    +4     
  Misses           7        7           
  Partials         8        8           
Impacted Files Coverage Δ
tiatoolbox/models/dataset/info.py 100.00% <ø> (ø)
tiatoolbox/models/architecture/__init__.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@blaginin blaginin marked this pull request as ready for review July 18, 2023 10:26
@blaginin blaginin requested a review from shaneahmed July 18, 2023 10:45
@blaginin
Copy link
Collaborator Author

@shaneahmed, I reviewed the PR with Ruff and fixed the error. Regarding the execution time, the first time you run the code, it doesn't have any effect as it will be downloading files into the cache. However, the second time, you will see a speedup.

On my local machine with high-speed internet, the difference is 13.75 minutes vs 10.89 minutes. However, if I use 4G, the difference becomes more noticeable: 1 hour 10 minutes vs 36 minutes.

@shaneahmed shaneahmed added the dev tools Changes/Updates in Development tools label Jul 19, 2023
Copy link
Member

@shaneahmed shaneahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @blaginin

@shaneahmed shaneahmed merged commit 3614f61 into develop Jul 19, 2023
@shaneahmed shaneahmed deleted the tests-model-weights branch July 19, 2023 10:57
@shaneahmed shaneahmed added this to the Release v1.5.0 milestone Jul 19, 2023
@blaginin blaginin mentioned this pull request Jul 31, 2023
shaneahmed pushed a commit that referenced this pull request Aug 23, 2023
Adds the ability to run tests using several workers using [pytest-xdist](https://github.com/pytest-dev/pytest-xdist), significantly improving processing time. For example, on M1 Max (no CUDA), processing time dropped **from 14 minutes to 4 minutes 💨💨💨.**

<img width="1614" alt="image" src="https://github.com/TissueImageAnalytics/tiatoolbox/assets/19199204/fbb607b0-3bf1-48c3-b14a-be4acf2b1ec3">


However, this optimization comes at a cost. Previously, tests depended on serial execution. For example, segmentation and prediction methods used to rely on "output" as a folder to store intermediate results. If many functions modified this folder at the same time, the result would be unpredictable. To address this, I made some tweaks alongside #641 and #673 so that functions will not depend on each other.

If we merge this pull request, we will need to start checking that new tests are ready for parallel execution.

**Depends on #641 and #673
@shaneahmed shaneahmed mentioned this pull request Dec 15, 2023
shaneahmed added a commit that referenced this pull request Dec 15, 2023
## 1.5.0 (2023-12-15)

### Major Updates and Feature Improvements

- Adds the bokeh visualization tool. #684
  - The tool allows a user to launch a server on their machine to visualise whole slide images, overlay the results of deep learning algorithms or to select a patch from whole slide image and run TIAToolbox deep learning engines.
  - This tool powers the TIA demos server. For details please see https://tiademos.dcs.warwick.ac.uk/.
- Extends Annotation to Support Init from WKB #639
- Adds `IOConfig` for NuClick in `pretrained_model.yaml` #709
- Adds functions to save the TIAToolbox Engine outputs to Zarr and AnnotationStore files. #724
- Adds Support for QuPath Annotation Imports #721

### Changes to API

- Adds `model.to(device)` and `model.load_model_from_file()` functionality to make it compatible with PyTorch API. #733
- Replaces `pretrained` with `weights` to make the engines compatible with the new PyTorch API. #621
- Adds support for high-level imports for various utility functions and classes such as `WSIReader`, `PatchPredictor` and `imread` #606, #607,
- Adds `tiatoolbox.typing` for type hints. #619
- Fixes incorrect file size saved by `save_tiles`, issue with certain WSIs raised by @TomastpPereira
- TissueMasker transform now returns mask instead of a list. #748
  - Fixes #732

### Bug Fixes and Other Changes

- Fixes `pixman` incompability error on Colab #601
- Removes `shapely.speedups`. The module no longer has any affect in Shapely >=2.0. #622
- Fixes errors in the slidegraph example notebook #608
- Fixes bugs in WSI Registration #645, #670, #693
- Fixes the situation where PatchExtractor.get_coords() can return patch coords which lie fully outside the bounds of a slide. #712
  - Fixes #710
- Fixes #738 raised by @xiachenrui

### Development related changes

- Replaces `flake8` and `isort` with `ruff` #625, #666
- Adds `mypy` checks to `root` and `utils` package. This will be rolled out in phases to other modules. #723
- Adds a module to detect file types using magic number/signatures #616
- Uses `poetry` for version updates instead of `bump2version`. #638
- Removes `setup.cfg` and uses `pyproject.toml` for project configurations.
- Reduces runtime for some unit tests e.g., #627, #630, #631, #629
- Reuses models and datasets in tests on GitHub actions by utilising cache #641, #644
- Set up parallel tests locally #671

**Full Changelog:** v1.4.0...v1.5.0

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: mostafajahanifar <74412979+mostafajahanifar@users.noreply.github.com>
Co-authored-by: John Pocock <John-P@users.noreply.github.com>
Co-authored-by: DavidBAEpstein <David.Epstein@warwick.ac.uk>
Co-authored-by: David Epstein <22086916+DavidBAEpstein@users.noreply.github.com>
Co-authored-by: Ruqayya Awan <18444369+ruqayya@users.noreply.github.com>
Co-authored-by: Mark Eastwood <20169086+measty@users.noreply.github.com>
Co-authored-by: adamshephard <39619155+adamshephard@users.noreply.github.com>
Co-authored-by: adamshephard <adam.shephard@warwick.ac.uk>
Co-authored-by: Abdol <a@fkrtech.com>
Co-authored-by: Jiaqi-Lv <60471431+Jiaqi-Lv@users.noreply.github.com>
Co-authored-by: Abishek <abishekraj6797@gmail.com>
Co-authored-by: Dmitrii Blaginin <blaginin@mbp.lan>
@shaneahmed shaneahmed mentioned this pull request Dec 15, 2023
shaneahmed added a commit that referenced this pull request Dec 15, 2023
## 1.5.0 (2023-12-15)

### Major Updates and Feature Improvements

- Adds the bokeh visualization tool. #684
  - The tool allows a user to launch a server on their machine to visualise whole slide images, overlay the results of deep learning algorithms or to select a patch from whole slide image and run TIAToolbox deep learning engines.
  - This tool powers the TIA demos server. For details please see https://tiademos.dcs.warwick.ac.uk/.
- Extends Annotation to Support Init from WKB #639
- Adds `IOConfig` for NuClick in `pretrained_model.yaml` #709
- Adds functions to save the TIAToolbox Engine outputs to Zarr and AnnotationStore files. #724
- Adds Support for QuPath Annotation Imports #721

### Changes to API

- Adds `model.to(device)` and `model.load_model_from_file()` functionality to make it compatible with PyTorch API. #733
- Replaces `pretrained` with `weights` to make the engines compatible with the new PyTorch API. #621
- Adds support for high-level imports for various utility functions and classes such as `WSIReader`, `PatchPredictor` and `imread` #606, #607,
- Adds `tiatoolbox.typing` for type hints. #619
- Fixes incorrect file size saved by `save_tiles`, issue with certain WSIs raised by @TomastpPereira
- TissueMasker transform now returns mask instead of a list. #748
  - Fixes #732

### Bug Fixes and Other Changes

- Fixes `pixman` incompability error on Colab #601
- Removes `shapely.speedups`. The module no longer has any affect in Shapely >=2.0. #622
- Fixes errors in the slidegraph example notebook #608
- Fixes bugs in WSI Registration #645, #670, #693
- Fixes the situation where PatchExtractor.get_coords() can return patch coords which lie fully outside the bounds of a slide. #712
  - Fixes #710
- Fixes #738 raised by @xiachenrui

### Development related changes

- Replaces `flake8` and `isort` with `ruff` #625, #666
- Adds `mypy` checks to `root` and `utils` package. This will be rolled out in phases to other modules. #723
- Adds a module to detect file types using magic number/signatures #616
- Uses `poetry` for version updates instead of `bump2version`. #638
- Removes `setup.cfg` and uses `pyproject.toml` for project configurations.
- Reduces runtime for some unit tests e.g., #627, #630, #631, #629
- Reuses models and datasets in tests on GitHub actions by utilising cache #641, #644
- Set up parallel tests locally #671

**Full Changelog:** v1.4.0...v1.5.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev tools Changes/Updates in Development tools
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants