Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up interpolation in ensemble_copula_coupling.ConvertProbabilitiesToPercentiles #1578

Merged
merged 46 commits into from
Nov 16, 2021

Conversation

btrotta-bom
Copy link
Contributor

@btrotta-bom btrotta-bom commented Oct 7, 2021

Speeds up ConvertProbabilitiesToPercentiles by around 1.7x when tested on real data. Also can use multithreading for further speedups.

Testing:

  • [ X] Ran tests and they passed OK
  • [ X] Added new tests for the new feature(s)

@codecov
Copy link

codecov bot commented Oct 7, 2021

Codecov Report

Merging #1578 (2e56664) into master (f05cc28) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1578   +/-   ##
=======================================
  Coverage   98.07%   98.07%           
=======================================
  Files         110      110           
  Lines       10010    10023   +13     
=======================================
+ Hits         9817     9830   +13     
  Misses        193      193           
Impacted Files Coverage Δ
...semble_copula_coupling/ensemble_copula_coupling.py 99.04% <100.00%> (-0.01%) ⬇️
improver/ensemble_copula_coupling/utilities.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f05cc28...2e56664. Read the comment docs.

@btrotta-bom btrotta-bom changed the title [WIP] Speed up interpolation in ConvertProbabilitiesToPercentiles Speed up interpolation in ConvertProbabilitiesToPercentiles Oct 7, 2021
@btrotta-bom btrotta-bom added the MO review required PRs opened by non-Met Office developers that require a Met Office review label Oct 7, 2021
@zfan001 zfan001 self-requested a review October 12, 2021 04:11
@btrotta-bom btrotta-bom changed the title Speed up interpolation in ConvertProbabilitiesToPercentiles Speed up interpolation in ensemble_copula_coupling.ConvertProbabilitiesToPercentiles Oct 13, 2021
break
max_ind = xp.shape[1]
min_val = fp[0]
max_val = fp[max_ind - 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the shapes of xp and fp are indeed compatible then you should just be able to refer to the final index.
max_val = fp[max_ind - 1] -> max_val = fp[-1]

As it stands, fp size can be longer than xp.shape[1] and your function will silently continue without trouble.
See above comment on sanity checking input arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

numba_installed = False


class TestInterpolateMultipleRows(IrisTest):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this would be better named Test_interpolate_multiple_rows_same_y?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding a test which mocks both fast_interp_same_y and slow_interp_same_y to check which implementation is actually used when numba is available and which one when not (i.e. when called via interpolate_multiple_rows_same_y). Happy to provide some pointers if required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made an attempt at this, but I'm not very familiar with mocking, so suggestions for improvement are welcome.

@bayliffe
Copy link
Contributor

bayliffe commented Nov 9, 2021

@btrotta-bom I took your timing example from the ResamplePercentiles PR and used it to test this PR. I found the numba method to be around 25 times faster than the slow method, which is a significant speed up, making this very worthwhile.

max_ind = xp.shape[1]
min_val = fp[0]
max_val = fp[-1]
result = np.empty((xp.shape[0], len(x)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has lost the casting to float32. Otherwise I'm happy with these changes. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

bayliffe
bayliffe previously approved these changes Nov 12, 2021
Copy link
Contributor

@cpelley cpelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @btrotta-bom, thank you for the changes you made.
I was going make suggested changes in this PR but I thought it easier for us both if I make a PR with those changes targeted towards your branch instead.

btrotta-bom#1

My PR:

  • Creates a public function interpolate_multiple_rows_same_y, documenting the conditions of change of behaviour between numba vs no numba availability.
  • mock is part of unittest in recent Python, not an independent library. So corrected this.
  • Added an alternative (I think simpler) approach to testing what implementation is used when numba is available or not (in the testing).

@cpelley cpelley self-requested a review November 16, 2021 11:05
Copy link
Contributor

@cpelley cpelley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your for the contribution @btrotta-bom and indeed your patience! :)

Cheers


interpolate_multiple_rows_same_x is already merged with master so you needn't worry about this now, butit might be worth adding test_slow_interp_same_x_called and test_fast_interp_same_x_called test methods similar to what was done to the testing of interpolate_multiple_rows_same_y (ensuring which implementation is actually used with and without numba).

@btrotta-bom
Copy link
Contributor Author

Thanks @cpelley and @bayliffe !

it might be worth adding test_slow_interp_same_x_called and test_fast_interp_same_x_called test methods similar to what was done to the testing of interpolate_multiple_rows_same_y (ensuring which implementation is actually used with and without numba).

I will add an issue for this

@btrotta-bom btrotta-bom merged commit f6844af into metoppv:master Nov 16, 2021
MoseleyS added a commit that referenced this pull request Jan 18, 2022
* master:
  Remove __repr__ methods from all neighbourhood plugins (#1648)
  ENH: Avoiding lazy loading in select command calls (#1617)
  MOBT-180: Weather symbol speed up (#1638)
  IM-1621: Make ECC error and warning tests more rigorous (#1641)
  Make flake8 report that it is okay when running improver-tests. (#1645)
  Update checksums after updating the title of files in apply-emos-coefficients/sites. (#1640)
  Fixes bug in spot-extraction for multi-time inputs (#1633)
  Updates checksums for threshold landmask fix (#1636)
  Update interpret-metadata (#1632)
  Weather code tree update (#1635)
  Fix noise in precip accumulation thresholds (#1627)
  Expanding on triangle time blending doc strings. (#1630)
  Better handling and documentation of dependencies (#1589)
  Add tests (#1626)
  Enhancements on new regridding code (#1560)
  Speed up interpolation in ensemble_copula_coupling.ConvertProbabilitiesToPercentiles (#1578)
  Speed up interpolation in ensemble_copula_coupling.ResamplePercentiles (#1548)
  Spot-extraction additional coordinates ordering fix (#1610)
MoseleyS added a commit that referenced this pull request Jan 18, 2022
* upstream/master:
  Remove __repr__ methods from all neighbourhood plugins (#1648)
  ENH: Avoiding lazy loading in select command calls (#1617)
  MOBT-180: Weather symbol speed up (#1638)
  IM-1621: Make ECC error and warning tests more rigorous (#1641)
  Make flake8 report that it is okay when running improver-tests. (#1645)
  Update checksums after updating the title of files in apply-emos-coefficients/sites. (#1640)
  Fixes bug in spot-extraction for multi-time inputs (#1633)
  Updates checksums for threshold landmask fix (#1636)
  Update interpret-metadata (#1632)
  Weather code tree update (#1635)
  Fix noise in precip accumulation thresholds (#1627)
  Expanding on triangle time blending doc strings. (#1630)
  Better handling and documentation of dependencies (#1589)
  Add tests (#1626)
  Enhancements on new regridding code (#1560)
  Speed up interpolation in ensemble_copula_coupling.ConvertProbabilitiesToPercentiles (#1578)
  Speed up interpolation in ensemble_copula_coupling.ResamplePercentiles (#1548)
  Spot-extraction additional coordinates ordering fix (#1610)
MoseleyS pushed a commit to MoseleyS/improver that referenced this pull request Aug 22, 2024
…esToPercentiles (metoppv#1578)

* Add fast interpolation method

* Style

* Add missing file

* Isort

* Style

* Style

* Add licence and docstring

* Mock numba in sphinx autodoc

* Tell codecov to ignore numba_utilities.py

* Add more tests

* Tell black to ignore import in test

* isort

* Add noqa for spurious black result

* Remove unused code

* Fix bug

* Remove warning test

* Remove unused code

* Remove unused import

* Fix imports

* Simplify code

* Add a test

* Change method names to distinguish from functionality for other open PR

* Update type hints

* Add test

* Style

* Calculate in 64-bit, output in 32-bit

* Sort x

* Add simple tests with known result

* Style

* Restore old code for handling unordered x

* Add input checking

* Test correct version is used depending on numba

* Style

* Fix imports

* Style

* Fix mocking

* Cast result to float32

* MAINT: Proposed review suggestions

* Fix merge

* Fix interpolate_multiple_rows_same_x to use same approach as interpolate_multiple_rows_same_y

* Style

* Isort

* Fix comment

Co-authored-by: Belinda Trotta <btrotta-bom@users.noreply.github.com>
Co-authored-by: cpelley <carwyn.pelley@metoffice.gov.uk>
MoseleyS added a commit to MoseleyS/improver that referenced this pull request Aug 22, 2024
* upstream/master:
  Remove __repr__ methods from all neighbourhood plugins (metoppv#1648)
  ENH: Avoiding lazy loading in select command calls (metoppv#1617)
  MOBT-180: Weather symbol speed up (metoppv#1638)
  IM-1621: Make ECC error and warning tests more rigorous (metoppv#1641)
  Make flake8 report that it is okay when running improver-tests. (metoppv#1645)
  Update checksums after updating the title of files in apply-emos-coefficients/sites. (metoppv#1640)
  Fixes bug in spot-extraction for multi-time inputs (metoppv#1633)
  Updates checksums for threshold landmask fix (metoppv#1636)
  Update interpret-metadata (metoppv#1632)
  Weather code tree update (metoppv#1635)
  Fix noise in precip accumulation thresholds (metoppv#1627)
  Expanding on triangle time blending doc strings. (metoppv#1630)
  Better handling and documentation of dependencies (metoppv#1589)
  Add tests (metoppv#1626)
  Enhancements on new regridding code (metoppv#1560)
  Speed up interpolation in ensemble_copula_coupling.ConvertProbabilitiesToPercentiles (metoppv#1578)
  Speed up interpolation in ensemble_copula_coupling.ResamplePercentiles (metoppv#1548)
  Spot-extraction additional coordinates ordering fix (metoppv#1610)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MO review required PRs opened by non-Met Office developers that require a Met Office review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants