Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up histogram in plot options (alternative to #2763) #2771

Merged
merged 12 commits into from
May 8, 2024

Conversation

astrofrog
Copy link
Collaborator

@astrofrog astrofrog commented Mar 28, 2024

Description

Like #2763, this starts off by reverting #2735, and then makes use of:

to speed up the histogram calculation. I also removed:

        # filter out nans (or else bqplot will fail)
        if np.any(np.isnan(sub_data)):
            sub_data = sub_data[~np.isnan(sub_data)]

which was causing the whole array to get loaded into memory - this isn't needed anyway because glue's histogram viewer deals fine with NaN values. Finally, I also updated the calculation of the percentile limits to use glue, also with random sampling, otherwise PercentileLimits caused the whole array to be loaded into memory and caused slowdowns.

With these changes, I was able to load an 8Gb memory-mapped array and have the histogram update within a second or so. We might want to play with the random subset size a little to see what is a good compromise with real-world data.

I wonder whether it might make sense to consider having some kind of performance benchmark for this kind of functionality, because it's easy to accidentally introduce regressions that cause all the data to be used.

This requires the two glue PRs to be merged and released before merging this PR.

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone. Bugfix milestone also needs an accompanying backport label.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)? 🐱

@github-actions github-actions bot added the plugin Label for plugins common to multiple configurations label Mar 28, 2024
@astrofrog astrofrog force-pushed the speed-up-histogram-plot-options branch 2 times, most recently from 128f5b8 to 86ee25c Compare March 28, 2024 11:20
@pllim pllim added this to the 3.9 milestone Mar 29, 2024
@pllim pllim added performance Performance related skip-changelog-checks changelog bot directive Affects-dev changelog bot directive imviz labels Mar 29, 2024
@pllim
Copy link
Contributor

pllim commented Apr 1, 2024

Hmm. I installed those upstream branches you mentioned and installed Jdaviz from this PR branch. After I load that 8 GB image from Cami, I saw a non-sensible histogram for a second (just one bar) and then it self-refreshed and got stuck in the loading circle of death. All I did was load that image and then open Plot Options plugin, scroll down, and waited.

Screenshot 2024-04-01 182700

@pllim
Copy link
Contributor

pllim commented Apr 1, 2024

Also looks like this patch broke the spline thingy?

>           assert np.allclose(po._obj.stretch_params_value["knots"], knots_after_drag_move)
E           assert False
E            +  where False = <function allclose at 0x000001FC915F9170>(([0.0, 0.1, 0.21712585033417825, 0.7, 1.0], [0.0, 0.05, 0.3097925111911964, 0.9, 1.0]), ([0.0, 0.1, 0.21712585033417825, 0.7, 1.0], [0.0, 0.05, 0.2900993441358025, 0.9, 1.0]))
E            +    where <function allclose at 0x000001FC915F9170> = np.allclose

jdaviz\core\tests\test_tools.py:103: AssertionError

@astrofrog
Copy link
Collaborator Author

@pllim could you send me a link to the 8Gb file? Thanks!

@pllim
Copy link
Contributor

pllim commented Apr 3, 2024

@astrofrog , Cami has reached out to you separately about the file. FYI.

@astrofrog
Copy link
Collaborator Author

@pllim - could you try and run the following Python script using pyinstrument and let me know the output?

from jdaviz import Imviz

imviz = Imviz()
viewer = imviz.default_viewer

imviz.load_data("egs_all_acs_wfc_f814w_030mas_v1.9_drz.fits")
imviz.plugins["Plot Options"]._obj._update_stretch_histogram()

With this PR and the two glue ones, the total time for the script goes down from 40s to 10s and a lot of that is the loading of the FITS file which has 27634 entries in the header?

@pllim
Copy link
Contributor

pllim commented Apr 5, 2024

could you try and run the following Python script using pyinstrument and let me know the output?

I tried calling pyinstrument script.py but it is giving me unrelated errors like this from astropy.units:

RuntimeError: dictionary changed size during iteration

And after I hack-patched those on the installed astropy, it gave me this from a package I know nothing about:

toolz/curried/operator.py", line 15, in <module>
    {name: f if name in IGNORE else curry(f)
                                    ^^^^^^^^
  File "python3.12/site-packages/toolz/functoolz.py", line 201, in __init__
    raise TypeError("Input must be callable")
TypeError: Input must be callable

Am I doing it wrong? Please advise. Thanks!

@pllim
Copy link
Contributor

pllim commented Apr 5, 2024

FWIW, with this PR's current state, rebased on top of main, I still see the "loading circle of death".

@gibsongreen gibsongreen modified the milestones: 3.9, 3.10 Apr 5, 2024
@astrofrog
Copy link
Collaborator Author

@pllim huh very weird - could you let me know what you get for:

pyinstrument --version

Thanks!

@astrofrog astrofrog force-pushed the speed-up-histogram-plot-options branch 4 times, most recently from 9dadc8d to e7a7589 Compare April 9, 2024 11:08
pyproject.toml Outdated
@@ -125,7 +125,6 @@ text_file_format = "rst"
addopts = "--doctest-rst --import-mode=append"
xfail_strict = true
filterwarnings = [
"error",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be deleted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@bmorris3 bmorris3 modified the milestones: 3.10, 3.10.1 May 3, 2024
@astrofrog astrofrog force-pushed the speed-up-histogram-plot-options branch from 224aa4e to 6257bb8 Compare May 7, 2024 14:06
@astrofrog astrofrog force-pushed the speed-up-histogram-plot-options branch from 6257bb8 to 30a4707 Compare May 7, 2024 14:06
@astrofrog
Copy link
Collaborator Author

@pllim - ready for another review

@pllim pllim added 💤backport-v3.10.x on-merge: backport to v3.10.x and removed skip-changelog-checks changelog bot directive Affects-dev changelog bot directive labels May 7, 2024
Copy link
Contributor

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.

Copy link
Collaborator

@rosteen rosteen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM. Edit: Wow I didn't realize this was character-for-character exactly what @pllim said 😆

@pllim pllim merged commit c5778cd into spacetelescope:main May 8, 2024
21 of 22 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/jdaviz that referenced this pull request May 8, 2024
@pllim pllim added this to the 3.11 milestone Jun 6, 2024
@pllim pllim removed the 💤backport-v3.10.x on-merge: backport to v3.10.x label Jun 6, 2024
duytnguyendtn pushed a commit to duytnguyendtn/jdaviz that referenced this pull request Jul 23, 2024
… (spacetelescope#2771)

* Revert "BUG: Always downsample histogram for Imviz in Plot Options (spacetelescope#2735)"

This reverts commit fbef31f.

* Use glue's ability to randomly sample values when computing histograms and image statistics, and remove code to handle NaN values that caused the whole array to be loaded into memory.

* Use order='K' in ravel() to avoid copy

* Avoid calling ravel() since this causes a copy for cutouts

* Add comment to explain percentile values

* Fix issue when array len() match but .shape doesn't

* Adjust reference values for test

* Bumped minimum required version of glue-core and glue-jupyter

* Fix case where data is not a Numpy array

* Add change log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imviz performance Performance related plugin Label for plugins common to multiple configurations Ready for final review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants