Add workaround so dask arrays are optimized in Delayed writing #3082
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##             main    #3082   +/-   ##
=======================================
  Coverage   96.14%   96.14%
=======================================
  Files         383      383
  Lines       55798    55800       +2
=======================================
+ Hits        53649    53651       +2
  Misses       2149     2149
Pull Request Test Coverage Report for Build 13913813285
💛 - Coveralls
For my own understanding, you say memory use has increased slightly due to more parallel tasks - does that mean that a user could, if needed, reduce memory use again by limiting the number of parallel tasks?
Yes, but that would be done in the traditional way of limiting the number of dask workers or reducing chunk size. This PR just makes it more likely that more mini-tasks (individual numpy function calls) are executed within one merged task, using more memory. That's my best guess at least.
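For instance, a hedged sketch of those traditional knobs (not from this PR; the array and reduction below are placeholders for a real writing task):

```python
import dask
import dask.array as da

# Placeholder for a real Satpy writing Delayed (e.g. from
# scn.save_datasets(..., compute=False)); a reduction stands in here.
arr = da.random.random((8192, 8192), chunks=2048)
task = arr.sum()

# Fewer scheduler workers -> fewer chunk tasks in flight -> lower peak memory.
task.compute(scheduler="threads", num_workers=2)

# Smaller chunks reduce per-task memory at the cost of more (smaller) tasks;
# this config key only affects arrays created with chunks="auto" afterwards.
dask.config.set({"array.chunk-size": "32MiB"})
```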
LGTM, makes sense
FYI recent changes in dask should make this unnecessary. That said, I was advised to replace Array -> Delayed type operations with equivalent array reductions (`map_blocks`, `reduce`, etc.). I haven't gotten an answer yet about an exact replacement for some of the use cases in Satpy.
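For example, a hypothetical per-chunk writer built on `da.map_blocks` (the `_write_chunk` helper below is made up for illustration) keeps everything an Array operation, so the normal array optimizations still apply:

```python
import numpy as np
import dask.array as da


def _write_chunk(block, block_info=None):
    """Hypothetical per-chunk writer: persist ``block``, return a tiny marker."""
    # block_info[None]["array-location"] gives this chunk's position, which a
    # real writer could use to place the chunk in the output file.
    return np.zeros((1,) * block.ndim, dtype=bool)


arr = da.random.random((4096, 4096), chunks=1024)
markers = da.map_blocks(_write_chunk, arr, chunks=(1, 1), dtype=bool)
markers.compute()  # stays an Array end-to-end, so task fusion applies
```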
This speeds up writers like "simple_image" significantly.
This is the simplest workaround I could think of that works for all cases. This addresses the problem described in my dask discourse discussion post here.
Essentially we're saying "I know this is a Delayed object, but look at the task graph as a series of Array operations, not basic Delayed tasks". This means more complex graph optimizations are performed by dask, rather than only the simple culling-style optimization done for Delayed objects.
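This isn't the exact code from the PR, but a minimal sketch of that idea (class and helper names are hypothetical): subclass `Delayed` so its graph is optimized by dask.array's optimizer instead of Delayed's cull-only pass.

```python
import dask.array as da
from dask.array.optimization import optimize as array_optimize
from dask.delayed import Delayed


class ArrayOptimizedDelayed(Delayed):
    """Hypothetical Delayed whose graph is optimized as Array tasks.

    Delayed normally only culls unused tasks; routing optimization through
    dask.array's optimizer also fuses chains of chunk-level tasks.
    """

    __dask_optimize__ = staticmethod(array_optimize)


def as_array_optimized(delayed_obj):
    # Rewrap an existing Delayed (e.g. from da.store(..., compute=False))
    # without changing its graph or output key.
    return ArrayOptimizedDelayed(delayed_obj.key, delayed_obj.__dask_graph__())
```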
Note: This does not fix the rare issue caused by `da.store` pre-optimizing for Arrays when satpy users combine multiple writing (`.save_datasets`) calls into a single call using `compute_writer_results`. In cases like that, some Array tasks are re-computed because the pre-optimized tasks were renamed/merged/fused separately and dask thinks they are separate unique tasks. This comes down to dask/dask#8380 and dask/dask#9732 (and my discussion post linked above) and the fact that they are not resolved upstream (yet 🤞).

As noted on Slack, for a simple single MODIS band saved to PNG this sped things up from 54s to 39s, with only a slight but expected increase in memory as more tasks are computed in parallel and in bigger groups of operations... or at least I think that's why.
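To make the `compute_writer_results` caveat above concrete, the problem scenario looks roughly like this (the input filename is hypothetical; the reader and writer names are real Satpy ones):

```python
from satpy import Scene
from satpy.writers import compute_writer_results

# Hypothetical MODIS L1B input file.
scn = Scene(reader="modis_l1b", filenames=["MOD021KM.A2023001.0000.061.hdf"])
scn.load(["1"])  # band "1"

# Two independent save_datasets calls, each deferring computation.
res_png = scn.save_datasets(writer="simple_image", compute=False)
res_tif = scn.save_datasets(writer="geotiff", compute=False)

# Computing both together shares inputs, but tasks that da.store already
# pre-optimized per call can be renamed so dask ends up recomputing work.
compute_writer_results([res_png, res_tif])
```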
Add your name to `AUTHORS.md` if not there already