@gjoseph92
Collaborator

Reduce memory spikes during data transfer with test_rebalance, by using NumPy arrays (zero-copy) and more, smaller keys.

Fixes dask#5688
@gjoseph92
Collaborator Author

@crusaderky this still failed once: https://github.com/dask/distributed/runs/4927021521?check_suite_focus=true#step:12:1626

>       assert 30 <= ndata[a] <= 70
E       assert 80 <= 70

But in stderr for that test, there's no longer anything about unmanaged memory use being high, or the worker pausing or restarting. So is it possible that the rebalance heuristics are just different for 100 small keys vs 10 bigger ones, and the 30-70 numbers should change? I just multiplied your original 3-7 by 10, no idea if those targets still make sense.
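For reference, the scaling of the bounds can be checked with a small sketch. It assumes the assertion is meant as a share of all keys held by one worker (30% to 70%); the helper name `share_in_band` and its parameters are hypothetical and do not come from the test suite.

```python
def share_in_band(n_on_worker, n_total, lo_pct=30, hi_pct=70):
    """Return True if one worker holds between lo_pct% and hi_pct% of all keys.

    Hypothetical helper: integer arithmetic is used so that the boundary
    values (exactly 30% or 70%) are not affected by float rounding.
    """
    return lo_pct * n_total <= 100 * n_on_worker <= hi_pct * n_total


# Original bounds: 3..7 out of 10 keys
print(share_in_band(3, 10), share_in_band(7, 10))

# Bounds after multiplying by 10: 30..70 out of 100 keys
print(share_in_band(30, 100), share_in_band(70, 100))

# The failing CI run reported ndata[a] == 80 out of 100 keys
print(share_in_band(80, 100))
```

Under that reading, 3-7 of 10 and 30-70 of 100 describe the same band, so the failing run with 80 keys is outside it in either formulation. Whether the rebalance heuristics should be expected to land inside that band with 100 small keys is the open question.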

@gjoseph92 self-assigned this Jan 24, 2022
@gjoseph92 added the `flaky test` label (Intermittent failures on CI.) Jan 24, 2022
@gjoseph92
Collaborator Author

Note that with #5695, this may not be necessary. However, the fact that it was so sensitive to a 10mb change in worker memory means it probably should be made less sensitive anyway.

@crusaderky
Collaborator

> Note that with #5695, this may not be necessary. However, the fact that it was so sensitive to a 10mb change in worker memory means it probably should be made less sensitive anyway.

Indeed. I think the whole suite requires a more intensive facelift. See #5697.

@crusaderky
Collaborator

Superseded by #5697

@crusaderky closed this Jan 25, 2022
Development

Successfully merging this pull request may close these issues.

Regression: P2P shuffle skeleton (#5520) causes test flakiness