
@fjetter (Member) commented Oct 5, 2021

In my latest dive into the stealing code I investigated some of the logs and saw a lot of ridiculous steal requests: task durations of ~10ms and occupancy differences between thief and victim of ~100ms.

Not only do we not care about such a small difference, the act of stealing is also guaranteed to be more expensive than leaving the task where it is.

Stealing requires at least three network bounces (steal-request, steal-confirm, compute-task), which include code serialization if the steal succeeds. It is almost impossible to do all of this within the currently hard-coded 1ms. The 100ms I propose is likely too conservative, but I don't think that is necessarily a bad thing for stealing. I don't have time for large-scale tests, but I am very confident that this value should be much higher than it is right now. Thoughts, concerns?
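To make the effect of the latency term concrete, here is a rough sketch (not the actual `distributed` stealing code) of a cost ratio of the form transfer time / compute time, where transfer time is the size of the dependencies divided by bandwidth plus a fixed latency. The function name `cost_multiplier` and the `BANDWIDTH` value are hypothetical. For a ~10ms task with no dependency data to move, a 1ms latency makes the task look almost free to move, while a 100ms latency makes moving it cost about ten times its compute time.

```python
# Hypothetical sketch: names, constants, and formula are illustrative assumptions,
# not the real distributed.stealing implementation.
BANDWIDTH = 100e6  # assumed network bandwidth in bytes/s


def cost_multiplier(compute_time: float, dep_nbytes: int,
                    bandwidth: float, latency: float) -> float:
    """Estimated cost of moving a task, relative to its compute time."""
    # Time to ship dependency data plus a fixed per-steal latency overhead.
    transfer_time = dep_nbytes / bandwidth + latency
    return transfer_time / compute_time


if __name__ == "__main__":
    # A ~10ms task with no dependency data, under the old and proposed latency.
    for latency in (1e-3, 0.1):
        m = cost_multiplier(compute_time=0.010, dep_nbytes=0,
                            bandwidth=BANDWIDTH, latency=latency)
        print(f"latency={latency * 1000:.0f}ms -> cost multiplier {m:.1f}")
```

With the larger latency, short tasks end up with a high cost multiplier, so a cost-based stealing heuristic would tend to leave them where they are.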

cc @gjoseph92 @crusaderky

@fjetter (Member, Author) commented Oct 5, 2021

fwiw, I don't even consider it worth measuring this properly. We are working with so many estimates in the stealing code that an accurate measurement of this offset is not worth it, imho.

@gjoseph92 (Collaborator)

Frankly 0.1s doesn't even seem that conservative to me.

@fjetter fjetter changed the title Increase latency for stealing Increase latency overhead in stealing cost calculation Oct 6, 2021
@crusaderky crusaderky merged commit a8151a6 into dask:main Oct 19, 2021
zanieb pushed a commit to zanieb/distributed that referenced this pull request Oct 28, 2021