Avoid deepcopy when submitting graph #8633
base: main
distributed/client.py
Outdated
@@ -3170,8 +3170,7 @@ def _graph_to_futures(
             self._send_to_scheduler(
                 {
                     "op": "update-graph",
-                    "graph_header": header,
-                    "graph_frames": frames,
+                    "graph": Serialized(header, frames),
The important change is not the move from `ToPickle` to `Serialize` but rather wrapping the header+frames in the corresponding container. I suspected that `Serialize` is faster but haven't verified. For objects that properly implement pickle5 this shouldn't really matter.
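
For readers unfamiliar with the pickle5 remark, here is a minimal stdlib-only sketch of in-band versus out-of-band serialization. It is an analogy, not distributed's protocol code: the `payload` stand-in and the message dict are assumptions made for illustration, and only `pickle` from the standard library is used.

```python
import pickle

# Stand-in for a large, already-serialized graph (an assumption for illustration).
payload = bytearray(b"x" * 1_000_000)

# PickleBuffer marks the payload as eligible for out-of-band transfer.
msg = {"op": "update-graph", "graph": pickle.PickleBuffer(payload)}

# In-band: no buffer_callback, so the payload bytes are copied into the stream.
in_band = pickle.dumps(msg, protocol=5)

# Out-of-band: the stream holds only a small header; the buffer is handed
# to the callback and can travel alongside the header without being copied.
frames = []
header = pickle.dumps(msg, protocol=5, buffer_callback=frames.append)

# The receiver supplies the frames again when loading.
restored = pickle.loads(header, buffers=frames)

# The restored buffer still refers to the original memory, so an in-place
# change to the payload is visible through it.
payload[0] = ord("y")
print(len(in_band), len(header), len(frames))
print(memoryview(restored["graph"])[0] == ord("y"))
```

The size difference between `in_band` and `header` is the copy that out-of-band frames avoid.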
I suggest adding an in-line comment explaining this to keep our future selves from removing this "excessively explicit" serialization step.
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
18 files (-9), 18 suites (-9), 14h 59m 31s ⏱️ (+6h 2m 58s)
For more details on these failures and errors, see this check.
Results for commit a990563. ± Comparison against base commit 4986fa4.
This pull request removes 2127 and adds 10 tests. Note that renamed tests count towards both.
This pull request removes 39 skipped tests and adds 2 skipped tests. Note that renamed tests count towards both.
This pull request skips 61 tests.
♻️ This comment has been updated with latest results.
Thanks, @fjetter! One suggestion, feel free to ignore.
What's the status of this PR?
This avoids at least one deepcopy of the graph and therefore reduces overhead during submit.
Typically, one would simply wrap the dict value in `ToPickle`/`Serialize` and pass the graph through directly. However, that would make it impossible to get clear error messages on (de-)serialization exceptions, which is why this was pulled out into a manual call back then. Passing the header and frames directly as dictionary values, though, causes msgpack to simply copy the bytes into the msgpack bytestream instead of passing the frames through implicitly.
Wrapping the header and frames in `Serialized` avoids this unnecessary copy.
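
The pattern described above (serialize manually so errors are clear, then wrap the result in a container) can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `distributed/client.py`: `PreSerialized`, `serialize_graph`, and `build_message` are hypothetical names, and `pickle` stands in for distributed's own serialization.

```python
import pickle
import threading


class PreSerialized:
    """Hypothetical marker for data already split into a header and frames."""

    def __init__(self, header, frames):
        self.header = header
        self.frames = frames


def serialize_graph(graph):
    """Serialize up front so failures surface with a clear message."""
    frames = []
    try:
        header = pickle.dumps(graph, protocol=5, buffer_callback=frames.append)
    except Exception as exc:
        raise RuntimeError(f"Could not serialize task graph: {exc!r}") from exc
    # Wrap rather than return raw bytes, so a transport layer could recognise
    # the container and pass the frames through instead of re-encoding them.
    return PreSerialized(header, frames)


def build_message(graph):
    return {"op": "update-graph", "graph": serialize_graph(graph)}


msg = build_message({"x": (sum, [1, 2, 3])})
print(type(msg["graph"]).__name__, len(msg["graph"].frames))

# A lock cannot be pickled, so the error is raised at submit time
# with a message that names the graph as the culprit.
try:
    build_message({"lock": threading.Lock()})
except RuntimeError as exc:
    print(exc)
```

The manual call keeps the error close to the caller, while the container tells the transport that the payload needs no further encoding.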