-
Notifications
You must be signed in to change notification settings - Fork 7.2k
[Data] Added tracking of block serialization time #60574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Updated `task_completion_time_excl_backpressure_s` to include block serde time Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…des serde time; Updated tests; Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces tracking for block serialization time, which is a valuable addition for performance monitoring in Ray Data. The core change involves modifying the generator execution flow in _raylet.pyx to feed back serialization duration to the caller using gen.send(). This new metric is then integrated throughout the data stack, including BlockExecStats, OpRuntimeMetrics, and relevant operators like MapOperator and HashShuffleOperator. The tests have also been updated to cover these new metrics.
The implementation is clean and effective. The use of yield expressions to pass data back into generators is a good pattern for this use case. I have one minor suggestion to improve code clarity.
| def udf_time_s(self, reset: bool) -> float: | ||
| cur_time_s = self._udf_time_s | ||
| self._udf_time_s = 0 | ||
| return cur_time_s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reset parameter in udf_time_s is not used in the function body; the timer self._udf_time_s is always reset to 0. Since the only call site in map_operator.py passes reset=True, the behavior is correct for now. However, to improve clarity and prevent potential misuse in the future, I suggest removing the reset parameter from the method signature and updating the call in map_operator.py:772 to map_transformer.udf_time_s().
| def udf_time_s(self, reset: bool) -> float: | |
| cur_time_s = self._udf_time_s | |
| self._udf_time_s = 0 | |
| return cur_time_s | |
| def udf_time_s(self) -> float: | |
| cur_time_s = self._udf_time_s | |
| self._udf_time_s = 0 | |
| return cur_time_s |
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
edoakes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From offline discussion, core change looks ok to me. Let's consider it a private API for now. I will follow up to add a disclaimer & context in code comments.
Defer to others to review the data changes. You may want to consider an alternative name to "serialization time" since it is not purely serialization, but also includes memory allocation & copy time. Perhaps "block_write_time_s" or "block_output_time_s"
Changes
task_completion_time_excl_backpressure_sto track both UDF block generation time AND block serialization overheadRelated issues
Additional information