-
Notifications
You must be signed in to change notification settings - Fork 7.2k
[Data] [2/n]Make LogicalPlan stateless and comparable #60362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[Data] [2/n]Make LogicalPlan stateless and comparable #60362
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a large-scale refactoring to make logical operators stateless and comparable by converting them to frozen dataclasses. The changes are consistent and well-executed, replacing mutable patterns with immutable ones using dataclasses.replace. This is a great improvement for the robustness of the logical plan representation. I've found one critical issue with the initialization pattern used for the new dataclasses that could lead to partially initialized objects.
python/ray/data/_internal/logical/interfaces/logical_operator.py
Outdated
Show resolved
Hide resolved
e8b4a51 to
d453410
Compare
|
re-trigger ci just note: local ut success, and CI failure might be due to environment flakiness |
|
ut re-trigger ci. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
python/ray/data/_internal/logical/operators/input_data_operator.py
Outdated
Show resolved
Hide resolved
| _name: str | ||
| _input_dependencies: Tuple["LogicalOperator", ...] | ||
| _num_outputs: Optional[int] | ||
|
|
||
| def __init__( | ||
| self, | ||
| name: str, | ||
| input_dependencies: List["LogicalOperator"], | ||
| name: Optional[str] = None, | ||
| input_dependencies: Optional[List["LogicalOperator"]] = None, | ||
| num_outputs: Optional[int] = None, | ||
| **kwargs: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than introducing this boilerplate where the dataclass fields are prefixed with an underscore and the constructor parameters aren't, can we (in maybe an isolated PR) rename the attributes to match the parameters (i.e., remove the underscores)?
Signed-off-by: yaommen <myanstu@163.com>
Signed-off-by: yaommen <myanstu@163.com>
0970212 to
0e4a56f
Compare
Change Summary — for easier review
|
Description
Issue goal:
Make LogicalOperator immutable and comparable to avoid cross-dataset contamination.
The issue is split into two PRs for easier review.
This PR focuses on:
converting the LogicalOperator hierarchy to frozen dataclasses and making updates immutable via dataclasses.replace.
Related issues
Additional information
Implementation details
@dataclass(frozen=True);with_per_block_limit,with_detected_parallelism);dataclasses.replace;__repr__/dag_strare preserved to keep explain output stable;_batch_format;Read.detected_parallelismis propagated immutably and the physical planning stage sets the input factory without mutating the logical op.API changes
logical operators are now immutable; external behavior unchanged; logical layer no longer maintains
_output_dependencies.Tests:
logical optimization, planning, and operator-fusion related tests were exercised (see test notes).