Replace execution_mode with emission_type and boundedness (#13823)
* feat: update execution modes and add bitflags dependency
- Introduced `Incremental` execution mode alongside existing modes in the DataFusion execution plan.
- Updated various execution plans to utilize the new `Incremental` mode where applicable, enhancing streaming capabilities.
- Added `bitflags` dependency to `Cargo.toml` for better management of execution modes.
- Adjusted execution mode handling in multiple files to ensure compatibility with the new structure.
* add exec API
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
* replace done, but hits a stack overflow
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
* exec API done
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
* Refactor execution plan properties to remove execution mode
- Removed the `ExecutionMode` parameter from `PlanProperties` across multiple physical plan implementations.
- Updated related functions to utilize the new structure, ensuring compatibility with the changes.
- Adjusted comments and cleaned up imports to reflect the removal of execution mode handling.
This refactor simplifies the execution plan properties and enhances maintainability.
* Refactor execution plan to remove `ExecutionMode` and introduce `EmissionType`
- Removed the `ExecutionMode` parameter from `PlanProperties` and related implementations across multiple files.
- Introduced `EmissionType` to better represent the output characteristics of execution plans.
- Updated functions and tests to reflect the new structure, ensuring compatibility and enhancing maintainability.
- Cleaned up imports and adjusted comments accordingly.
This refactor simplifies the execution plan properties and improves the clarity of memory handling in execution plans.
* fix test
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
* Refactor join handling and emission type logic
- Updated test cases in `sanity_checker.rs` to reflect changes in expected outcomes for bounded and unbounded joins, ensuring accurate test coverage.
- Simplified the `is_pipeline_breaking` method in `execution_plan.rs` to clarify the conditions under which a plan is considered pipeline-breaking.
- Enhanced the emission type determination logic in `execution_plan.rs` to prioritize `Final` over `Both` and `Incremental`, improving clarity in execution plan behavior.
- Adjusted join type handling in `hash_join.rs` to classify `Right` joins as `Incremental`, allowing for immediate row emission.
These changes improve the accuracy of tests and the clarity of execution plan properties.
* Implement emission type for execution plans
- Updated multiple execution plan implementations to replace `unimplemented!()` with `EmissionType::Incremental`, ensuring that the emission type is correctly defined for various plans.
- This change enhances the clarity and functionality of the execution plans by explicitly specifying their emission behavior.
These updates contribute to a more robust execution plan framework within the DataFusion project.
* Enhance join type documentation and refine emission type logic
- Updated the `JoinType` enum in `join_type.rs` to include detailed descriptions for each join type, improving clarity on their behavior and expected results.
- Modified the emission type logic in `hash_join.rs` to ensure that `Right` and `RightAnti` joins are classified as `Incremental`, allowing for immediate row emission when applicable.
These changes improve the documentation and functionality of join operations within the DataFusion project.
* Refactor emission type logic in join and sort execution plans
- Updated the emission type determination in `SortMergeJoinExec` and `SymmetricHashJoinExec` to utilize the `emission_type_from_children` function, enhancing the accuracy of emission behavior based on input characteristics.
- Clarified comments in `sort.rs` regarding the conditions under which results are emitted, emphasizing the relationship between input sorting and emission type.
- These changes improve the clarity and functionality of the execution plans within the DataFusion project, ensuring more robust handling of emission types.
* Refactor emission type handling in execution plans
- Updated the `emission_type_from_children` function to accept an iterator instead of a slice, enhancing flexibility in how child execution plans are passed.
- Modified the `SymmetricHashJoinExec` implementation to utilize the new function signature, improving code clarity and maintainability.
These changes streamline the emission type determination process within the DataFusion project, contributing to a more robust execution plan framework.
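A minimal sketch of the combination rule described above (`Final` takes priority over `Both`, which takes priority over `Incremental`) together with the iterator-based signature. As a simplifying assumption, the function here takes emission types directly rather than child execution plans, and the enum is a local stand-in, not the definition in `execution_plan.rs`.

```rust
/// Local stand-in for the `EmissionType` enum named in this change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmissionType {
    /// Rows can be emitted as input arrives.
    Incremental,
    /// Rows are emitted only after all input has been consumed.
    Final,
    /// Some rows are emitted incrementally, others only at the end.
    Both,
}

/// Combine the emission types of several children.
/// Assumption: the argument is an iterator of emission types, not of plans.
pub fn emission_type_from_children(
    children: impl IntoIterator<Item = EmissionType>,
) -> EmissionType {
    let mut saw_both = false;
    for child in children {
        match child {
            // `Final` wins outright, so stop at the first one.
            EmissionType::Final => return EmissionType::Final,
            EmissionType::Both => saw_both = true,
            EmissionType::Incremental => {}
        }
    }
    if saw_both {
        EmissionType::Both
    } else {
        EmissionType::Incremental
    }
}
```

Accepting `impl IntoIterator` lets callers pass an array, a `Vec`, or a chained iterator over two join inputs without first collecting into a slice.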
* Enhance execution plan properties with boundedness and emission type
- Introduced `boundedness` and `pipeline_behavior` methods to the `ExecutionPlanProperties` trait, improving the handling of execution plan characteristics.
- Updated the `CsvExec`, `SortExec`, and related implementations to utilize the new methods for determining boundedness and emission behavior.
- Refactored the `ensure_distribution` function to use the new boundedness logic, enhancing clarity in distribution decisions.
- These changes contribute to a more robust and maintainable execution plan framework within the DataFusion project.
* Refactor execution plans to enhance boundedness and emission type handling
- Updated multiple execution plan implementations to incorporate `Boundedness` and `EmissionType`, improving the clarity and functionality of execution plans.
- Replaced instances of `unimplemented!()` with appropriate emission types, ensuring that plans correctly define their output behavior.
- Refactored the `PlanProperties` structure to utilize the new boundedness logic, enhancing decision-making in execution plans.
- These changes contribute to a more robust and maintainable execution plan framework within the DataFusion project.
* Refactor memory handling in execution plans
- Updated the condition for checking memory requirements in execution plans from `has_finite_memory()` to `boundedness().requires_finite_memory()`, improving clarity in memory management.
- This change enhances the robustness of execution plans within the DataFusion project by ensuring more accurate assessments of memory constraints.
* Refactor boundedness checks in execution plans
- Updated conditions for checking boundedness in various execution plans to use `is_unbounded()` instead of `requires_finite_memory()`, enhancing clarity in memory management.
- Adjusted the `PlanProperties` structure to reflect these changes, ensuring more accurate assessments of memory constraints across the DataFusion project.
- These modifications contribute to a more robust and maintainable execution plan framework, improving the handling of boundedness in execution strategies.
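A rough sketch of how a boundedness check such as `is_unbounded()` might look. The enum layout, the `requires_infinite_memory` field name, and the `needs_infinite_memory` helper are assumptions made for illustration; they are not taken from this commit message.

```rust
/// Local stand-in for the `Boundedness` enum named in this change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundedness {
    /// The stream ends after a finite number of rows.
    Bounded,
    /// The stream may never end. The flag (an assumed field) records
    /// whether processing it needs ever-growing state.
    Unbounded { requires_infinite_memory: bool },
}

impl Boundedness {
    /// True for any unbounded stream, regardless of its memory needs.
    pub fn is_unbounded(&self) -> bool {
        matches!(self, Boundedness::Unbounded { .. })
    }
}

/// Hypothetical helper: true only for unbounded streams whose state
/// grows without limit.
pub fn needs_infinite_memory(b: Boundedness) -> bool {
    matches!(
        b,
        Boundedness::Unbounded {
            requires_infinite_memory: true
        }
    )
}
```

As an illustration of the distinction, a median over an endless stream has to retain every value it has seen, while a running min or max needs only a single value of state.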
* Remove TODO comment regarding unbounded execution plans in `UnboundedExec` implementation
- Eliminated the outdated comment suggesting a switch to unbounded execution with finite memory, streamlining the code and improving clarity.
- This change contributes to a cleaner and more maintainable codebase within the DataFusion project.
* Refactor execution plan boundedness and emission type handling
- Updated the `is_pipeline_breaking` method to use `requires_finite_memory()` for improved clarity in determining pipeline behavior.
- Enhanced the `Boundedness` enum to include detailed documentation on memory requirements for unbounded streams.
- Refactored `compute_properties` methods in `GlobalLimitExec` and `LocalLimitExec` to directly use the input's boundedness, simplifying the logic.
- Adjusted emission type determination in `NestedLoopJoinExec` to utilize the `emission_type_from_children` function, ensuring accurate output behavior based on input characteristics.
These changes contribute to a more robust and maintainable execution plan framework within the DataFusion project, improving clarity and functionality in handling boundedness and emission types.
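The exact condition used by `is_pipeline_breaking` is not spelled out in this message. The sketch below shows one plausible reading, under the assumption that a plan breaks the pipeline when its input never ends but it emits rows only at the end. `PlanSummary` and both enums are hypothetical stand-ins, not the types in `execution_plan.rs`.

```rust
/// Local stand-ins for the two properties introduced by this change.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EmissionType {
    Incremental,
    Final,
    Both,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundedness {
    Bounded,
    Unbounded,
}

/// Hypothetical summary of the properties a plan would report.
#[derive(Debug, Clone, Copy)]
pub struct PlanSummary {
    pub boundedness: Boundedness,
    pub emission_type: EmissionType,
}

impl PlanSummary {
    /// Assumed rule: an unbounded plan that emits only at the end
    /// never produces output, so it breaks the pipeline.
    pub fn is_pipeline_breaking(&self) -> bool {
        self.boundedness == Boundedness::Unbounded
            && self.emission_type == EmissionType::Final
    }
}
```

Keeping the two properties separate is what makes this check expressible: a single execution-mode value cannot distinguish an unbounded plan that streams results from one that waits for input that never finishes.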
* Refactor emission type and boundedness handling in execution plans
- Removed the `OptionalEmissionType` struct from `plan_properties.rs`, simplifying the codebase.
- Updated the `is_pipeline_breaking` function in `execution_plan.rs` for improved readability by formatting the condition across multiple lines.
- Adjusted the `GlobalLimitExec` implementation in `limit.rs` to directly use the input's boundedness, enhancing clarity in memory management.
These changes contribute to a more streamlined and maintainable execution plan framework within the DataFusion project, improving the handling of emission types and boundedness.
* Refactor GlobalLimitExec and LocalLimitExec to enhance boundedness handling
- Updated the `compute_properties` methods in both `GlobalLimitExec` and `LocalLimitExec` to replace `EmissionType::Final` with `Boundedness::Bounded`, reflecting that limit operations always produce a finite number of rows.
- Changed the input's boundedness reference to `pipeline_behavior()` for improved clarity in execution plan properties.
These changes contribute to a more streamlined and maintainable execution plan framework within the DataFusion project, enhancing the handling of boundedness in limit operations.
* Review Part1
* Update sanity_checker.rs
* addressing reviews
* Review Part 1
* Update datafusion/physical-plan/src/execution_plan.rs
* Update datafusion/physical-plan/src/execution_plan.rs
* Shorten imports
* Enhance documentation for JoinType and Boundedness enums
- Improved descriptions for the `Inner` and `Full` join types in `join_type.rs` to clarify their behavior and examples.
- Added explanations regarding the boundedness of output streams and memory requirements in `execution_plan.rs`, including specific examples for operators like `Median` and `Min/Max`.
---------
Signed-off-by: Jay Zhan <jay.zhan@synnada.ai>
Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>