Description
What would you like to happen?
Goal: Enhance support for Batch Stateful DoFns within the SDK harness to improve performance and efficiency.
Objective: This Key Result tracks the implementation of an optimization to "Omit state commits" for Batch Stateful DoFns where applicable.
Problem: Processing stateful DoFns in batch pipelines can incur overhead related to state management (e.g., reading, writing, committing state). For certain batch use cases, such as those that are key-shuffled or do not actually leverage state, these operations might be unnecessary and add to processing time and resource usage.
Solution: This KR involves propagating the 'has_no_state' and 'only_bundle_for_keys' bits of ProcessBundleRequest from ProcessBundleHandler to the place where each StateRequest (StateAppendRequest and StateGetRequest) is constructed. If 'has_no_state' is true, the StateGetRequest can be skipped. If 'only_bundle_for_keys' is true, the StateAppendRequest can be skipped.
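The proposed skipping logic can be sketched roughly as follows. This is a minimal illustration, not SDK harness code: `BundleStateHints`, `FakeStateBackend`, and `HintAwareStateHandler` are hypothetical names, and only the two flag names come from the description above. The sketch also assumes that when an append is not sent to the runner, the value is kept in memory for the rest of the bundle so that later reads still see it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BundleStateHints:
    """Hypothetical holder for the two ProcessBundleRequest bits."""
    has_no_state: bool = False
    only_bundle_for_keys: bool = False


class FakeStateBackend:
    """Stand-in for the runner side of the state channel; counts requests."""

    def __init__(self) -> None:
        self.store: Dict[bytes, List[bytes]] = {}
        self.get_requests = 0
        self.append_requests = 0

    def get(self, state_key: bytes) -> List[bytes]:
        self.get_requests += 1
        return list(self.store.get(state_key, []))

    def append(self, state_key: bytes, data: bytes) -> None:
        self.append_requests += 1
        self.store.setdefault(state_key, []).append(data)


class HintAwareStateHandler:
    """Hypothetical wrapper that omits state requests based on the hints."""

    def __init__(self, backend: FakeStateBackend, hints: BundleStateHints) -> None:
        self._backend = backend
        self._hints = hints
        self._base: Dict[bytes, List[bytes]] = {}   # cached runner-side values
        self._local: Dict[bytes, List[bytes]] = {}  # appends never sent to the runner

    def get(self, state_key: bytes) -> List[bytes]:
        if state_key not in self._base:
            if self._hints.has_no_state:
                # Nothing was stored before this bundle: skip the get request.
                self._base[state_key] = []
            else:
                self._base[state_key] = self._backend.get(state_key)
        return self._base[state_key] + self._local.get(state_key, [])

    def append(self, state_key: bytes, data: bytes) -> None:
        if self._hints.only_bundle_for_keys:
            # No later bundle reads these keys: skip the append request.
            self._local.setdefault(state_key, []).append(data)
            return
        self._backend.append(state_key, data)
        if state_key in self._base:
            # Keep the cached view consistent with what the runner now holds.
            self._base[state_key].append(data)
```

With both bits set, a bundle that appends to and then reads a key issues no requests to the backend at all. With both bits unset, the same sequence issues one append and one get, as it would without the optimization.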
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner