[Feature Request]: Batch Stateful DoFns support in UnifiedWorker: Omit state commits implemented #37690

@LEEKYE

What would you like to happen?

Goal: Enhance support for Batch Stateful DoFns within the SDK harness to improve performance and efficiency.

Objective: This Key Result tracks the implementation of an optimization to "Omit state commits" for Batch Stateful DoFns where applicable.

Problem: Processing stateful DoFns in batch pipelines can incur overhead related to state management (e.g., reading, writing, committing state). For certain batch use cases, such as those that are key-shuffled or do not actually leverage state, these operations might be unnecessary and add to processing time and resource usage.

Solution: This KR propagates the 'has_no_state' and 'only_bundle_for_keys' bits of ProcessBundleRequest from ProcessBundleHandler to the point where each StateRequest (StateAppendRequest and StateGetRequest) is constructed. If 'has_no_state' is true, the StateGetRequest can be skipped. If 'only_bundle_for_keys' is true, the StateAppendRequest can be skipped.
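
To make the intended gating concrete, here is a minimal Python sketch. It does not use the Beam SDK. `BundleStateHints`, `FakeRunnerState`, and `HintedBagState` are hypothetical names invented for illustration, and the sketch assumes the two bits reach the state accessor unchanged. It also loads a key on first touch so that reads within the bundle still see earlier appends, which is a simplification rather than how the harness necessarily behaves.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BundleStateHints:
    """Hypothetical holder for the two ProcessBundleRequest bits."""
    has_no_state: bool = False
    only_bundle_for_keys: bool = False


@dataclass
class FakeRunnerState:
    """Stand-in for the runner side of the state channel; counts requests."""
    store: Dict[bytes, List[bytes]] = field(default_factory=dict)
    get_requests: int = 0
    append_requests: int = 0

    def get(self, key: bytes) -> List[bytes]:
        self.get_requests += 1
        return list(self.store.get(key, []))

    def append(self, key: bytes, value: bytes) -> None:
        self.append_requests += 1
        self.store.setdefault(key, []).append(value)


class HintedBagState:
    """Bag-state accessor that skips runner round trips when the hints allow."""

    def __init__(self, runner: FakeRunnerState, hints: BundleStateHints):
        self._runner = runner
        self._hints = hints
        self._values: Dict[bytes, List[bytes]] = {}  # per-bundle view of each key

    def _load(self, key: bytes) -> List[bytes]:
        if key not in self._values:
            # has_no_state: nothing was persisted, so skip the get request.
            if self._hints.has_no_state:
                self._values[key] = []
            else:
                self._values[key] = self._runner.get(key)
        return self._values[key]

    def read(self, key: bytes) -> List[bytes]:
        return list(self._load(key))

    def append(self, key: bytes, value: bytes) -> None:
        self._load(key).append(value)
        # only_bundle_for_keys: no later bundle reads this key, so skip the append request.
        if not self._hints.only_bundle_for_keys:
            self._runner.append(key, value)
```

With both bits set, a bundle that appends and reads state issues no requests to the runner; with neither set, every first read and every append still goes through.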

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
