[Task]: Data Sampling #25064

@rohdesamuel

Description

What needs to happen?

This issue tracks adding a data sampling feature to the SDKs, as discussed in the associated slide deck. The work adds:

  • The Sample instruction (SampleRequest and SampleResponse) to query the samples from an SDK
  • The "beam:protocol:data_sampling:v1" capability to track which SDKs can sample
  • The "enable_data_sampling" experiment to toggle the feature
  • Automatic sampling in the Python, Java, and Go SDKs for running PTransforms
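
As a rough illustration of how the capability and the experiment listed above might fit together, here is a minimal stdlib-only sketch. It assumes a runner would request samples only when the user has opted in via the experiment and the SDK advertises the capability. That gating rule is an assumption, not something this issue states. The helper names `parse_experiments` and `should_request_samples` are hypothetical and are not part of any Beam SDK. Only the two string constants come from the issue text.

```python
# Hypothetical sketch: these helpers are illustrative and not Beam APIs.
# Only the two string constants below are taken from the issue description.
DATA_SAMPLING_CAPABILITY = "beam:protocol:data_sampling:v1"
DATA_SAMPLING_EXPERIMENT = "enable_data_sampling"


def parse_experiments(args):
    """Collect experiment names from flags shaped like --experiments=a,b."""
    experiments = set()
    for arg in args:
        if arg.startswith("--experiments="):
            value = arg.split("=", 1)[1]
            experiments.update(
                name.strip() for name in value.split(",") if name.strip()
            )
    return experiments


def should_request_samples(args, sdk_capabilities):
    """Assumed rule: sample only if the user opted in AND the SDK supports it."""
    return (
        DATA_SAMPLING_EXPERIMENT in parse_experiments(args)
        and DATA_SAMPLING_CAPABILITY in sdk_capabilities
    )
```

For example, `should_request_samples(["--experiments=enable_data_sampling"], {"beam:protocol:data_sampling:v1"})` evaluates to `True` under this sketch, while omitting either the flag or the capability gives `False`.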

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
