[Bug]: BigQueryIO storage write api streaming dynamic destination conflicts if multiple transforms set same destination key #32335

Open
1 of 17 tasks
Abacn opened this issue Aug 27, 2024 · 0 comments

Abacn commented Aug 27, 2024

What happened?

An edge case leading to data corruption:

StorageApiWritesShardedRecords maintains a pool of append clients in a static map keyed by the DestinationT value [1].

If a pipeline contains multiple BigQueryIO.write transforms that all use dynamic destinations and produce the same destination keys, and those keys are processed at the same time on a single worker, a race condition can occur: rows are written to the wrong table, and if the schemas do not match, the write fails and keeps retrying.

[1]

new AtomicReference<>(APPEND_CLIENTS.get(element.getKey(), getAppendClientInfo));
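
The collision can be modeled without Beam. The following is a minimal sketch, not the actual Beam implementation: it assumes a plain ConcurrentHashMap in place of the real static cache, and AppendClient and clientFor are hypothetical names introduced only for illustration. It shows that when two writers ask a shared, key-only cache for the same key, the second writer receives the client that was created for the first writer's table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedClientCacheCollision {
  // Hypothetical stand-in for an append client that is bound to one table.
  static final class AppendClient {
    final String table;

    AppendClient(String table) {
      this.table = table;
    }
  }

  // Assumption: one static cache per JVM, keyed only by the destination key.
  static final Map<String, AppendClient> APPEND_CLIENTS = new ConcurrentHashMap<>();

  // Returns the cached client for the key; a client for `table` is created
  // only when the key is not in the cache yet.
  static AppendClient clientFor(String destinationKey, String table) {
    return APPEND_CLIENTS.computeIfAbsent(destinationKey, k -> new AppendClient(table));
  }

  public static void main(String[] args) {
    // Two different write transforms happen to use the same destination key.
    AppendClient first = clientFor("key1", "dataset.table_a");
    AppendClient second = clientFor("key1", "dataset.table_b");

    System.out.println(first.table);  // dataset.table_a
    System.out.println(second.table); // dataset.table_a, not table_b
  }
}
```

In this model the second writer silently reuses a client bound to the wrong table, which matches the symptom described above.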

This can be mitigated if DynamicDestinations is guaranteed to return a different destination for each table being written. We should also document this clearly.
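
One way to picture the mitigation, again as a sketch under assumptions rather than Beam code: if the destination value itself encodes the target table, two transforms can no longer share a cache entry. The names destinationFor and tableFor and the "table/key" format are hypothetical, and the ConcurrentHashMap only stands in for the shared static cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DistinctDestinationKeys {
  // Assumption: shared static cache mapping a destination to the table its client writes to.
  static final Map<String, String> CLIENT_TABLE_BY_DESTINATION = new ConcurrentHashMap<>();

  // Hypothetical destination format: include the table spec so that
  // destinations for different tables are never equal.
  static String destinationFor(String tableSpec, String logicalKey) {
    return tableSpec + "/" + logicalKey;
  }

  // Returns the table bound to the cached client for this destination.
  static String tableFor(String destination, String tableSpec) {
    return CLIENT_TABLE_BY_DESTINATION.computeIfAbsent(destination, d -> tableSpec);
  }

  public static void main(String[] args) {
    // Both transforms use the logical key "key1" but target different tables.
    String destA = destinationFor("dataset.table_a", "key1");
    String destB = destinationFor("dataset.table_b", "key1");

    System.out.println(tableFor(destA, "dataset.table_a")); // dataset.table_a
    System.out.println(tableFor(destB, "dataset.table_b")); // dataset.table_b
  }
}
```

In a real pipeline the equivalent step would be for each DynamicDestinations implementation to return a destination value that is unique per target table.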

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Abacn Abacn changed the title [Bug]: BigQueryIO storage write api dynamic destination conflicts if multiple transforms set same destination key [Bug]: BigQueryIO storage write api streaming dynamic destination conflicts if multiple transforms set same destination key Aug 27, 2024