You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For StorageApiWriteShardedREcords, We maintain a client pool via a static Map of key as DestinationT type: [1]
If there are multiple BigQueryIO.write both with dynamic destinations, and use the same keys, and get processed at the same time on single worker, the race condition could trigger, making rows writes to wrong table, and if schema mismatch, write fails and keep retrying
This can be mitigated if DynamicTestinations is guaranteed to return different destination for different tables to write. We should also document this clearly
Issue Priority
Priority: 3 (minor)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner
The text was updated successfully, but these errors were encountered:
Abacn
changed the title
[Bug]: BigQueryIO storage write api dynamic destination conflicts if multiple transforms set same destination key
[Bug]: BigQueryIO storage write api streaming dynamic destination conflicts if multiple transforms set same destination key
Aug 27, 2024
What happened?
An edge case leading to data corruption:
For StorageApiWriteShardedREcords, We maintain a client pool via a static Map of key as DestinationT type: [1]
If there are multiple BigQueryIO.write both with dynamic destinations, and use the same keys, and get processed at the same time on single worker, the race condition could trigger, making rows writes to wrong table, and if schema mismatch, write fails and keep retrying
[1]
beam/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java
Line 551 in 028e0ee
This can be mitigated if DynamicTestinations is guaranteed to return different destination for different tables to write. We should also document this clearly
Issue Priority
Priority: 3 (minor)
Issue Components
The text was updated successfully, but these errors were encountered: