BigQuery IO Source is not Exporting to GCS as written in documentation #19174

kennknowles · 2022-06-03T22:52:28Z

I did some checking in the Beam code and found that Dataflow queries BigQuery and retrieves the result using pagination [1]. As we understand it, this means there is no parallelism when reading a BigQuery table, which contradicts what the documentation tells us [2].

Is this a work in progress? I'm filing this as a bug because the documentation says the read goes through GCS, while the code actually uses NativeSourceReader, which yields the data one row at a time as an iterator.

[1]

beam/sdks/python/apache_beam/io/gcp/bigquery.py

Line 1083 in 520b3a2

while True:

[2]

beam/sdks/python/apache_beam/io/gcp/bigquery.py

Line 60 in 520b3a2

The main and side inputs are implemented differently. Reading a BigQuery table
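To illustrate why pagination rules out parallel reads, here is a minimal, self-contained Python sketch. It does not use Beam or BigQuery: the in-memory `TABLE`, `fetch_page`, and `export_shards` are hypothetical stand-ins. A paginated reader has to pass each page token into the next request, so one worker walks the whole result serially, much like the `while True:` loop cited in [1]. An export-based reader instead produces independent shards that can be read concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical in-memory stand-in for a BigQuery table (not a real API).
TABLE: List[Dict[str, int]] = [{"id": i} for i in range(10)]


def fetch_page(page_token: Optional[int],
               page_size: int) -> Tuple[List[Dict[str, int]], Optional[int]]:
    """Hypothetical paged query API: one page plus the next token (None when done)."""
    start = page_token or 0
    rows = TABLE[start:start + page_size]
    next_token = start + page_size if start + page_size < len(TABLE) else None
    return rows, next_token


def read_paginated(page_size: int = 3) -> Iterator[Dict[str, int]]:
    """Serial read: each request depends on the token returned by the previous
    one, so a single reader yields the result row by row."""
    token: Optional[int] = None
    while True:
        rows, token = fetch_page(token, page_size)
        for row in rows:
            yield row
        if token is None:
            break


def export_shards(num_shards: int) -> List[List[Dict[str, int]]]:
    """Hypothetical export step: split the table into independent shard 'files'."""
    return [TABLE[i::num_shards] for i in range(num_shards)]


def read_shard(shard: List[Dict[str, int]]) -> List[Dict[str, int]]:
    # Each shard is read without coordinating with the others.
    return list(shard)


def read_exported(num_shards: int = 4) -> List[Dict[str, int]]:
    """Parallel read: shards are independent, so workers read them concurrently."""
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        results = pool.map(read_shard, export_shards(num_shards))
    return [row for shard_rows in results for row in shard_rows]
```

Both readers return every row; only the exported variant can spread the work across workers, which is the behavior the documentation in [2] implies.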

Imported from Jira BEAM-5352. Original Jira may contain additional context.
Reported by: rendybjunior.

rendybjunior · 2022-06-03T22:55:50Z

Thanks for migrating the issue.

kennknowles added bug gcp P3 sdk-py-core bigquery labels Jun 3, 2022

damccorm added core py python and removed sdk-py-core labels Jun 16, 2022
