[BEAM-7034] Add example snippet to read fromQuery using BQ Storage API. #13083

fpopic · 2020-10-13T10:46:35Z

The Jira ticket was resolved, but the docs were never updated with a corresponding snippet.


fpopic · 2020-10-13T11:02:16Z

Should we add a note about the pricing?

Does using the BigQuery Storage API with fromQuery combine the costs, i.e. BigQuery query pricing plus BigQuery Storage API pricing on top?

fpopic · 2020-10-21T18:23:55Z

R: @kennknowles

aaltay · 2020-11-12T23:48:04Z

R: @kmjung @chamikaramj

kmjung · 2020-11-12T23:54:07Z

cc: @vachan-shetty

kmjung · 2020-11-12T23:56:34Z

...xamples/snippets/transforms/io/gcp/bigquery/BigQueryReadFromQueryWithBigQueryStorageAPI.java

+        pipeline
+            .apply(
+                "Read from BigQuery table",
+                BigQueryIO.readTableRows()


I would avoid using readTableRows in an example snippet, both for the storage API and also for the existing export-based model -- this involves a needless conversion from Avro to JSON, where customers should instead be able to consume the Avro GenericRecords directly.

Okay, agreed. What would be the preferred way to continue with this, then?

1. Finish this PR using TableRows, so that all 3 read examples use the same undesired readTableRows() call.
2. Refactor only this example to use read<T>(SerializableFunction<SchemaAndRecord, T> f) as part of this PR.
3. Refactor all 3 examples to use the preferred read<T>(SerializableFunction<SchemaAndRecord, T> f):
   - Reading from a table
   - Reading with a query string
   - Using the BigQuery Storage API

If you have the cycles, let's do (3). Otherwise, you can go ahead with (1) and I will take care of updating them when you're done.

Then let's merge this, and next week I can refactor all 3 examples.

kmjung · 2020-11-13T06:41:39Z

Also, re: the question above about pricing: the storage API is free when used to read anonymous tables (e.g. query results). Users pay only when scanning from a named table.

aaltay · 2020-11-19T23:57:38Z

@fpopic - Could you address the open comments?

fpopic · 2020-11-20T05:04:09Z

> Also, re: the question above about pricing: the storage API is free when used to read anonymous tables (e.g. query results). Users pay only when scanning from a named table.

Let me check my understanding with a small example.

Does it mean that for my existing named table myproject:mydataset.mytable with the following schema:

[
  {
    "mode": "NULLABLE",
    "name": "my_string_field_1",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "my_string_field_2",
    "type": "STRING"
  }
]

Option A
```
BigQueryIO
    .read<T>(...)
    .from("myproject.mydataset.mytable")
    .withSelectedFields(Arrays.asList("my_string_field_1"))
    .withMethod(Method.DIRECT_READ)
```
Would this incur only the BigQuery query scan cost for the field my_string_field_1 plus the Storage API scan cost for the same field?

Option B

BigQueryIO
    .read<T>(...)
    .fromQuery("SELECT my_string_field_1 || 'my_concat_business_logic_for_this_field' FROM `myproject.mydataset.mytable`")
    .usingStandardSql()
    .withMethod(Method.DIRECT_READ)

And would the cost here include only the BigQuery query scan cost for the field my_string_field_1?

Or are you just saying that an anonymous table scan such as

BigQueryIO
    .read<T>(...)
    .fromQuery("SELECT 'dummy' AS my_string_field_1")
    .usingStandardSql()
    .withMethod(Method.DIRECT_READ)

is free of Storage API cost for the bytes of the 'dummy' value?

kmjung · 2020-11-20T17:17:39Z

In your examples above:

BigQueryIO
    .read<T>(...)
    .from("myproject.mydataset.mytable")
    .withSelectedFields(Arrays.asList("my_string_field_1"))
    .withMethod(Method.DIRECT_READ)

This would incur only BigQuery storage API charges for the uncompressed size of the my_string_field_1 column (e.g. at $1.10/TiB). The BigQuery query engine isn't involved here, and so neither is the $5/TiB query cost.

BigQueryIO
    .read<T>(...)
    .fromQuery("SELECT my_string_field_1 || 'my_concat_business_logic_for_this_field' FROM `myproject.mydataset.mytable`")
    .usingStandardSql()
    .withMethod(Method.DIRECT_READ)

This is a BigQuery query -- it will be executed as a query job, the query results will be written to an anonymous table, and then Beam will use the storage API to read the results from the anonymous table. You'll pay the standard $5/TiB on-demand query cost here (unless you're using a BigQuery reservation), but there won't be any costs associated with the storage API usage in this case because the target is an anonymous table.

I think your last example sums things up correctly.
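
For readers who want the arithmetic spelled out, here is a rough sketch of the cost model described in this comment. It is not part of the PR. The class and method names are hypothetical, the input sizes are made up, and the two rates are simply the figures quoted above ($1.10/TiB for the storage API, $5/TiB for on-demand queries), which may differ by region or change over time. It also assumes on-demand pricing rather than a BigQuery reservation.

```java
import java.util.Locale;

/** Hypothetical helper: estimates read cost for the two patterns discussed in this thread. */
final class BigQueryReadCostSketch {
    // Rates as quoted in the thread; treat them as assumptions, not current list prices.
    static final double STORAGE_API_USD_PER_TIB = 1.10;
    static final double ON_DEMAND_QUERY_USD_PER_TIB = 5.00;

    /**
     * from(table) + DIRECT_READ: only the storage API is billed, on the
     * uncompressed size of the selected columns. No query job is involved.
     */
    static double directTableReadUsd(double selectedColumnsTiB) {
        return selectedColumnsTiB * STORAGE_API_USD_PER_TIB;
    }

    /**
     * fromQuery(...) + DIRECT_READ: the query job is billed at the on-demand
     * rate for the bytes it scans. Reading the anonymous result table through
     * the storage API adds nothing.
     */
    static double queryThenDirectReadUsd(double bytesScannedByQueryTiB) {
        return bytesScannedByQueryTiB * ON_DEMAND_QUERY_USD_PER_TIB;
    }

    public static void main(String[] args) {
        // Made-up size for my_string_field_1, for illustration only.
        double columnTiB = 2.0;
        System.out.printf(Locale.ROOT, "Table + DIRECT_READ: $%.2f%n",
            directTableReadUsd(columnTiB));
        System.out.printf(Locale.ROOT, "Query + DIRECT_READ: $%.2f%n",
            queryThenDirectReadUsd(columnTiB));
    }
}
```

With the made-up 2 TiB column, this prints $2.20 for the direct table read and $10.00 for the query-based read. Under the same model, a query that scans no table bytes, like the `SELECT 'dummy'` example, comes out at zero for both components.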

aaltay · 2021-01-14T19:56:00Z

Is this PR still active?

fpopic · 2021-01-15T17:59:10Z

> Is this PR still active?

@kmjung we stopped here

kmjung · 2021-01-15T18:02:33Z

I thought the plan was to merge this PR and then proceed with the update to remove readTableRows. Can we proceed with that plan? cc: @vachan-shetty

aaltay · 2021-01-15T19:52:08Z

> I thought the plan was to merge this PR and then proceed with the update to remove readTableRows. Can we proceed with that plan? cc: @vachan-shetty

If the plan is to merge this, could you:

- Review and LGTM the PR.
- Check that all tests are passing.

kmjung · 2021-01-15T20:01:46Z

The PR looks good to me. I'm not a Beam repository owner and can't provide formal approval.

chamikaramj · 2021-01-15T20:22:46Z

Retest this please

chamikaramj · 2021-01-15T20:22:55Z

Thanks. Will merge after tests pass.

chamikaramj · 2021-01-19T17:07:54Z

Retest this please

aaltay · 2021-01-20T00:13:42Z

Looks like there are style (spotless) issues.

fpopic · 2021-01-23T19:22:33Z

> Looks like there are style (spotless) issues.

Hi @aaltay, is there a way to run the linter (or whatever static check is failing) locally? I am having a hard time figuring out what is wrong without any log message in CI.

kennknowles · 2021-01-27T19:12:41Z

You can run ./gradlew spotlessApply and it will automatically fix it.

Add example snippets to read fromQuery using BQ Storage API.

01b7596

Make the query example consistent with the previous one for the table.

393bbde

fpopic marked this pull request as ready for review October 13, 2020 12:18

kmjung suggested changes Nov 12, 2020


Run spotelessApply.

3c15bf3

chamikaramj merged commit f24ebd3 into apache:master Jan 28, 2021
