BigQuery: Add list rows and --max_results option to %%bigquery magic #9147

shubha-rajan · 2019-08-29T22:57:25Z

See #9105. Creating draft for visibility.

Adds an option to pass a table ID instead of a SQL query to the %%bigquery cell magic, as a cost-saving alternative to SELECT * queries. The --max_results option limits the number of rows read. The returned pandas.DataFrame can be saved to a variable by passing a destination_var argument.
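A minimal sketch of how the magic could choose between the two modes described above. This is not the code from this PR: `parse_magic_line` and `looks_like_table_id` are hypothetical helpers, `argparse` stands in for IPython's magic-argument handling, and the "no internal whitespace means table ID" rule is an assumption.

```python
import argparse
import re


def parse_magic_line(line):
    """Parse the text that follows the magic name (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="bigquery", add_help=False)
    # Optional variable name that receives the resulting DataFrame.
    parser.add_argument("destination_var", nargs="?", default=None)
    # Upper bound on the number of rows read; None means no limit.
    parser.add_argument("--max_results", type=int, default=None)
    return parser.parse_args(line.split())


def looks_like_table_id(cell):
    """Return True if the cell body has no whitespace once stripped.

    Assumption: a SQL query always contains internal whitespace, while a
    table ID such as ``project.dataset.table`` never does. Stripping first
    ignores the trailing newline that a notebook appends to the cell.
    """
    body = cell.strip()
    return bool(body) and re.search(r"\s", body) is None


args = parse_magic_line("df --max_results 10")
cell = "my-project.my_dataset.my_table\n"
mode = "list rows" if looks_like_table_id(cell) else "run query"
```

With a check like this, a cell containing only a table ID would be routed to a row-listing call, and anything with internal whitespace would be treated as SQL.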

TODO:

- Fix coverage failures
- Test that running the cell magic with a table ID instead of a query works with bqstorage_client set
- Add tests for failure cases: handle failures when table IDs are passed instead of queries
- ~~max_results is currently not working with regular SQL queries~~ - fixed! See screenshot below

To get this working, I ended up adding max_results as a property of QueryJobConfig, but if that wasn't the right call, I can refactor to pass max_results as a separate parameter.

plamut · 2019-08-30T21:07:40Z

bigquery/google/cloud/bigquery/job.py

-        :type api_response: dict
-        :param api_response: response returned from an API call
+        Args:
+            api_response (dict): response returned from an API call.


By the way, the types are specified with names from the typing module, thus a dict should be named Dict, for example. Or Dict[key_type, value_type] if you also want to specify the dict content's type(s). Probably best to check some of the existing "modern" docstrings in the codebase to get a feel.

shubha-rajan · 2019-08-30

Got it. The dict in question is a nested API response, so it would be okay to just name it Dict without specifying the content, right?

plamut · 2019-09-01

I suppose so, yes. If all keys are strings, Dict[str, Any] could be used, but just Dict with a meaningful description is fine, too.
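For illustration, a docstring in the style discussed here might look like the sketch below. `job_id_from_response` is a hypothetical function, not part of the library, and the `jobReference`/`jobId` keys are assumed only for the sake of the example.

```python
from typing import Any, Dict, Optional


def job_id_from_response(api_response: Dict[str, Any]) -> Optional[str]:
    """Extract the job ID from a nested API response (hypothetical helper).

    Args:
        api_response (Dict[str, Any]):
            Response returned from an API call. Keys are strings; values
            may be nested dicts, so the content type is left as ``Any``.

    Returns:
        Optional[str]: The job ID, or ``None`` if the response has none.
    """
    # Fall back to an empty dict so a missing outer key does not raise.
    return api_response.get("jobReference", {}).get("jobId")
```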

shubha-rajan · 2019-08-30T21:55:25Z

The failing snippets tests also fail locally on master, so they're probably unrelated to the changes in this PR.

plamut · 2019-08-31T17:59:25Z

@shubha-rajan Indeed, that started occurring a day or two ago. The backend team has been informed about it, and we are awaiting an ETA for the fix. If it takes too long, we can temporarily disable the failing test as a workaround.

tswast · 2019-09-03T16:34:04Z

Since these are two different features, let's have them as two (possibly 3) separate PRs, starting with --max_results feature. That way they are more clearly identified as new features in the CHANGELOG when we release these features.

I'd prefer we find a different implementation for max_results. Note: list_rows accepts a max_results argument, and QueryJob.result calls the list_rows method. I think it would be appropriate to add a max_results argument to QueryJob.result.

Let's have 3 PRs in this order:

1. Add max_results=None argument to the QueryJob.result method.
2. Add a --max_results argument to the %%bigquery magic.
3. Add ability to pass in a table ID instead of a query to the %%bigquery magic.
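The first step in this plan could be sketched roughly as follows. `FakeClient` and `FakeQueryJob` are stand-ins written for this example, not the library's classes; they only show a `max_results` argument on `result` being forwarded to `list_rows` instead of being stored on the job config.

```python
import itertools


class FakeClient:
    """Stand-in for a BigQuery client that holds rows in memory."""

    def __init__(self, rows):
        self._rows = rows

    def list_rows(self, table, max_results=None):
        # islice with stop=None yields every row, so None means "no limit".
        return list(itertools.islice(self._rows, max_results))


class FakeQueryJob:
    """Stand-in for a query job whose results live in a destination table."""

    def __init__(self, client, destination):
        self._client = client
        self.destination = destination

    def result(self, max_results=None):
        # Forward the limit per call instead of keeping it as job config.
        return self._client.list_rows(self.destination, max_results=max_results)


job = FakeQueryJob(FakeClient([{"n": i} for i in range(5)]), "dataset.table")
first_two = job.result(max_results=2)
all_rows = job.result()
```

Passing the limit per call keeps it out of the job configuration, which matches the concern raised about making `max_results` a property of `QueryJobConfig`.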

shubha-rajan · 2019-09-03T16:44:50Z

@tswast sounds good. I'll close this one and submit 3 separate PRs

shubha-rajan added 3 commits August 27, 2019 23:21

added max_results flag and property to QueryJobConfig

9ca87b2

tests for setting max_results and using table_id instead of query passing

718bb2e

adjusted regex whitespace check to account for trailing newline added by notebook cell

5eb56ce

googlebot added the "cla: yes" label Aug 29, 2019

shubha-rajan added 4 commits August 29, 2019 23:44

preserve value of max_results after QueryJob._set_properties called

904c31b

added tests for using --max_results with destination_var and bq_storage_api

53d0b0b

blacken and lint

3706a0a

removed unused max_results parameter from client._get_query_results

0e9e032

plamut reviewed Aug 30, 2019

shubha-rajan added 2 commits August 30, 2019 13:43

added error messaging and tests for failure case

4756ec3

reformatted docstrings

304c761

plamut reviewed Aug 30, 2019

shubha-rajan added 2 commits August 30, 2019 14:18

Fix docstring formatting

8d0781d

blacken/lint

b93b01d

shubha-rajan marked this pull request as ready for review August 30, 2019 21:55

shubha-rajan requested a review from a team August 30, 2019 21:55

shubha-rajan added 2 commits August 30, 2019 19:43

update error messaging test

32a1ebc

refactored error message display into its own method.fixed coverage failure

aceee81

shubha-rajan closed this Sep 3, 2019

This was referenced Sep 3, 2019

BigQuery: Add 'max_results' param to 'QueryJob.result()'. #9167

Merged

BigQuery: add --max_results option to magic #9169

Merged

BigQuery: Add ability to pass in a table ID instead of a query to the %%bigquery magic. #9170

Merged
