Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: Add list rows and --max_results option to %%bigquery magic #9147

Closed

Conversation

shubha-rajan
Copy link
Contributor

@shubha-rajan shubha-rajan commented Aug 29, 2019

See #9105. Creating draft for visibility.

Adds option to pass a table id instead of a SQL query to %%bigquery cell magic as a cost-saving alternative to SELECT * queries. --max_results option limits the number of rows read. The returned pandas.DataFrame can be saved to a variable by passing a destination_var argument.

image

TODO:

  • Fix coverage failures

  • Test that running cell magic with table ID instead of query works with bqstorage_client set

  • Add tests for failure cases- handles failure cases when table IDs are passed instead of queries:
    image

  • max_results is currently not working with regular SQL queries - fixed! See screenshot below
    image

To get this working, I ended up adding max_results as a property of QueryJobConfig, but if that wasn't the right call, I can refactor to pass max_results as a separate parameter.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 29, 2019
:type api_response: dict
:param api_response: response returned from an API call
Args:
api_response (dict): response returned from an API call.
Copy link
Contributor

@plamut plamut Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, the types are specified with names from the typing module, thus a dict should be named Dict, for example. Or Dict[key_type, value_type] if you also want to specify the dict content's type(s). Probably best to check some of the existing "modern" docstrings in the codebase to get a feel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. The dict in question would be a nested API response so it would be okay to just name it Dict without specifying the content, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so, yes. If all keys are strings, Dict[str, Any] could be used, but just Dict with a meaningful description is fine, too.

@shubha-rajan
Copy link
Contributor Author

failing snippets tests also fail locally on master, so they're probably unrelated to changes in this PR

@shubha-rajan shubha-rajan marked this pull request as ready for review August 30, 2019 21:55
@shubha-rajan shubha-rajan requested a review from a team August 30, 2019 21:55
@plamut
Copy link
Contributor

plamut commented Aug 31, 2019

@shubha-rajan Indeed, that started occurring a day or two ago. The backend team has been informed about it, we are awaiting the ETA for the fix. If it's too long, we can temporarily disable the failing test as a workaround.

@tswast
Copy link
Contributor

tswast commented Sep 3, 2019

Since these are two different features, let's have them as two (possibly 3) separate PRs, starting with --max_results feature. That way they are more clearly identified as new features in the CHANGELOG when we release these features.

I'd prefer we find a different implementation for max_results. Note: list_rows accepts a max_results argument, and QueryJob.result calls the list_rows method. I think it would be appropriate to add a max_results argument to QueryJob.result.

Let's have 3 PRs in this order:

  1. Add max_results=None argument to the QueryJob.result method.
  2. Add a --max_results argument to the %%bigquery magic.
  3. Add ability to pass in a table ID instead of a query to the %%bigquery magic.

@shubha-rajan
Copy link
Contributor Author

@tswast sounds good. I'll close this one and submit 3 separate PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants