BigQuery: Potentially more Pythonic options for the API? #6055

Closed
max-sixty opened this issue Sep 21, 2018 · 5 comments
Labels: api: bigquery, type: feature request

Comments

@max-sixty

max-sixty commented Sep 21, 2018

I thought I'd discussed this with @tswast, and that he'd even generously added a PR, but after ten minutes of searching GitHub I've come up empty on both issues and PRs, so I'm posting here.
@tswast, let me know if my memory is correct and we have discussed this!

I appreciate API design is difficult and there are tradeoffs between verbosity vs explicitness, and consistency within a product vs language-specific adjustments. The bar should be high for people offering criticism when they can only see a subset of the relevant information.
But I do frequently use the BQ python API and find it fairly awkward and un-pythonic, as though I were writing Java.

Here's an example, straight from the docs:

from google.cloud import bigquery
client = bigquery.Client()
dataset_id = 'my_dataset'  # replace with your dataset ID
table_id = 'my_table'  # replace with your table ID
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref)  # API request

I would love to be able to write this, and receive a table object.

client = bigquery.Client()
client.table(dataset='ds', table='tbl')

There are some points a level down (the .table method doesn't return a Table object while a dataset object does return a Dataset object; do we need a TableReference class in place of a string; etc.), but they all center on the ergonomics of the API, particularly with respect to Python.

Thank you as ever for a wonderful product, and appreciate any thoughts on whether I'm making mistakes here.

@JustinBeckwith added the "triage me" label Sep 22, 2018
@tseaver added the "type: feature request" and "api: bigquerydatatransfer" labels and removed the "triage me" label Sep 24, 2018
@tswast added the "api: bigquery" label and removed the "api: bigquerydatatransfer" label Sep 24, 2018
@tswast
Contributor

tswast commented Sep 24, 2018

In our previous conversation, I added a from_string() method to the DatasetReference and TableReference classes. #5255

You're right that the BigQuery client does still feel overly verbose. I find it a problem that from_string() can't account for the default project on the client.

The reason we have the *Reference classes is that the BigQuery API has such resources in the REST responses. I wanted to be super clear when the API only returns a pointer to a table rather than a full table resource.

It's a historical artifact that client.dataset() and dataset.table() return a reference and not an actual Dataset or Table. In retrospect, I probably should have renamed or removed those methods in the 1.0 redesign project.

An idea:

  • Wherever Client methods accept a *Reference, also allow a string. This gets a little muddy in the library code, but I think so long as it's documented well it'd be okay.
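A minimal sketch of how that coercion might look, assuming a stand-in `TableRef` dataclass rather than the real `TableReference` class. The helper name `to_table_ref` and the strict `project.dataset.table` string format are illustrative assumptions, not the library's code:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TableRef:
    """Stand-in for a table reference; NOT the real TableReference class."""
    project: str
    dataset_id: str
    table_id: str


def to_table_ref(table: Union[TableRef, str]) -> TableRef:
    """Accept either a TableRef or a 'project.dataset.table' string (hypothetical helper)."""
    if isinstance(table, TableRef):
        return table  # already a reference, pass it through unchanged
    if isinstance(table, str):
        parts = table.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected 'project.dataset.table', got {table!r}")
        return TableRef(*parts)
    raise TypeError(f"expected TableRef or str, got {type(table).__name__}")
```

A client method could call a helper like this on its argument before building the request, so the muddiness stays in one place.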

PRs welcome!

@tswast
Contributor

tswast commented Sep 24, 2018

cc @shollyman @alixhami

@max-sixty
Author

Hi Tim!

Thanks for your thoughtful reply, as ever! I had the sense there might be some influence from the REST design.

Wherever Client methods accept a *Reference, also allow a string. This gets a little muddy in the library code, but I think so long as it's documented well it'd be okay.

That makes a lot of sense. Particularly if this could be project[:.]dataset.table or dataset.table and fall back to the default project, then that could surmount the issue you point out with .from_string. I think that would accomplish most of the goals - it would be easy & intuitive to get tables, and users wouldn't need to learn the ref objects for 80% of cases. A step further would be to attempt to coerce strings / Tableref objects to Tables, though maybe that's a step too far.
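To make the fallback concrete, here is a rough sketch of a parser for `project[:.]dataset.table` or `dataset.table` strings. The function `parse_table_id` and the `TableParts` tuple are hypothetical names used for illustration, and the sketch assumes that project IDs contain no dots:

```python
from typing import NamedTuple


class TableParts(NamedTuple):
    """Hypothetical container for the three pieces of a table ID."""
    project: str
    dataset_id: str
    table_id: str


def parse_table_id(table_id: str, default_project: str) -> TableParts:
    """Split a table ID, falling back to default_project when no project is given."""
    project, sep, rest = table_id.partition(":")
    if sep:
        # 'project:dataset.table' form: the project is explicit.
        parts = [project, *rest.split(".")]
    else:
        parts = table_id.split(".")
        if len(parts) == 2:
            # 'dataset.table' form: borrow the client's default project.
            parts = [default_project, *parts]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"cannot interpret table ID {table_id!r}")
    return TableParts(*parts)
```

With this shape, `parse_table_id("ds.tbl", default_project="my-project")` resolves against the default project, while fully qualified strings are left as given.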

I'm trying to focus on xarray & pandas-gbq re OSS work (and a touch of rust!) so while I'd love to tinker here, please don't wait for me on this one. I hope the comments are still productive even from the peanut gallery.

@yan-hic

yan-hic commented Oct 2, 2018

Why not go a step further and introduce a default_dataset at the Client() level? It could propagate to similar properties, e.g. in QueryJobConfig(), or act as the default wherever a dataset is required, including in .from_string if only a table is passed...
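As a rough illustration of that suggestion, the sketch below resolves partial table IDs against client-level defaults. `ClientDefaults` and its `qualify` method are hypothetical and not part of the library; the sketch also assumes dot-separated IDs only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientDefaults:
    """Hypothetical holder for client-level defaults; not part of the library."""
    project: str
    default_dataset: Optional[str] = None

    def qualify(self, table_id: str) -> str:
        """Expand a partial table ID to 'project.dataset.table' using the defaults."""
        parts = table_id.split(".")
        if len(parts) == 1:
            # Bare table name: needs the default dataset.
            if self.default_dataset is None:
                raise ValueError("bare table name given but no default_dataset set")
            parts = [self.default_dataset, *parts]
        if len(parts) == 2:
            # 'dataset.table': needs the default project.
            parts = [self.project, *parts]
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"cannot interpret table ID {table_id!r}")
        return ".".join(parts)
```

For example, `ClientDefaults("my-project", "my_dataset").qualify("my_table")` would expand to the fully qualified ID, while an explicit `other_dataset.my_table` would only pick up the project.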

@tswast
Contributor

tswast commented Oct 11, 2018

@yiga2 I like the idea of a default dataset in principle, but there are so many places where a dataset might creep in that I fear it'd be hard to catch all uses, especially as the BigQuery API evolves and there may be new APIs that take datasets in their requests.

Instead, we are starting to offer a slightly lower-level API where default resources can be provided. See: #6088
