-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add columns argument to read_gbq #15
Conversation
@bnaul I think it might have ended up stripped out when we removed Currently, the PRs don't have the full CI running due to a limitation on the Github action that we are using to authenticate (it only runs on push). I'm not sure how to handle this, pinging @jrbourbeau for ideas. |
oops yep totally missed that. the |
@bnaul I just gave you push access to the project, so we can run the full CI. Would you mind pushing all the PRs to separate branches so the CI runs, and then we can merge from there? |
@ncclementi just pushed, I can't change the fork for this PR but it looks like pushing the same commit to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bnaul! Totally agree column selection is something we should support
@@ -82,3 +82,16 @@ def test_read_kwargs(dataset, client): | |||
|
|||
with pytest.raises(Exception, match="504 Deadline Exceeded"): | |||
ddf.compute() | |||
|
|||
|
|||
def test_read_columns(df, dataset, client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this test looks great. One small suggestions: could we add
assert df.shape[1] > 1
to ensure that the original DataFrame
has more than one column? Otherwise, in the future the example DataFrame
could be updated to only have a single "name"
column and this test would still pass (this is unlikely, but possible)
dask_bigquery/core.py
Outdated
@@ -2,6 +2,7 @@ | |||
|
|||
from contextlib import contextmanager | |||
from functools import partial | |||
from typing import List |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I'm by no means a type annotation expert, but from reviewing dask/distributed#5328, I think the recommend approach for a list would be to add from __future__ import annotations
and then just use the built-in list
in the type annotation (i.e. list[str]
). This isn't meant to be a blocking comment as what you have here is valid. Just meant as an FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah, good call..I'm used to list[]
for 3.9 but forgot about the __future__
option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bnaul! This is in
Seems like a pretty important piece of functionality for a columnar database 🙂 I think maybe it just got dropped in the transition between #1 and #4? Hopefully there's not something I'm missing that makes it more complicated than it was before...
One small note,
dd.read_parquet
which seems most analogous allows a single string column name in which case you get back add.Series
instead. That seemed a little too magical to me so I did not include it, if we think aligning withdd.read_parquet
is important then it'd be easy to add.