[KED-1876] Allow GBQTableDataSet to optionally accept a sql query to load data #443

`RELEASE.md`: 4 changes (2 additions, 2 deletions)

```diff
@@ -15,13 +15,13 @@
 ## Major features and improvements
 
 ## Bug fixes and other changes
+* Modified `GBQTableDataSet` to load customized results using customized queries from Google Big Query tables.
 * Documentation improvements
 
 ## Breaking changes to the API
 
 ## Thanks for supporting contributions
-[Vijay Sajjanar](https://github.com/vjkr), [Deepyaman Datta](https://github.com/deepyaman), [Sebastian Bertoli](https://github.com/sebastianbertoli), [Shahil Mawjee](https://github.com/s-mawjee)
+[Ajay Bisht](https://github.com/ajb7), [Vijay Sajjanar](https://github.com/vjkr), [Deepyaman Datta](https://github.com/deepyaman), [Sebastian Bertoli](https://github.com/sebastianbertoli), [Shahil Mawjee](https://github.com/s-mawjee)
 
 # Release 0.16.3
```

`kedro/extras/datasets/pandas/gbq_dataset.py`: 2 changes (1 addition, 1 deletion)

```diff
@@ -137,8 +137,8 @@ def _describe(self) -> Dict[str, Any]:
 
     def _load(self) -> pd.DataFrame:
         sql = "select * from {}.{}".format(self._dataset, self._table_name) # nosec
+        self._load_args.setdefault("query", sql)
         return pd.read_gbq(
-            sql,
             project_id=self._project_id,
             credentials=self._credentials,
             **self._load_args
```
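The one-line change works because `pd.read_gbq` takes the SQL string as its first parameter, named `query`. Once `query` is placed in `load_args` with `setdefault`, the positional `sql` argument has to go; otherwise pandas would receive two values for `query` and raise a `TypeError`. A user-supplied `load_args["query"]` therefore takes precedence, and `select * from <dataset>.<table>` is only the fallback. The sketch below isolates that argument-building logic in a hypothetical helper, `build_read_gbq_kwargs`, which is not part of Kedro. It returns the keyword arguments rather than calling BigQuery, so it runs without credentials or network access.

```python
from typing import Any, Dict, Optional


def build_read_gbq_kwargs(
    dataset: str,
    table_name: str,
    project_id: Optional[str] = None,
    credentials: Any = None,
    load_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the patched ``_load`` logic.

    Returns the keyword arguments that would be forwarded to ``pd.read_gbq``.
    """
    # Work on a copy so the caller's dict is not modified. (The patched
    # _load calls setdefault on self._load_args directly.)
    args = dict(load_args or {})
    default_sql = "select * from {}.{}".format(dataset, table_name)  # nosec
    # Fall back to the full-table query only when no "query" was supplied.
    args.setdefault("query", default_sql)
    # "query" travels as a keyword, so no positional SQL argument is needed.
    return dict(project_id=project_id, credentials=credentials, **args)


if __name__ == "__main__":
    # Default: read the whole table.
    print(build_read_gbq_kwargs("my_dataset", "my_table", project_id="my-project"))
    # Custom query supplied through load_args.
    print(
        build_read_gbq_kwargs(
            "my_dataset", "my_table", load_args={"query": "SELECT 1"}
        )
    )
```

Copying `load_args` is a design choice made only for this sketch. The patched `_load` stores the default query on the dataset instance the first time it runs, and later loads then reuse that stored value.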
`tests/extras/datasets/pandas/test_gbq_dataset.py`: 17 changes (16 additions, 1 deletion)

```diff
@@ -150,10 +150,25 @@ def test_save_load_data(self, gbq_dataset, dummy_dataframe, mocker):
             table_id, project_id=PROJECT, credentials=None, progress_bar=False
         )
         mocked_read_gbq.assert_called_once_with(
-            sql, project_id=PROJECT, credentials=None
+            project_id=PROJECT, credentials=None, query=sql
         )
         assert_frame_equal(dummy_dataframe, loaded_data)
 
+    @pytest.mark.parametrize("load_args", [{"query": "Select 1"}], indirect=True)
+    def test_read_gbq_with_query(self, gbq_dataset, dummy_dataframe, mocker, load_args):
+        """Test loading data set with query in the argument."""
+        mocked_read_gbq = mocker.patch(
+            "kedro.extras.datasets.pandas.gbq_dataset.pd.read_gbq"
+        )
+        mocked_read_gbq.return_value = dummy_dataframe
+        loaded_data = gbq_dataset.load()
+
+        mocked_read_gbq.assert_called_once_with(
+            project_id=PROJECT, credentials=None, query=load_args["query"]
+        )
+
+        assert_frame_equal(dummy_dataframe, loaded_data)
+
     @pytest.mark.parametrize(
         "dataset,table_name",
         [
```

Review comment by @mzjp2 (Contributor, Jul 23, 2020) on the `query=load_args["query"]` line:

I'd check `query="Select 1"` explicitly here, but up to you!
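The reviewer's suggestion can be shown in a self-contained sketch that uses `unittest.mock` in place of pytest-mock's `mocker` fixture. `TinyGBQLoader` is a hypothetical stand-in for `GBQTableDataSet`, not Kedro code. It receives the reader function as a constructor argument instead of patching `pd.read_gbq`, so the example runs without Kedro, pandas or BigQuery. The assertion uses the literal `"Select 1"` instead of `load_args["query"]`, which makes the expected value explicit rather than deriving it from the same input that feeds the loader.

```python
from unittest import mock


class TinyGBQLoader:
    """Hypothetical stand-in for GBQTableDataSet; only the load path is modelled."""

    def __init__(self, dataset, table_name, project_id, reader, load_args=None):
        self._dataset = dataset
        self._table_name = table_name
        self._project_id = project_id
        self._reader = reader  # injected replacement for pd.read_gbq
        self._load_args = dict(load_args or {})

    def load(self):
        # Same logic as the patched _load: default query unless one was given.
        sql = "select * from {}.{}".format(self._dataset, self._table_name)  # nosec
        self._load_args.setdefault("query", sql)
        return self._reader(
            project_id=self._project_id, credentials=None, **self._load_args
        )


def check_custom_query_is_forwarded():
    reader = mock.Mock(return_value="dummy frame")
    loader = TinyGBQLoader(
        "dataset", "table", "my-project", reader, load_args={"query": "Select 1"}
    )
    assert loader.load() == "dummy frame"
    # Compare against the literal string, as suggested in review.
    reader.assert_called_once_with(
        project_id="my-project", credentials=None, query="Select 1"
    )


check_custom_query_is_forwarded()
```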