[KED-1876] Allow GBQTableDataSet to optionally accept a sql query to load data (#443)
ajb7 authored Jul 29, 2020
1 parent 9e6dfbf commit d6291dc
Showing 3 changed files with 19 additions and 4 deletions.
4 changes: 2 additions & 2 deletions RELEASE.md
@@ -15,13 +15,13 @@
## Major features and improvements

## Bug fixes and other changes
+* Modified `GBQTableDataSet` to load customized results using customized queries from Google Big Query tables.
* Documentation improvements

## Breaking changes to the API

## Thanks for supporting contributions
-[Vijay Sajjanar](https://github.com/vjkr), [Deepyaman Datta](https://github.com/deepyaman), [Sebastian Bertoli](https://github.com/sebastianbertoli), [Shahil Mawjee](https://github.com/s-mawjee)
-
+[Ajay Bisht](https://github.com/ajb7), [Vijay Sajjanar](https://github.com/vjkr), [Deepyaman Datta](https://github.com/deepyaman), [Sebastian Bertoli](https://github.com/sebastianbertoli), [Shahil Mawjee](https://github.com/s-mawjee)

# Release 0.16.3

2 changes: 1 addition & 1 deletion kedro/extras/datasets/pandas/gbq_dataset.py
@@ -137,8 +137,8 @@ def _describe(self) -> Dict[str, Any]:

     def _load(self) -> pd.DataFrame:
         sql = "select * from {}.{}".format(self._dataset, self._table_name)  # nosec
+        self._load_args.setdefault("query", sql)
         return pd.read_gbq(
-            sql,
             project_id=self._project_id,
             credentials=self._credentials,
             **self._load_args
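
The effect of the `setdefault` change in this hunk can be sketched without BigQuery access. The snippet below is a minimal illustration, not the Kedro implementation: `fake_read_gbq` is a hypothetical stand-in for `pd.read_gbq` that only records its arguments, and `load`, `my_dataset`, `my_table` and `my-project` are made-up names for the example.

```python
from typing import Any, Dict, Optional


def fake_read_gbq(
    query: str,
    project_id: Optional[str] = None,
    credentials: Any = None,
    **kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical stand-in for pd.read_gbq: returns the arguments it received."""
    return {"query": query, "project_id": project_id, "credentials": credentials, **kwargs}


def load(
    dataset: str, table_name: str, project_id: str, load_args: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper that mirrors the shape of _load in the diff."""
    # Default query: select everything from the configured table.
    sql = "select * from {}.{}".format(dataset, table_name)  # nosec
    # setdefault only fills in "query" when the caller has not supplied one.
    load_args.setdefault("query", sql)
    # The query now travels as a keyword argument inside load_args.
    return fake_read_gbq(project_id=project_id, credentials=None, **load_args)


default_call = load("my_dataset", "my_table", "my-project", {})
custom_call = load("my_dataset", "my_table", "my-project", {"query": "Select 1"})

print(default_call["query"])
print(custom_call["query"])
```

With no `query` in the load arguments the default `select *` statement is used; with one present, the caller's query is passed through unchanged. Note that `setdefault` mutates the dictionary it is called on, so the default query is stored in the load arguments after the first call.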
17 changes: 16 additions & 1 deletion tests/extras/datasets/pandas/test_gbq_dataset.py
@@ -150,10 +150,25 @@ def test_save_load_data(self, gbq_dataset, dummy_dataframe, mocker):
             table_id, project_id=PROJECT, credentials=None, progress_bar=False
         )
         mocked_read_gbq.assert_called_once_with(
-            sql, project_id=PROJECT, credentials=None
+            project_id=PROJECT, credentials=None, query=sql
         )
         assert_frame_equal(dummy_dataframe, loaded_data)
 
+    @pytest.mark.parametrize("load_args", [{"query": "Select 1"}], indirect=True)
+    def test_read_gbq_with_query(self, gbq_dataset, dummy_dataframe, mocker, load_args):
+        """Test loading data set with query in the argument."""
+        mocked_read_gbq = mocker.patch(
+            "kedro.extras.datasets.pandas.gbq_dataset.pd.read_gbq"
+        )
+        mocked_read_gbq.return_value = dummy_dataframe
+        loaded_data = gbq_dataset.load()
+
+        mocked_read_gbq.assert_called_once_with(
+            project_id=PROJECT, credentials=None, query=load_args["query"]
+        )
+
+        assert_frame_equal(dummy_dataframe, loaded_data)
+
     @pytest.mark.parametrize(
         "dataset,table_name",
         [