Current implementation: `GBQTableDataSet` (in `extras/datasets/pandas/gbq_dataset.py`) loads data from Google BigQuery. It uses pandas-gbq to read the whole BigQuery table.
This works well when the dataset is small to medium in size. However, with "Big Data" we often need only specific columns, specific rows, or a specific partition of a table, so custom queries with filters are required. The current `select *` implementation violates BigQuery's best practices for fetching data in several ways.
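To make the point about columns and partitions concrete, here is a minimal sketch of the kind of query a developer might want to pass instead of `select *`. The helper `build_select_query` and the project, dataset, table and column names are hypothetical illustrations, not part of Kedro or pandas-gbq. It assumes BigQuery Standard SQL (backtick-quoted table names) and that the `where` clause targets the table's partitioning column, which can let BigQuery prune partitions and scan fewer bytes.

```python
from typing import Optional, Sequence


def build_select_query(
    project: str,
    dataset: str,
    table: str,
    columns: Sequence[str],
    where: Optional[str] = None,
) -> str:
    """Build a Standard SQL query that reads only the listed columns.

    Hypothetical helper for illustration; not a Kedro or pandas-gbq API.
    """
    if not columns:
        # Refuse to fall back to SELECT *, which scans every column.
        raise ValueError("list the columns you need instead of using SELECT *")
    # Backtick-quote the fully qualified table name, as Standard SQL expects.
    table_ref = f"`{project}.{dataset}.{table}`"
    query = f"SELECT {', '.join(columns)} FROM {table_ref}"
    if where:
        # Inserted verbatim: only pass trusted filter expressions here.
        # Filtering on the partitioning column can enable partition pruning.
        query += f" WHERE {where}"
    return query


print(build_select_query(
    "my-project", "sales", "orders",
    ["order_id", "amount"],
    where="order_date = '2020-07-01'",
))
# SELECT order_id, amount FROM `my-project.sales.orders` WHERE order_date = '2020-07-01'
```

Selecting explicit columns matters because BigQuery is columnar and bills by bytes read, so dropping unused columns reduces cost directly.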
Context
To comply with the BigQuery best practices and make queries both cost-efficient and time-efficient, `GBQTableDataSet` could let developers pass custom queries as arguments instead of always running `select *`, as the current implementation does.
The current implementation uses `pd.read_gbq()` to load data from Google BigQuery. This function accepts a custom query along with other arguments, so its full power can be used by passing its allowed parameters as part of `load_args` in `GBQTableDataSet`.
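One way the proposal could work is sketched below, assuming a `query` key in `load_args` replaces the default `select *` and every other key is forwarded to the reader unchanged. The class `QueryableGBQTableDataSet`, its `reader` parameter and the `query` key convention are hypothetical illustrations, not Kedro's actual API; credentials handling is omitted for brevity. The default reader is `pd.read_gbq` (newer pandas versions deprecate it in favour of `pandas_gbq.read_gbq`).

```python
from typing import Any, Callable, Dict, Optional

# A reader takes the SQL string first, then keyword arguments (like pd.read_gbq).
Reader = Callable[..., Any]


class QueryableGBQTableDataSet:
    """Hypothetical sketch of a GBQTableDataSet variant honouring a custom query."""

    def __init__(
        self,
        dataset: str,
        table_name: str,
        project: Optional[str] = None,
        load_args: Optional[Dict[str, Any]] = None,
        reader: Optional[Reader] = None,
    ) -> None:
        self._dataset = dataset
        self._table_name = table_name
        self._project = project
        self._load_args = dict(load_args or {})
        self._reader = reader

    def _get_reader(self) -> Reader:
        if self._reader is not None:
            return self._reader
        import pandas as pd  # imported lazily so the sketch runs without pandas

        return pd.read_gbq

    def load(self) -> Any:
        # Copy so that popping "query" does not mutate the stored config.
        load_args = dict(self._load_args)
        # pd.read_gbq's first parameter is the query itself, so take it out of
        # load_args rather than passing it twice; fall back to SELECT *.
        query = load_args.pop(
            "query", f"SELECT * FROM `{self._dataset}.{self._table_name}`"
        )
        # Respect an explicit project_id in load_args, else use the dataset's.
        load_args.setdefault("project_id", self._project)
        return self._get_reader()(query, **load_args)
```

The injectable `reader` keeps the logic testable without BigQuery credentials. In real use it would be left as `None`, and a call such as `QueryableGBQTableDataSet("sales", "orders", project="my-project", load_args={"query": "SELECT order_id FROM `my-project.sales.orders`", "dialect": "standard"}).load()` would forward `dialect` and `project_id` to `pd.read_gbq` alongside the custom query.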
mzjp2 changed the title from "Add support for Custom Queries to run on Google BigQuery using GBQTableDataSet" to "[KED-1876] Add support for Custom Queries to run on Google BigQuery using GBQTableDataSet" on Jul 23, 2020.
Hey @ajb7, thanks for raising the issue; this is actually perfect. We've had similar feedback recently and have opened a ticket on our backlog to handle exactly this (we agree with all your comments). We're super thankful to have you taking this up in the PR you've opened!
mzjp2 changed the title from "[KED-1876] Add support for Custom Queries to run on Google BigQuery using GBQTableDataSet" to "[KED-1876] Add support for custom queries with GBQTableDataSet" on Jul 23, 2020.
As per the documentation:
To pass custom queries to `GBQTableDataSet`, `catalog.yml` will look like: