Issue Summary
I would like to follow the progress of query result downloads.
Currently I am doing:
query_result: QueryJob = client.query(query)
df = query_result.result().to_dataframe(
    progress_bar_type="tqdm"
)
But this only renders a progress bar on the console (tqdm writes to stderr by default).
I would like some mechanism to follow progress when console output is not available.
Possible Solution 1 - Logs
A minimal solution could be to add a log statement into the code.
Maybe this would work:
Line 1819 of google/cloud/bigquery/table.py:
try:
    progress_bar = get_progress_bar(
        progress_bar_type, "Downloading", self.total_rows, "rows"
    )
    record_batches = []
    for record_batch in self.to_arrow_iterable(
        bqstorage_client=bqstorage_client
    ):
        record_batches.append(record_batch)
        # NEW LINE (keyword arguments assume a structlog-style logger;
        # progress_bar is None when no progress bar type is requested, so
        # the total is read from self.total_rows, not progress_bar.total)
        logger.debug(
            "Downloaded data",
            completed=record_batch.num_rows,
            total_items=self.total_rows,
        )
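To illustrate how such a log statement could be consumed without a console, here is a minimal sketch using only the standard `logging` module. The logger name `google.cloud.bigquery.table` and the message format are assumptions (the library does not emit this message today), and the download loop is simulated with fixed batch sizes. Unlike the per-batch count in the proposed line, the sketch logs a cumulative count, which is usually easier to read as progress.

```python
import logging


class ProgressHandler(logging.Handler):
    """Collects formatted progress messages in memory.

    A real handler might write to a file, a queue, or a metrics system.
    """

    def __init__(self) -> None:
        super().__init__(level=logging.DEBUG)
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Assumption: the library would log through a logger named after its module.
logger = logging.getLogger("google.cloud.bigquery.table")
logger.setLevel(logging.DEBUG)
handler = ProgressHandler()
logger.addHandler(handler)

# Simulated download loop standing in for the library code.
total_rows = 250
downloaded = 0
for batch_rows in (100, 100, 50):
    downloaded += batch_rows
    # stdlib logging takes %-style arguments, not arbitrary keyword arguments.
    logger.debug("Downloaded %d of %d rows", downloaded, total_rows)

print(handler.messages[-1])
```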
Possible Solution 2 - Callback
A better solution would be to add a callback parameter to the to_dataframe function, like this:
def log_progress(completed_items: int, total_items: int) -> None:
    # This lets me do whatever I want here :)
    # (keyword arguments assume a structlog-style logger)
    logger.debug("Downloaded data", completed=completed_items, total_items=total_items)

query_result: QueryJob = client.query(query)
df = query_result.result().to_dataframe(
    progress_callback=log_progress
)