Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to_dataframe deterministicly update progress bar. #8303

Merged
merged 5 commits into from
Jun 18, 2019

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Jun 13, 2019

Previously, a background thread was used to collect progress bar updates
from worker threads. So as not to block downloads for progress bar
updates, put_nowait was used to make progress bar updates. Missed
writes to the progress bar were ignored. This caused non-deterministic
progress bar updates and test flakiness.

Now, worker threads push dataframes to the queue, and the return values
for download_dataframe_bqstorage and
download_dataframe_tabledata_list have been updated to return an
iterable of pandas DataFrame objects instead of a single DataFrame. This
allows progress bar updates to be done independently of which underlying
API is used to download the DataFrames.

Also, the logic for working with pandas has been moved to the
_pandas_helpers module.

Closes #8175

Previously, a background thread was used to collect progress bar updates
from worker threads. So as not to block downloads for progress bar
updates, `put_nowait` was used to make progress bar updates. Missed
writes to the progress bar were ignored. This caused non-deterministic
progress bar updates and test flakiness.

Now, worker threads push dataframes to the queue, and the return values
for `download_dataframe_bqstorage` and
`download_dataframe_tabledata_list` have been updated to return an
iterable of pandas DataFrame objects instead of a single DataFrame. This
allows progress bar updates to be done independently of which underlying
API is used to download the DataFrames.

Also, the logic for working with pandas has been moved to the
`_pandas_helpers` module.
@tswast tswast requested a review from a team June 13, 2019 23:01
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jun 13, 2019
@tswast tswast requested a review from tseaver June 13, 2019 23:33
@plamut plamut added api: bigquery Issues related to the BigQuery API. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Jun 14, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 14, 2019
Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes generally seem fine to me, but I did leave a few comments that might be relevant - please check, just in case.

Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and thanks for the additional explanation of the design decisions!

Will wait if the other reviewers have something else to add.

@tswast tswast merged commit fd24b4b into googleapis:master Jun 18, 2019
@tswast tswast deleted the issue8175-bqstorage-flake branch June 18, 2019 23:17
# prevents the queue from filling up, because the main thread
# has smaller gaps in time between calls to the queue's get
# method. For a detailed explaination, see:
# https://friendliness.dev/2019/06/18/python-nowait/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. 👍

@googleapis googleapis deleted a comment Jul 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigQuery: 'test_to_dataframe_w_bqstorage_nonempty' unit test flakes
4 participants