Possible performance issue when reading large datasets from BigQuery #66

Closed · parthea opened this issue Jul 3, 2017 · 1 comment
parthea (Contributor) commented Jul 3, 2017

A performance issue was reported on StackOverflow when downloading 1,000,000 rows from BigQuery. See https://stackoverflow.com/questions/44868111/failed-to-import-large-data-as-dataframe-from-google-bigquery-to-google-cloud-d

Secondly I use:

data = pd.read_gbq(query='SELECT {ABOUT_30_COLUMNS...} FROM TABLE_NAME LIMIT 1000000', dialect='standard', project_id='PROJECT_ID')

It runs well at first, but when it gets to about 450,000 rows (calculated from the percentage and the total row count), it gets stuck at:

Got page: 32; 45.0% done. Elapsed 293.1 s.

We have an integration test, test_download_dataset_larger_than_200k_rows (https://github.com/pydata/pandas-gbq/blob/master/pandas_gbq/tests/test_gbq.py#L709). It may be helpful to also include a performance test (and increase the dataset size); a sketch follows.
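A minimal sketch of what such a performance test could look like, using only the public pd.read_gbq API. The sample table, project id, row count, and 600-second budget are placeholders/assumptions, not values taken from the existing test suite:

```python
import time

import pandas as pd


def test_download_dataset_1m_rows_performance():
    # Placeholder query against a public sample table (legacy SQL syntax assumed).
    query = "SELECT id FROM [publicdata:samples.wikipedia] LIMIT 1000000"

    start = time.time()
    df = pd.read_gbq(query, project_id="PROJECT_ID", dialect="legacy")
    elapsed = time.time() - start

    assert len(df) == 1000000
    # Fail if the download takes unreasonably long; the budget here is arbitrary.
    assert elapsed < 600, "Downloading 1,000,000 rows took {:.1f}s".format(elapsed)
```

Tracking the elapsed time (rather than only whether the download completes) would make regressions like the one reported above visible in CI.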

max-sixty (Contributor) commented:

Closing in preference of #133 given the detail there.
