Possible performance issue when reading large datasets from BigQuery #66

Closed · parthea opened this issue Jul 3, 2017 · 1 comment
parthea (Contributor) commented Jul 3, 2017

A performance issue was reported on StackOverflow when downloading 1,000,000 rows from BigQuery. See https://stackoverflow.com/questions/44868111/failed-to-import-large-data-as-dataframe-from-google-bigquery-to-google-cloud-d

Secondly I use:

data = pd.read_gbq(query='SELECT {ABOUT_30_COLUMNS...} FROM TABLE_NAME LIMIT 1000000', dialect='standard', project_id='PROJECT_ID')

It runs well at first, but when it gets to about 450,000 rows (calculated from the percentage and the total row count), it gets stuck at:

Got page: 32; 45.0% done. Elapsed 293.1 s.

We have an integration test, test_download_dataset_larger_than_200k_rows (https://github.com/pydata/pandas-gbq/blob/master/pandas_gbq/tests/test_gbq.py#L709). It may be helpful to also include a performance test (and increase the dataset size); a sketch follows.
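A minimal sketch of what such a performance test could look like, using only the public pd.read_gbq API. The sample table, project id, row count, and 600-second budget are placeholders/assumptions, not values taken from the existing test suite:

```python
import time

import pandas as pd


def test_download_dataset_1m_rows_performance():
    # Placeholder query against a public sample table (legacy SQL syntax assumed).
    query = "SELECT id FROM [publicdata:samples.wikipedia] LIMIT 1000000"

    start = time.time()
    df = pd.read_gbq(query, project_id="PROJECT_ID", dialect="legacy")
    elapsed = time.time() - start

    assert len(df) == 1000000
    # Fail if the download takes unreasonably long; the budget here is arbitrary.
    assert elapsed < 600, "Downloading 1,000,000 rows took {:.1f}s".format(elapsed)
```

Tracking the elapsed time (rather than only whether the download completes) would make regressions like the one reported above visible in CI.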

max-sixty (Contributor) commented:

Closing in preference of #133 given the detail there.
