Use temporary file in load_table_from_dataframe #7545

Merged
tswast merged 3 commits into googleapis:master from issue7543-load-table-dataframe
Mar 25, 2019

Contributor

@tswast tswast commented Mar 22, 2019

This fixes a bug where `load_table_from_dataframe` could not be used
with the `fastparquet` library. It should also use less memory when
uploading large dataframes.

Fixes #7543
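
For readers who want a picture of the temporary-file approach named in the title, here is a minimal sketch. It is not the code from this PR: `load_via_temp_file` and its two callable parameters (`write_parquet`, `upload`) are hypothetical names chosen for illustration, and only the standard library is used so the pattern can be run on its own.

```python
import os
import tempfile


def load_via_temp_file(write_parquet, upload, suffix=".parquet"):
    """Serialize to a named temporary file, then upload from that file.

    write_parquet(path) must write the data to the given filesystem path.
    upload(file_obj) receives the file opened for binary reading.
    Both callables are hypothetical stand-ins for this sketch.
    """
    # mkstemp returns an open descriptor and a path. Close the descriptor
    # right away so the writer can open the path itself.
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        write_parquet(path)
        with open(path, "rb") as file_obj:
            return upload(file_obj)
    finally:
        # Remove the temporary file whether or not the upload succeeded.
        os.remove(path)
```

Assuming pandas and google-cloud-bigquery are installed, `write_parquet` could be `lambda path: dataframe.to_parquet(path, engine="fastparquet")` and `upload` could be `lambda f: client.load_table_from_file(f, destination, job_config=job_config)`. Passing a real path rather than an in-memory buffer is what lets a writer that expects a filename work, and the serialized bytes live on disk instead of in memory.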

@tswast tswast requested review from tseaver and alixhami March 22, 2019 16:36
@tswast tswast requested a review from crwilcox as a code owner March 22, 2019 16:36
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 22, 2019
@alixhami
Contributor

It might be a good idea to have tests that explicitly set the parquet engine to pyarrow and fastparquet individually, since it sounds like users regularly use both.

@tswast
Contributor Author

tswast commented Mar 22, 2019

> It might be a good idea to have tests that explicitly set the parquet engine to pyarrow and fastparquet individually, since it sounds like users regularly use both.

@alixhami Great idea. Done in 4bca63f.
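
As an illustration of the suggestion above (not the contents of 4bca63f), one way to exercise each engine explicitly is to run the same check once per engine name. The helper names `installed_parquet_engines` and `run_for_each_engine` are hypothetical, and the sketch sticks to the standard library so it runs without either engine installed.

```python
import importlib.util

# The two engines discussed in this thread.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def installed_parquet_engines(candidates=PARQUET_ENGINES):
    """Return the candidate engines that are importable in this environment."""
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is not None
    ]


def run_for_each_engine(check, engines=None):
    """Call check(engine) once per engine and collect the results by name."""
    if engines is None:
        engines = installed_parquet_engines()
    return {engine: check(engine) for engine in engines}
```

In a pytest suite the same idea is usually written as `@pytest.mark.parametrize("engine", PARQUET_ENGINES)`, with the test body passing `engine=engine` to `dataframe.to_parquet`.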

@alixhami
Contributor

It looks like there's an error when building snappy.

@tswast
Contributor Author

tswast commented Mar 22, 2019

I think I need to add an OS-level package (`libsnappy-dev`, according to Stack Overflow) to the trampoline image used for the google-cloud-python BigQuery tests. I see from the Kokoro configs that the image is `gcr.io/cloud-devrel-kokoro-resources/python-multi`. I've reached out to the Yoshi folks to see how to add to that image or use it as a base for a BigQuery-specific one.

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2019
@tswast
Contributor Author

tswast commented Mar 23, 2019

Trying a rebuild now that we have snappy in the Kokoro Dockerfile (googleapis/testing-infra-docker#7).

@tswast
Contributor Author

tswast commented Mar 23, 2019

api_core test failure (pytype session) appears unrelated to this change.

@tswast tswast requested a review from shollyman March 25, 2019 15:53
@tswast
Contributor Author

tswast commented Mar 25, 2019

The BigQuery tests do pass now.

@tswast tswast merged commit 675dbd8 into googleapis:master Mar 25, 2019
@tswast tswast deleted the issue7543-load-table-dataframe branch March 25, 2019 16:07
Development

Successfully merging this pull request may close these issues.

BigQuery: load_table_from_dataframe should use a temporary file
4 participants