BigQuery: Allow specifying index data type in partial schema to load_table_from_dataframe. #9084

Merged
tswast merged 10 commits into googleapis:master from issue5572-load-dataframe-indexes
Aug 28, 2019

@tswast tswast commented Aug 23, 2019

Closes #5572.

If an index (or level of a multi-index) has a name and is present in the
schema passed to load_table_from_dataframe, then that index will be
serialized and written to the table. Otherwise, the index is omitted
from the serialized table.
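
The rule above, together with the later commit note about an index that shares its name with a column, can be sketched in plain Python. This is only an illustration under stated assumptions: `index_levels_to_serialize` is a hypothetical helper that works on lists of names, not the code in `_pandas_helpers.py`.

```python
from typing import List, Optional, Sequence


def index_levels_to_serialize(
    index_names: Sequence[Optional[str]],
    column_names: Sequence[str],
    schema_field_names: Sequence[str],
) -> List[str]:
    """Return the index level names that would be written to the table.

    Hypothetical helper: it mirrors the rule described in this pull
    request, not the library's actual implementation.
    """
    columns = set(column_names)
    requested = set(schema_field_names)
    return [
        name
        for name in index_names
        if name is not None  # unnamed levels are omitted
        and name in requested  # the level must appear in the schema
        and name not in columns  # a column with the same name wins
    ]


# A named index that is listed in the partial schema is kept.
print(
    index_levels_to_serialize(
        index_names=["wikidata_id"],
        column_names=["title", "release_year"],
        schema_field_names=["title", "wikidata_id"],
    )
)  # prints ['wikidata_id']
```

With a multi-index, each level is judged on its own, so a two-level index may contribute zero, one, or two columns depending on which level names appear in the schema.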

Remaining items:

  • Update unit tests to account for new ValueErrors for missing / extra columns.
  • Fix tests for moved sample samples/load_table_dataframe.py.
  • Add unit tests to account for new index behavior, especially some tests with MultiIndex DataFrames.
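
The thread does not spell out when the new ValueErrors fire. As a rough sketch of the "extra columns" case only, assuming the error is raised when the schema names a field that is neither a DataFrame column nor a named index level, a check could look like this. `check_schema_fields_exist` and its message text are hypothetical, not taken from the library.

```python
from typing import Optional, Sequence


def check_schema_fields_exist(
    schema_field_names: Sequence[str],
    column_names: Sequence[str],
    index_names: Sequence[Optional[str]],
) -> None:
    """Raise ValueError if the schema names a field the data cannot supply.

    Hypothetical check: the exact conditions used by the library are an
    assumption here.
    """
    # Columns and named index levels are the only possible sources of data.
    available = set(column_names)
    available.update(name for name in index_names if name is not None)

    extra = [name for name in schema_field_names if name not in available]
    if extra:
        raise ValueError(
            "schema fields not found in dataframe: {}".format(", ".join(extra))
        )


# Passes silently: "title" is a column and "wikidata_id" is a named index.
check_schema_fields_exist(["title", "wikidata_id"], ["title"], ["wikidata_id"])
```

A partial schema that lists fewer fields than the DataFrame has columns is deliberately allowed by this sketch, since the pull request is about partial schemas.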

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 23, 2019
@tswast tswast added the api: bigquery Issues related to the BigQuery API. label Aug 23, 2019
tswast commented Aug 23, 2019

Thought: if an index column is requested, but we end up wanting to return a schema of None, that's an error, because the requested index column might not be written or it might not be written with the correct data type.

…m_dataframe`.

If an index (or level of a multi-index) has a name and is present in the
schema passed to `load_table_from_dataframe`, then that index will be
serialized and written to the table. Otherwise, the index is omitted
from the serialized table.
@tswast tswast force-pushed the issue5572-load-dataframe-indexes branch from b04a3c6 to 14e6baa Compare August 26, 2019 18:23
@tswast tswast marked this pull request as ready for review August 27, 2019 17:59
@tswast tswast requested review from a team and plamut August 27, 2019 17:59
tswast commented Aug 27, 2019

Counter-thought: We'll already display a deprecation warning when we have to fall back to automatic schema detection via to_parquet. Also, if they are using pyarrow, then we know the index will get written to the parquet file.

@plamut plamut left a comment

Looks good in general, but spotted a few things that are worth checking again IMO.

Review threads:
bigquery/google/cloud/bigquery/_pandas_helpers.py (outdated, resolved)
bigquery/google/cloud/bigquery/_pandas_helpers.py (outdated, resolved)
bigquery/tests/unit/test__pandas_helpers.py (outdated, resolved)
bigquery/tests/unit/test__pandas_helpers.py (outdated, resolved)
bigquery/tests/unit/test__pandas_helpers.py (resolved)
plamut commented Aug 28, 2019

@tswast Please just blacken test__pandas_helpers.py, the lint check complains.

@plamut plamut left a comment

LGTM now, thanks for the quick updates!

@tswast tswast merged commit a6ed945 into googleapis:master Aug 28, 2019
@tswast tswast deleted the issue5572-load-dataframe-indexes branch August 28, 2019 19:36
@tswast tswast mentioned this pull request Aug 28, 2019
HemangChothani pushed a commit to HemangChothani/google-cloud-python that referenced this pull request Aug 29, 2019
…rame`. (googleapis#9084)

* Specify the index data type in partial schema to `load_table_from_dataframe` to include it.

If an index (or level of a multi-index) has a name and is present in the
schema passed to `load_table_from_dataframe`, then that index will be
serialized and written to the table. Otherwise, the index is omitted
from the serialized table.

* Don't include the index if it has the same name as a column.

* Move `load_table_dataframe` sample from `snippets.py` to `samples/`.

Sample now demonstrates how to manually include the index with a
partial schema definition. Update docs reference to new
`load_table_dataframe` sample location.
emar-kar pushed a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019
…rame`. (googleapis#9084)
Successfully merging this pull request may close these issues.

BigQuery: Load to table from dataframe without index
3 participants