fix: to_gbq uses default_type for ambiguous array types and struct field types #838

Merged
merged 20 commits into from
Dec 19, 2024


@tswast tswast commented Dec 12, 2024

TODO:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • [n/a] Appropriate docs were updated (if necessary)
  • [n/a] Include fix to not detect schema if one is provided (or merge with the provided one if need be). Not applicable: merge logic already exists so that users can provide a subset of the schema, and I don't think it makes sense to modify that logic in this PR.
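
For context, a minimal sketch of the kind of merge described in the last item, where a user-provided subset of the schema takes precedence over detected fields. The helper name `merge_schema_fields` and the dictionary layout are assumptions for illustration; this is not the pandas-gbq implementation.

```python
from copy import deepcopy


def merge_schema_fields(detected, provided):
    """Return the detected fields, letting same-named provided fields win.

    Hypothetical helper for illustration only. Provided fields that do not
    match any detected field name are ignored in this sketch.
    """
    provided_by_name = {field["name"]: field for field in provided}
    merged = []
    for field in detected:
        override = provided_by_name.get(field["name"])
        if override is None:
            # No user-provided entry: keep the detected field unchanged.
            merged.append(deepcopy(field))
            continue
        # Keys from the provided field replace the detected ones.
        combined = {**field, **override}
        # Recurse so a partial RECORD override keeps detected sub-fields.
        if "fields" in field and "fields" in override:
            combined["fields"] = merge_schema_fields(
                field["fields"], override["fields"]
            )
        merged.append(deepcopy(combined))
    return merged


detected = [
    {"name": "Id", "type": "INTEGER"},
    {"name": "Name", "type": "STRING"},
]
provided = [{"name": "Name", "type": "BYTES"}]
merged = merge_schema_fields(detected, provided)
```

Here only `Name` is supplied by the user, so `Id` keeps its detected type while `Name` takes the provided one.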

Fixes #836 🦕

@tswast tswast requested review from a team as code owners December 12, 2024 21:59
@tswast tswast requested a review from chalmerlowe December 12, 2024 21:59
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Dec 12, 2024
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Dec 12, 2024

tswast commented Dec 13, 2024

Tested manually with the customer's code sample and confirmed that this fixes the issue:

import pandas_gbq
import pandas as pd
import numpy as np

DESTINATION_TABLE_ID = 'INSERT_YOUR_TABLE_HERE'

schema = [
    {'name': 'Id', 'type': 'INTEGER', 'mode': 'NULLABLE'},
    {
        'name': 'Positions',
        'type': 'RECORD',
        'mode': 'REPEATED',
        'fields': [
            {
                'name': 'PositionState',
                'type': 'BYTES',  # Pick something other than STRING to make sure merge of schemas works
                'mode': 'NULLABLE',
            },
        ],
    },
]

works_df = pd.DataFrame([{
    'Id': 123,
    'Positions': None,
}])

error_df = pd.DataFrame([{
    'Id': 123,
    'Positions': np.array([{
        'PositionState': None,
    }]),
}])

# Works with a warning
# pandas_gbq.to_gbq(works_df, destination_table=DESTINATION_TABLE_ID, table_schema=schema, if_exists='replace')

# Threw an error before this fix
pandas_gbq.to_gbq(error_df, destination_table=DESTINATION_TABLE_ID, table_schema=schema, if_exists='replace')
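
To illustrate why `error_df` is ambiguous, here is a rough sketch of type detection that falls back to a default when every sampled value is `None`. The function names and the `STRING` default are assumptions for illustration, not the code in this PR.

```python
def infer_field_type(values, default_type="STRING"):
    """Guess a BigQuery type name from sample values.

    Hypothetical helper: returns default_type when no non-null value
    is available to inspect (the ambiguous case).
    """
    for value in values:
        if value is None:
            continue
        # bool must be checked before int because bool subclasses int.
        if isinstance(value, bool):
            return "BOOLEAN"
        if isinstance(value, int):
            return "INTEGER"
        if isinstance(value, float):
            return "FLOAT"
        if isinstance(value, bytes):
            return "BYTES"
        if isinstance(value, str):
            return "STRING"
    return default_type


def infer_struct_fields(structs, default_type="STRING"):
    """Build sub-field definitions for a list of dict-like struct values."""
    names = []
    for struct in structs:
        for name in struct:
            if name not in names:
                names.append(name)
    return [
        {
            "name": name,
            "type": infer_field_type(
                [struct.get(name) for struct in structs], default_type
            ),
            "mode": "NULLABLE",
        }
        for name in names
    ]


# The only PositionState value is None, so the default type is used.
fields = infer_struct_fields([{"PositionState": None}])
```

With a fallback like this, detection always yields a concrete sub-field type, which can then be overridden by the user-provided `BYTES` entry when the schemas are merged.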

@leahecole leahecole assigned chalmerlowe and unassigned leahecole Dec 16, 2024

@chalmerlowe chalmerlowe left a comment

  • Please accept and/or add docstrings.
  • Please consider the other suggestions for inclusion; they are not blockers.

Approving this PR based on addition of docstrings.

pandas_gbq/schema/pandas_to_bigquery.py (outdated, resolved)
pandas_gbq/schema/pandas_to_bigquery.py (outdated, resolved)
pandas_gbq/schema/pandas_to_bigquery.py (resolved)
pandas_gbq/schema/pandas_to_bigquery.py (resolved)
pandas_gbq/schema/pyarrow_to_bigquery.py (resolved)
tswast and others added 2 commits December 19, 2024 10:26
Co-authored-by: Chalmer Lowe <chalmerlowe@google.com>
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Dec 19, 2024
@tswast tswast enabled auto-merge (squash) December 19, 2024 17:09
@tswast tswast force-pushed the issue836-to_gbq-with-schema branch from 03f7128 to 7e23e74 Compare December 19, 2024 17:14
@tswast tswast merged commit cf1aadd into main Dec 19, 2024
22 of 25 checks passed
@tswast tswast deleted the issue836-to_gbq-with-schema branch December 19, 2024 17:22
Successfully merging this pull request may close these issues.

AttributeError: 'NoneType' object has no attribute 'to_api_repr'
3 participants