
load_table_from_dataframe breaks with Arrow list fields when the list is backed by a ChunkedArray. #1808

Closed
@cvm-a

Description



Environment details

  • OS type and version: macOS (Darwin Kernel Version 22.1.0)
  • Python version: 3.11
  • pip version: 23.2.1
  • google-cloud-bigquery version: 3.15.0

Steps to reproduce

  1. Create an Arrow-backed dataframe with a large list column.
  2. Create a google.cloud.bigquery Client.
  3. Call Client.load_table_from_dataframe on this dataframe.

Code example

# Example: a large Arrow-backed list column reproduces the error
import pandas as pd
import pyarrow as pa
from google.cloud import bigquery as gbq

client = gbq.Client(
    project=<project_id>,
    credentials=<credentials>,
    location=<location>,
)

# 10 million rows, each holding a 5-element float list, stored with ArrowDtype
df = pd.DataFrame(
    {"x": pa.array(pd.Series([[2.2] * 5] * 10_000_000)).to_pandas(types_mapper=pd.ArrowDtype)}
)

client.load_table_from_dataframe(df, "temporary_tables.chunked_array_error")
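
For reference, the failure can be reduced to plain pandas/pyarrow behaviour with no BigQuery client involved. The sketch below is illustrative only (the column size is arbitrary and this is not the library's internal code): when a column uses pandas' ArrowDtype, pyarrow.array() can hand back a ChunkedArray, which still passes the is_list check on its type but has no .values attribute.

# Minimal sketch of the underlying pyarrow behaviour (illustrative only,
# not the library's code; the sizes here are arbitrary).
import pandas as pd
import pyarrow as pa

# Same construction as the repro above, just smaller.
series = pa.array(pd.Series([[2.2] * 5] * 1_000)).to_pandas(types_mapper=pd.ArrowDtype)

arrow_data = pa.array(series)  # for an ArrowDtype-backed column this can come back chunked

print(type(arrow_data))                   # pyarrow.lib.ChunkedArray in the failing case
print(pa.types.is_list(arrow_data.type))  # True: the list check itself still passes
print(hasattr(arrow_data, "values"))      # False for a ChunkedArray, hence the AttributeError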

Stack trace

  File "/Users/<redacted>", line 250, in create_table_from_dataframe
    load_job = self.client.load_table_from_dataframe(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<redacted>/lib/python3.11/site-packages/google/cloud/bigquery/client.py", line 2671, in load_table_from_dataframe
    new_job_config.schema = _pandas_helpers.dataframe_to_bq_schema(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<redacted>/lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 465, in dataframe_to_bq_schema
    bq_schema_out = augment_schema(dataframe, bq_schema_out)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/<redacted>lib/python3.11/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 500, in augment_schema
    arrow_table.values.type.id
    ^^^^^^^^^^^^^^^^^^
AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'values'
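
One possible direction for a fix, sketched below under the assumption that the schema-detection step only needs the list's element type: both pyarrow.Array and pyarrow.ChunkedArray expose .type, and for list types the element type is available as .type.value_type, so the .values accessor never needs to be touched. The helper name and return values below are hypothetical and do not mirror augment_schema's real signature.

# Hedged sketch of handling both Array and ChunkedArray inputs; the function
# name and return shape are hypothetical, not the library's actual API.
import pyarrow as pa

def detect_mode_and_type_id(arrow_data):
    """Return (mode, pyarrow type id) for a column that may be chunked."""
    if pa.types.is_list(arrow_data.type):
        # ListType.value_type works whether arrow_data is an Array or a
        # ChunkedArray, so .values is not needed at all.
        return "REPEATED", arrow_data.type.value_type.id
    return "NULLABLE", arrow_data.type.id

# Works for a plain list array...
plain = pa.array([[2.2] * 5] * 3)
# ...and for the chunked equivalent that triggers the reported AttributeError.
chunked = pa.chunked_array([plain, plain])
assert detect_mode_and_type_id(plain) == detect_mode_and_type_id(chunked)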


    Labels

    api: bigquery (Issues related to the googleapis/python-bigquery API)
    priority: p2 (Moderately-important priority. Fix may not be included in next release.)
