Unable to load table from dataframe with overlapping index/column name #1543

Open
@bnaul

After the change in #1535, loading a dataframe whose index name is also a column name now fails:

[ins] In [42]: df
Out[42]:
   a
a
A  A
B  B

[ins] In [43]: bigquery.Client().load_table_from_dataframe(df, "tmp.blah")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [43], in <cell line: 1>()
----> 1 bigquery.Client().load_table_from_dataframe(df, "tmp.blah")
...
File ~/model/.venv/lib/python3.10/site-packages/google/cloud/bigquery/_pandas_helpers.py:484, in dataframe_to_bq_schema(dataframe, bq_schema)
    482 bq_type = _PANDAS_DTYPE_TO_BQ.get(dtype.name)
    483 if bq_type is None:
--> 484     sample_data = _first_valid(dataframe.reset_index()[column])
    485     if (
    486         isinstance(sample_data, _BaseGeometry)
    487         and sample_data is not None  # Paranoia
    488     ):
    489         bq_type = "GEOGRAPHY"
...
File ~/model/.venv/lib/python3.10/site-packages/pandas/core/frame.py:4440, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4434     raise ValueError(
   4435         "Cannot specify 'allow_duplicates=True' when "
   4436         "'self.flags.allows_duplicate_labels' is False."
   4437     )
   4438 if not allow_duplicates and column in self.columns:
   4439     # Should this be a different kind of error??
-> 4440     raise ValueError(f"cannot insert {column}, already exists")
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")

ValueError: cannot insert a, already exists

Kind of a weird edge case, but I think the goal of that PR could have been accomplished without a breaking change. Perhaps the easiest fix would be to call reset_index() in a separate statement and catch the ValueError (if you hit it, the reset_index() call wasn't needed, because the name is already a column)?
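For illustration, here is a minimal sketch of that idea, not the library's actual code. `first_valid_sample` is a hypothetical helper standing in for the `_first_valid(dataframe.reset_index()[column])` expression in the traceback, and the fallback behavior (the column wins when an index level shares its name) is an assumption.

```python
import pandas as pd


def first_valid_sample(dataframe, column):
    """Hypothetical helper: return the first non-null value found under
    `column`, whether `column` is a regular column or an index level."""
    try:
        # Promote index levels to columns so they can be looked up by name.
        frame = dataframe.reset_index()
    except ValueError:
        # reset_index() raises "cannot insert <name>, already exists" when an
        # index level name collides with a column name. In that case the name
        # is already a column, so the original frame can be used directly.
        frame = dataframe
    valid = frame[column].dropna()
    return valid.iloc[0] if len(valid) else None


# Reproduction of the dataframe from the report: index name == column name.
df = pd.DataFrame({"a": ["A", "B"]}, index=pd.Index(["A", "B"], name="a"))

try:
    df.reset_index()
except ValueError as exc:
    print(f"reset_index() failed: {exc}")

# The fallback path still finds a sample value for column "a".
print(first_valid_sample(df, "a"))
```

One consequence of this approach is that, when the names collide, only the column's values are sampled and the index level with the same name is ignored. Whether that is acceptable for schema detection is a design decision for the maintainers.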

cc @tswast @chelsea-lin

Labels

api: bigquery (Issues related to the googleapis/python-bigquery API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
