Unable to load table from dataframe with overlapping index/column name #1543

After [this](https://github.com/googleapis/python-bigquery/pull/1535/files#diff-7b2c585218162242a1b9cc0040bebfdb8d405becd35412cc7097cd4e49ef4c74R484) change in #1535, loading a DataFrame whose index name is also a column name now fails:
```
[ins] In [42]: df
Out[42]:
   a
a
A  A
B  B

[ins] In [43]: bigquery.Client().load_table_from_dataframe(df, "tmp.blah")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [43], in <cell line: 1>()
----> 1 bigquery.Client().load_table_from_dataframe(df, "tmp.blah")
...
File ~/model/.venv/lib/python3.10/site-packages/google/cloud/bigquery/_pandas_helpers.py:484, in dataframe_to_bq_schema(dataframe, bq_schema)
    482 bq_type = _PANDAS_DTYPE_TO_BQ.get(dtype.name)
    483 if bq_type is None:
--> 484     sample_data = _first_valid(dataframe.reset_index()[column])
    485     if (
    486         isinstance(sample_data, _BaseGeometry)
    487         and sample_data is not None  # Paranoia
    488     ):
    489         bq_type = "GEOGRAPHY"
...
File ~/model/.venv/lib/python3.10/site-packages/pandas/core/frame.py:4440, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4434     raise ValueError(
   4435         "Cannot specify 'allow_duplicates=True' when "
   4436         "'self.flags.allows_duplicate_labels' is False."
   4437     )
   4438 if not allow_duplicates and column in self.columns:
   4439     # Should this be a different kind of error??
-> 4440     raise ValueError(f"cannot insert {column}, already exists")
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")

ValueError: cannot insert a, already exists
```
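
For anyone who wants to reproduce this without a BigQuery project, the failure appears to come from pandas alone. A minimal sketch, assuming the frame above was built roughly like this (the construction is my guess from the printed output):

```python
import pandas as pd

# A frame whose index is named "a" and which also has a column named "a",
# matching the printed output in the report above.
df = pd.DataFrame({"a": ["A", "B"]}, index=pd.Index(["A", "B"], name="a"))

try:
    # reset_index() tries to insert the index as a new column called "a",
    # which collides with the existing column of the same name.
    df.reset_index()
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```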

This is a bit of an edge case, but I think the goal of that PR could have been accomplished without a breaking change. Perhaps the easiest fix would be to call `reset_index()` in a separate statement and catch the `ValueError`, since hitting that error means the index name is already a column and the `reset_index()` call wasn't needed?
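
Roughly what I have in mind, as a standalone sketch rather than a patch. `sample_value` and `first_valid` are hypothetical names for illustration; `first_valid` is a simplified stand-in for the library's private `_first_valid` helper, not its actual implementation:

```python
import pandas as pd


def first_valid(series):
    # Simplified stand-in (assumption): return the first non-null value, or None.
    valid = series.dropna()
    return valid.iloc[0] if len(valid) else None


def sample_value(dataframe, column):
    # Hypothetical helper: look up a sample value for `column`, where
    # `column` may be either a regular column or a named index level.
    try:
        flat = dataframe.reset_index()
    except ValueError:
        # The index name collides with an existing column, so `column`
        # can already be looked up on the original frame.
        flat = dataframe
    return first_valid(flat[column])


# The frame from this report: index and column are both named "a".
overlapping = pd.DataFrame({"a": ["A", "B"]}, index=pd.Index(["A", "B"], name="a"))
# A frame where the name only exists on the index.
index_only = pd.DataFrame({"x": [None, 3.0]}, index=pd.Index(["p", "q"], name="k"))

print(sample_value(overlapping, "a"))
print(sample_value(index_only, "k"))
```

In the overlapping case this samples from the column rather than the index, which may or may not be the right choice for schema detection, but it avoids raising.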

cc @tswast @chelsea-lin 
