to_gbq: Allow creation of new tables from DataFrame (and generate schema) #8325

Closed
@jtratner

Small extension on top of to_gbq so that you can actually create new tables given only an existing DataFrame. Given an arbitrary DataFrame with a non-hierarchical index, create a schema from it. For now, we'd likely assume that object dtype columns are strings, and maybe allow specifying the type of some or all columns in the schema so that int columns with nulls come out correctly (otherwise they'd be coerced to float columns, because NaN can only be stored in a float column).
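To illustrate the int-with-nulls concern, here is a minimal sketch of the coercion in plain pandas. The variable names are only for illustration.

```python
import pandas as pd

# An integer column with no missing values keeps an integer dtype.
ints = pd.Series([1, 2, 3])

# The same data with one missing value is stored as float64,
# because the missing entry is represented as NaN.
with_null = pd.Series([1, 2, None])

print(ints.dtype.kind)   # 'i' (integer)
print(with_null.dtype)   # float64
```

A schema inferred purely from dtypes would therefore label the second column FLOAT, which is why a per-column override seems useful.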

E.g.:

In [6]: import pandas as pd

In [7]: import pandas.util.testing as testing

In [8]: df = testing.makeMixedDataFrame()

In [9]: df
Out[9]:
   A  B     C          D
0  0  0  foo1 2009-01-01
1  1  1  foo2 2009-01-02
2  2  0  foo3 2009-01-05
3  3  1  foo4 2009-01-06
4  4  0  foo5 2009-01-07

In [10]: df.dtypes
Out[10]:
A           float64
B           float64
C            object
D    datetime64[ns]
dtype: object

Then you could do something like:

In [11]: generate_bq_schema(df)
Out[11]:
{'fields': [{'name': 'A', 'type': 'FLOAT'},
  {'name': 'B', 'type': 'FLOAT'},
  {'name': 'C', 'type': 'STRING'},
  {'name': 'D', 'type': 'TIMESTAMP'}]}

With a named index, that could be added to the schema as well. For now, we could stick to requiring a non-hierarchical index (i.e. no MultiIndex), but maybe we could use record types for a MultiIndex in the future?
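A rough sketch of what such a helper might look like, under stated assumptions: the `overrides` and `include_index` parameters and the dtype-to-type table are hypothetical, not an existing pandas API, and any dtype not in the table falls back to STRING.

```python
import pandas as pd

# Assumed mapping from NumPy dtype kind to BigQuery column type.
# Object columns are assumed to hold strings, as proposed above.
_KIND_TO_BQ = {
    "i": "INTEGER",
    "u": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",
    "M": "TIMESTAMP",
}


def generate_bq_schema(df, overrides=None, include_index=False):
    """Build a BigQuery-style schema dict from a DataFrame's dtypes.

    overrides: optional {column name: BigQuery type} that takes precedence
        over the inferred type (e.g. for int columns that contain nulls).
    include_index: if True, add the (named, non-hierarchical) index as
        the first field.
    """
    if isinstance(df.index, pd.MultiIndex):
        raise ValueError("MultiIndex is not supported")
    overrides = overrides or {}

    columns = []
    if include_index:
        if df.index.name is None:
            raise ValueError("the index must be named to be included")
        columns.append((df.index.name, df.index.dtype))
    columns.extend(df.dtypes.items())

    fields = []
    for name, dtype in columns:
        # Fall back to STRING for any dtype kind not in the table.
        bq_type = overrides.get(name, _KIND_TO_BQ.get(dtype.kind, "STRING"))
        fields.append({"name": name, "type": bq_type})
    return {"fields": fields}


# A frame shaped like the mixed example above.
df = pd.DataFrame({
    "A": [0.0, 1.0, 2.0, 3.0, 4.0],
    "B": [0.0, 1.0, 0.0, 1.0, 0.0],
    "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
    "D": pd.bdate_range("2009-01-01", periods=5),
})
schema = generate_bq_schema(df)
schema_int_a = generate_bq_schema(df, overrides={"A": "INTEGER"})
```

Keying on `dtype.kind` rather than exact dtypes keeps the table short; a real implementation would probably want to validate the override values as well.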
