Table.from_pandas sets empty string columns to null type #2110

## GitHub Issues for Apache Arrow
The issue seems to be that `pyarrow.Table.from_pandas` sets string (object) columns to null type when the dataframe is empty.

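````
import numpy as np
import pandas as pd
import pyarrow as pa

# Empty dataframe with one object, one int32 and one datetime column
df = pd.DataFrame({'a': [], 'b': [], 'c': []}, dtype=object)
df['b'] = df['b'].astype(np.int32)
df['c'] = pd.to_datetime(df['c'])
df.dtypes

>> a            object
>> b             int32
>> c    datetime64[ns]
>> dtype: object
````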
The resulting pyarrow schema gives the object column `a` the null type. Other types (numeric and datetime) seem to work as expected.
````
table = pa.Table.from_pandas(df, preserve_index=False)
table.schema

>> a: null
>> b: int32
>> c: timestamp[ns]
>> metadata
>> --------
>> {b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
>>             b' "a", "field_name": "a", "pandas_type": "empty", "numpy_type": "'
>>             b'object", "metadata": null}, {"name": "b", "field_name": "b", "pa'
>>             b'ndas_type": "int32", "numpy_type": "int32", "metadata": null}, {'
>>             b'"name": "c", "field_name": "c", "pandas_type": "datetime", "nump'
>>             b'y_type": "datetime64[ns]", "metadata": null}], "pandas_version":'
>>            b' "0.23.0"}'}
````
You can work around this by passing an explicit schema that declares that particular field as a `pyarrow.string()` type.

````
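# Declare every column explicitly so the empty object column 'a' becomes a string
fields = [
    pa.field('a', pa.string()),
    pa.field('b', pa.int32()),
    pa.field('c', pa.timestamp('ns')),
]
s = pa.schema(fields)
table = pa.Table.from_pandas(df, schema=s, preserve_index=False)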
table.schema

>> a: string
>> b: int32
>> c: timestamp[ns]
>> metadata
>> --------
>> {b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
>>            b' "a", "field_name": "a", "pandas_type": "unicode", "numpy_type":'
>>            b' "object", "metadata": null}, {"name": "b", "field_name": "b", "'
>>            b'pandas_type": "int32", "numpy_type": "int32", "metadata": null},'
>>            b' {"name": "c", "field_name": "c", "pandas_type": "datetime", "nu'
>>            b'mpy_type": "datetime64[ns]", "metadata": null}], "pandas_version'
>>            b'": "0.23.0"}'}
````

This seems to affect only empty dataframes.


