Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pathlib
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': 'C'})
df.to_parquet('tmp_path1.parquet') # OK
df.to_parquet(pathlib.Path('tmp_path2.parquet')) # OK
df.to_parquet('tmp_path3.parquet', partition_cols=['B']) # OK
df.to_parquet(pathlib.Path('tmp_path4.parquet'), partition_cols=['B']) # TypeError
Problem description
The to_parquet method raises a TypeError when a pathlib.Path is passed as the path argument and partition_cols is not None. If no partition columns are provided, a pathlib.Path is accepted without error.
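As a possible stopgap, converting the path to str before the call avoids the error, since the str form with partition_cols works in the sample above. The sketch below is only an illustration: to_parquet_str_path is a hypothetical helper name, not part of pandas, and it assumes the df argument exposes a to_parquet(path, **kwargs) method.

```python
import os


def to_parquet_str_path(df, path, **kwargs):
    """Hypothetical helper (not a pandas API).

    Coerces any path-like object (e.g. pathlib.Path) to a plain str
    with os.fspath, then forwards everything to df.to_parquet.
    """
    return df.to_parquet(os.fspath(path), **kwargs)
```

With this helper, the failing call would be written as to_parquet_str_path(df, pathlib.Path('tmp_path4.parquet'), partition_cols=['B']).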
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-53-cae5a944d982> in <module>
3
4 df.to_parquet('tmp_path3.parquet', partition_cols=['B']) # OK
----> 5 df.to_parquet(pathlib.Path('tmp_path4.parquet'), partition_cols=['B']) # TypeError
~/miniconda3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
197 else:
198 kwargs[new_arg_name] = new_arg_value
--> 199 return func(*args, **kwargs)
200
201 return cast(F, wrapper)
~/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py in to_parquet(self, path, engine, compression, index, partition_cols, **kwargs)
2370 index=index,
2371 partition_cols=partition_cols,
-> 2372 **kwargs,
2373 )
2374
~/miniconda3/lib/python3.7/site-packages/pandas/io/parquet.py in to_parquet(df, path, engine, compression, index, partition_cols, **kwargs)
274 index=index,
275 partition_cols=partition_cols,
--> 276 **kwargs,
277 )
278
~/miniconda3/lib/python3.7/site-packages/pandas/io/parquet.py in write(self, df, path, compression, index, partition_cols, **kwargs)
117 compression=compression,
118 partition_cols=partition_cols,
--> 119 **kwargs,
120 )
121 else:
~/miniconda3/lib/python3.7/site-packages/pyarrow/parquet.py in write_to_dataset(table, root_path, partition_cols, partition_filename_cb, filesystem, **kwargs)
1790 subtable = pa.Table.from_pandas(subgroup, schema=subschema,
1791 safe=False)
-> 1792 _mkdir_if_not_exists(fs, '/'.join([root_path, subdir]))
1793 if partition_filename_cb:
1794 outfile = partition_filename_cb(keys)
TypeError: sequence item 0: expected str instance, PosixPath found
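The last frame of the traceback points at '/'.join([root_path, subdir]) inside pyarrow's write_to_dataset, which suggests the Path object reaches that call without being converted to str. The snippet below is a minimal sketch of that failure using only the standard library; the subdir value 'B=C' is an assumed partition directory name used for illustration, and PurePosixPath stands in for the PosixPath in the traceback.

```python
import os
import pathlib

root_path = pathlib.PurePosixPath("tmp_path4.parquet")
subdir = "B=C"  # assumed partition directory name, for illustration only

error_message = None
try:
    # str.join only accepts str items, so a path object is rejected
    "/".join([root_path, subdir])
except TypeError as exc:
    error_message = str(exc)

# Converting the path-like object with os.fspath first makes the join succeed
joined = "/".join([os.fspath(root_path), subdir])
```

Here error_message carries the same "expected str instance" complaint as the traceback, while joined is an ordinary string path.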
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f2ca0a2
python : 3.7.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
Version : Darwin Kernel Version 18.7.0: Thu Jun 18 20:50:10 PDT 2020; root:xnu-4903.278.43~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.1
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.2
setuptools : 42.0.1.post20191125
Cython : None
pytest : 5.3.0
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.1
pymysql : 0.9.3
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.13.0
pandas_datareader: 0.9.0
bs4 : 4.6.3
bottleneck : None
fsspec : 0.6.0
fastparquet : 0.3.2
gcsfs : None
matplotlib : 3.3.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.3.13
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : None
numba : 0.46.0