BUG: Cannot create Categorical of mixed dtypes/tuples if first element is not a tuple #21416

Closed
@alysivji

Description

Code Sample, a copy-pastable example if possible

It's possible to create a Categorical of mixed dtypes, with at least one tuple, if the first element is a tuple.

import pandas as pd

s = pd.Categorical([('a', 'a'), ('a', 'b'), ('b', 'a'), 'c'])
s

Out[21]:
[(a, a), (a, b), (b, a), c]
Categories (4, object): [(a, a), (a, b), (b, a), c]

This does not work if the first element is not a tuple.

s = pd.Categorical(['c', ('a', 'b'), ('b', 'a'), 'c'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-0f4b5f338532> in <module>()
----> 1 s = pd.Categorical(['c', ('a', 'b'), ('b', 'a'), 'c'])

~/Documents/siv-dev/projects/open-source/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    328             # _sanitize_array coerces np.nan to a string under certain versions
    329             # of numpy
--> 330             values = maybe_infer_to_datetimelike(values, convert_dates=True)
    331             if not isinstance(values, np.ndarray):
    332                 values = _convert_to_list_like(values)

~/Documents/siv-dev/projects/open-source/pandas/pandas/core/dtypes/cast.py in maybe_infer_to_datetimelike(value, convert_dates)
    893     if not is_list_like(v):
    894         v = [v]
--> 895     v = np.array(v, copy=False)
    896
    897     # we only care about object dtypes

ValueError: setting an array element with a sequence

Problem description

While writing tests for the #20439 fix, I found that it's not possible to create Categorical objects with mixed dtypes, with at least one tuple, if the first item is not a tuple.
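The traceback above ends in `np.array(v, copy=False)`, which is where NumPy tries to infer a shape from the mixed list. As a rough illustration only (not the fix adopted in pandas), here is a minimal sketch of building a 1-D object array by hand so that tuples are stored as single elements. The helper name `to_object_array` is hypothetical and exists only for this example.

```python
import numpy as np


def to_object_array(values):
    """Hypothetical helper: build a 1-D object array from a mixed list.

    Each item (tuple or scalar) is stored as a single element.
    """
    arr = np.empty(len(values), dtype=object)
    # Assign element by element so NumPy never tries to unpack a tuple
    # into an extra dimension.
    for i, item in enumerate(values):
        arr[i] = item
    return arr


mixed = to_object_array(['c', ('a', 'b'), ('b', 'a'), 'c'])
print(mixed.shape, mixed.dtype)
```

Whether passing such an array to `pd.Categorical` avoids the error depends on the pandas version and has not been checked here.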

Expected Output

The constructor should accept a non-tuple first element and return a Categorical.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 415012f
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+970.g415012f
pytest: 3.1.2
pip: 10.0.1
setuptools: 36.0.1
Cython: 0.27.3
numpy: 1.14.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.2
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.0
fastparquet: None
pandas_gbq: None
pandas_datareader: None
