Description
The documentation for factorize doesn't say what datatype is expected for the values, but I assume any list of hashables should work (in particular, a list of tuples).
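For reference, here is a minimal pure-Python sketch of the behaviour I'd expect for arbitrary hashables. Each value gets a code in order of first appearance. `factorize_hashables` is a hypothetical helper, not a pandas API, and unlike pandas it does no special handling of missing values (no `na_sentinel`).

```python
def factorize_hashables(values):
    """Hypothetical reference: code each hashable value by order of first appearance."""
    codes = []
    uniques = []
    seen = {}  # maps value -> integer code
    for v in values:
        if v not in seen:
            seen[v] = len(uniques)
            uniques.append(v)
        codes.append(seen[v])
    return codes, uniques


codes, uniques = factorize_hashables([(1, 1), (1, 2), (0, 0), (1, 2)])
print(codes)    # [0, 1, 2, 1]
print(uniques)  # [(1, 1), (1, 2), (0, 0)]
```

It matches the codes and uniques pandas returns for the mixed list with `'nonsense'` shown below. The tuples are never looked inside; only their hashes and equality matter.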
factorize does work for a list of tuples as long as the tuples don't all have the same length, but it fails as soon as they do. (It looks like some inference about the structure of the values is happening that shouldn't be.)
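My guess at the inference is based on the `Int64HashTable` frame and the "expected 1, got 2" message in the traceback below. It assumes factorize converts the input with something like `np.asarray`, which treats equal-length tuples as rows of a 2-D integer array. This sketch shows that conversion in NumPy alone. It also shows one way to build a 1-D object array that keeps each tuple as a single element.

```python
import numpy as np

same_len = [(1, 1), (1, 2), (0, 0), (1, 2)]

# np.asarray reads equal-length tuples as rows of a 2-D integer array.
inferred = np.asarray(same_len)
print(inferred.shape, inferred.dtype.kind)  # (4, 2) i

# Filling a 1-D object array element by element keeps each tuple intact.
# Slice assignment (obj[:] = same_len) would go through the same 2-D inference.
obj = np.empty(len(same_len), dtype=object)
for i, t in enumerate(same_len):
    obj[i] = t
print(obj.shape, obj[0])  # (4,) (1, 1)
```

With mixed lengths or a stray string, there is no consistent 2-D shape. Older NumPy versions fall back to a 1-D object array in that case, which would explain why those lists work. Recent NumPy versions raise an error for ragged input instead, so the mixed lists are not used here.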
import pandas as pd
pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), 'nonsense']) # This works
# -> (array([0, 1, 2, 1, 3]), array([(1, 1), (1, 2), (0, 0), 'nonsense'], dtype=object))
pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), (1, 2, 3)]) # This also works.
pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2)]) # <-- fails
ValueError Traceback (most recent call last)
<ipython-input-22-3ca8ec02e16c> in <module>()
1 print pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), 'nonsense'])
----> 2 print pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2)])
/usr/local/lib/python2.7/dist-packages/pandas/core/algorithms.pyc in factorize(values, sort, order, na_sentinel)
132 table = hash_klass(len(vals))
133 uniques = vec_klass()
--> 134 labels = table.get_labels(vals, uniques, 0, na_sentinel)
135
136 labels = com._ensure_platform_int(labels)
/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_labels (pandas/hashtable.c:8575)()
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
Versions: pandas 0.15.2, Python 2.7.
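A possible workaround is to pass factorize an explicit 1-D object array, so it never sees a 2-D integer array. This is a sketch that I have not verified against 0.15.2. It relies only on `pd.factorize` accepting an object ndarray and returning `(labels, uniques)`.

```python
import numpy as np
import pandas as pd

tuples = [(1, 1), (1, 2), (0, 0), (1, 2)]

# Build a 1-D object array element by element so each tuple stays one value.
values = np.empty(len(tuples), dtype=object)
for i, t in enumerate(tuples):
    values[i] = t

labels, uniques = pd.factorize(values)
print(labels)  # [0 1 2 1]
print(list(uniques))
```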