factorize fails for list of tuples #9454

Closed
@eyurtsev

Description

It's not clear from the documentation for factorize what datatype is expected for the values. But I assume that any list of hashables should work (specifically, a list of tuples).

Factorize does work for a list of tuples as long as the tuples do not all have the same length, but it fails as soon as every tuple has the same length. (It looks like some inference about the structure of the values is happening that shouldn't be.)

import pandas as pd
pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), 'nonsense']) # This works

(array([0, 1, 2, 1, 3]), array([(1, 1), (1, 2), (0, 0), 'nonsense'], dtype=object))

pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), (1, 2, 3)]) # This also works.

pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2)]) # <-- fails
ValueError                                Traceback (most recent call last)
<ipython-input-22-3ca8ec02e16c> in <module>()
      1 print pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2), 'nonsense'])
----> 2 print pd.factorize([(1, 1), (1, 2), (0, 0), (1, 2)])

/usr/local/lib/python2.7/dist-packages/pandas/core/algorithms.pyc in factorize(values, sort, order, na_sentinel)
    132     table = hash_klass(len(vals))
    133     uniques = vec_klass()
--> 134     labels = table.get_labels(vals, uniques, 0, na_sentinel)
    135 
    136     labels = com._ensure_platform_int(labels)

/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_labels (pandas/hashtable.c:8575)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
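
A possible explanation for the "expected 1, got 2" message, offered as a guess rather than a confirmed diagnosis: when every tuple has the same length, NumPy's array conversion infers a 2-D integer array instead of a 1-D array of tuple objects. The sketch below shows that inference using NumPy alone, together with a way to pack the tuples into a 1-D object array by hand. The helper name `to_object_array` is hypothetical and is not part of pandas or NumPy.

```python
import numpy as np


def to_object_array(items):
    """Pack arbitrary Python objects into a 1-D object array, one item per slot.

    Hypothetical helper for illustration; not a pandas or NumPy function.
    """
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Assigning by integer index stores the tuple itself in the slot,
        # so NumPy never gets a chance to unpack it into a second dimension.
        out[i] = item
    return out


same_length = [(1, 1), (1, 2), (0, 0), (1, 2)]

# Equal-length tuples are interpreted as rows of a 2-D integer array.
inferred = np.asarray(same_length)
print(inferred.shape)  # (4, 2)

# Packing by hand keeps one tuple per element.
packed = to_object_array(same_length)
print(packed.shape)  # (4,)
```

Whether passing such a 1-D object array to `pd.factorize` avoids the error on 0.15.2 is an assumption I have not verified.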

pandas 0.15.2
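
For reference, here is a minimal pure-Python sketch of the behaviour I would expect for any list of hashables. It is not how pandas implements factorize; the function name `factorize_hashables` is made up for illustration, and it returns plain lists rather than arrays.

```python
def factorize_hashables(values):
    """Return (codes, uniques) for a sequence of hashable values.

    Codes are assigned in order of first appearance, so equal values
    always share the same code.
    """
    seen = {}      # value -> code
    codes = []
    uniques = []
    for value in values:
        if value not in seen:
            seen[value] = len(uniques)
            uniques.append(value)
        codes.append(seen[value])
    return codes, uniques


codes, uniques = factorize_hashables([(1, 1), (1, 2), (0, 0), (1, 2)])
print(codes)    # [0, 1, 2, 1]
print(uniques)  # [(1, 1), (1, 2), (0, 0)]
```

Each tuple is treated as a single opaque value here, which is what the failing call above should also do.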

Labels: Algos, Bug, Testing, good first issue
