CategoricalDtype not preserved when using index in Table.get() #324

@hagenw

When a table contains a column whose scheme defines labels, requesting it with get(index) loses the information about the allowed categories: the returned column no longer has a CategoricalDtype. This can also break your code, as the column can no longer be accessed via .cat, which is possible when get() is called without an index.

This is related to #317, as we also lose this information when using map.

import audformat


db = audformat.Database('db')

db.schemes['spk'] = audformat.Scheme('str', labels=['a', 'b'])
index = audformat.filewise_index(['f1'])
db['files'] = audformat.Table(index)
db['files']['spk'] = audformat.Column(scheme_id='spk')
db['files']['spk'].set(['a'])

db.schemes['label'] = audformat.Scheme('int')
index = audformat.segmented_index(['f1', 'f1'], [0, 1], [1, 2])
db['segments'] = audformat.Table(index)
db['segments']['label'] = audformat.Column(scheme_id='label')
db['segments']['label'].set([0, 1])

Now we can do the following:

>>> df = db['files'].get()
>>> df.spk.cat.categories
Index(['a', 'b'], dtype='object')
>>> df.loc['f1', 'spk'] = 'c'
...
TypeError: Cannot setitem on a Categorical with a new category (c), set the categories first

But when indexing by the index of the other table, we lose all this information:

>>> df = db['files'].get(db['segments'].index)
>>> df.spk.cat.categories
...
AttributeError: Can only use .cat accessor with a 'category' dtype
>>> df.loc['f1', 'spk'] = 'c'
>>> df
                                     spk
file start           end                
f1   0 days 00:00:00 0 days 00:00:01   c
     0 days 00:00:01 0 days 00:00:02   c
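
Until this is fixed, one possible workaround is to cast the column back to a categorical dtype by hand. The following is only a sketch using plain pandas: the data frame is a hand-built stand-in for the result of db['files'].get(db['segments'].index), and the labels are hard-coded, on the assumption that in real code they would be read from the scheme (e.g. db.schemes['spk'].labels).

```python
import pandas as pd

# Stand-in for the frame returned by get(index) in the example above:
# the 'spk' column has dtype object instead of category.
index = pd.MultiIndex.from_arrays(
    [
        ['f1', 'f1'],
        pd.to_timedelta([0, 1], unit='s'),
        pd.to_timedelta([1, 2], unit='s'),
    ],
    names=['file', 'start', 'end'],
)
df = pd.DataFrame({'spk': ['a', 'a']}, index=index, dtype='object')

# Assumption: the allowed labels come from the scheme,
# e.g. db.schemes['spk'].labels; they are hard-coded here.
labels = ['a', 'b']

# Restore the categorical dtype with the full set of allowed categories,
# including labels that do not occur in the data.
df['spk'] = df['spk'].astype(pd.CategoricalDtype(categories=labels))
```

After the cast, df.spk.cat.categories is available again and contains both 'a' and 'b', even though only 'a' occurs in the data.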

Labels: bug
