@frankenjoe
Collaborator

@frankenjoe frankenjoe commented Jan 17, 2023

Closes #324

Solves the issue described in #324 and adds a test for it.

@frankenjoe frankenjoe changed the title from "TST: add failing test" to "WIP: Preserve dtypes when get() is called with an index" Jan 17, 2023
@codecov

codecov bot commented Jan 17, 2023

Codecov Report

Merging #346 (4459232) into main (3901a51) will not change coverage.
The diff coverage is 100.0%.

| Impacted Files | Coverage Δ |
| --- | --- |
| audformat/core/table.py | 100.0% <100.0%> (ø) |

@frankenjoe frankenjoe changed the title from "WIP: Preserve dtypes when get() is called with an index" to "Preserve dtypes when get() is called with an index" Jan 17, 2023
@frankenjoe frankenjoe requested a review from hagenw January 17, 2023 13:56
@hagenw
Member

hagenw commented Jan 17, 2023

Interestingly, this does not solve #324 completely:

import audformat


db = audformat.Database('db')

db.schemes['spk'] = audformat.Scheme('str', labels=['a', 'b'])
index = audformat.filewise_index(['f1'])
db['files'] = audformat.Table(index)
db['files']['spk'] = audformat.Column(scheme_id='spk')
db['files']['spk'].set(['a'])

db.schemes['label'] = audformat.Scheme('int')
index = audformat.segmented_index(['f1', 'f1'], [0, 1], [1, 2])
db['segments'] = audformat.Table(index)
db['segments']['label'] = audformat.Column(scheme_id='label')
db['segments']['label'].set([0, 1])

and then:

>>> df = db['files'].get()
>>> df.spk.cat.categories
Index(['a', 'b'], dtype='object')
>>> df.loc['f1', 'spk'] = 'c'
...
TypeError: Cannot setitem on a Categorical with a new category (c), set the categories first
>>> df
     spk
file    
f1     a

but

>>> df = db['files'].get(db['segments'].index)
>>> df.spk.cat.categories
Index(['a', 'b'], dtype='object')
>>> df.loc['f1', 'spk'] = 'c'
>>> df
                                     spk
file start           end                
f1   0 days 00:00:00 0 days 00:00:01   c
     0 days 00:00:01 0 days 00:00:02   c
>>> df.spk.cat.categories
...
AttributeError: Can only use .cat accessor with a 'category' dtype

But maybe the difference in behavior is now attributed to the fact that the first is a filewise index whereas the second a segmented index?
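
Whatever the cause turns out to be, one way to avoid silently losing the categorical dtype is to check a value against the known categories before assigning it. The following is a minimal sketch in plain pandas, not part of audformat: the helper name `set_checked` is hypothetical, the frame only mimics the shape of the segmented result shown above, and rows are selected with a boolean mask rather than the partial key `'f1'`.

```python
import pandas as pd


def set_checked(df, mask, column, value):
    """Assign value to df[column] where mask is True.

    Hypothetical helper (not part of audformat or pandas):
    refuses values that are not among the column's categories.
    """
    dtype = df[column].dtype
    if isinstance(dtype, pd.CategoricalDtype) and value not in dtype.categories:
        raise ValueError(
            f"{value!r} is not one of the categories {list(dtype.categories)}"
        )
    df.loc[mask, column] = value


# Frame that mimics the segmented result shown above.
index = pd.MultiIndex.from_arrays(
    [
        ['f1', 'f1'],
        pd.to_timedelta([0, 1], unit='s'),
        pd.to_timedelta([1, 2], unit='s'),
    ],
    names=['file', 'start', 'end'],
)
df = pd.DataFrame(
    {'spk': pd.Categorical(['a', 'a'], categories=['a', 'b'])},
    index=index,
)

# Select the first segment with a boolean mask instead of a partial key.
first_segment = df.index.get_level_values('start') == pd.Timedelta(0)
set_checked(df, first_segment, 'spk', 'b')
```

Calling `set_checked(df, first_segment, 'spk', 'c')` raises a `ValueError` in this sketch, regardless of the index type.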

@frankenjoe
Collaborator Author

frankenjoe commented Jan 17, 2023

> But maybe the difference in behavior is now attributed to the fact that the first is a filewise index whereas the second a segmented index?

Maybe a bug in pandas, as in both cases the dtypes seem to be right:

>>> df = db['files'].get()
>>> df.spk.cat.categories
Index(['a', 'b'], dtype='object')
>>> df = db['files'].get(db['segments'].index)
>>> df.spk.cat.categories
Index(['a', 'b'], dtype='object')

So I guess there is not much we can do about it.
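
The dtype comparison above could also be expressed as an explicit check, for example in a test. This is only a sketch in plain pandas: the helper name `has_categories` is hypothetical, and the series is built by hand instead of coming from `Table.get()`.

```python
import pandas as pd


def has_categories(series, expected):
    """Return True if series is categorical with exactly the expected categories.

    Hypothetical helper, not part of audformat or pandas.
    """
    dtype = series.dtype
    return (
        isinstance(dtype, pd.CategoricalDtype)
        and list(dtype.categories) == list(expected)
    )


# Hand-built stand-in for the 'spk' column of the filewise table.
filewise = pd.Series(
    pd.Categorical(['a'], categories=['a', 'b']),
    index=pd.Index(['f1'], name='file'),
    name='spk',
)

# The same values after the categorical dtype has been lost.
as_object = filewise.astype(object)
```

Here `has_categories(filewise, ['a', 'b'])` is true, while the object-typed copy fails the check.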

@hagenw
Member

hagenw commented Jan 17, 2023

I would then propose that we remove "Closes #324" here and update the issue after this is merged.
Afterwards we can check whether this has already been reported in pandas.

@frankenjoe
Collaborator Author

OK, but then we should at least remove the 'bug' label from #324, as I don't see what else we can do beyond making sure the categorical dtype is kept.

@hagenw
Member

hagenw commented Jan 17, 2023

OK, let's leave the "Closes #324" here, as CategoricalDtype is now indeed preserved. I will open a new issue after this is merged.

@hagenw hagenw merged commit 2efbc43 into main Jan 17, 2023
@hagenw hagenw deleted the fix-get-index-dtype branch January 17, 2023 16:04
Development

Successfully merging this pull request may close these issues.

CategoricalDtype not preserved when using index in Table.get()

3 participants