Preserve dtypes when get() is called with an index #346

frankenjoe · 2023-01-17T13:28:30Z

Closes #324

Solves issue described in #324 and adds a test for it.
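The kind of check such a test makes can be sketched in plain pandas, without audformat. This is not the test added by this PR: the helper name `assert_dtypes_preserved` is hypothetical, and the `reindex` call is only an assumed stand-in for what `get()` does when it is handed a segmented index.

```python
import pandas as pd


def assert_dtypes_preserved(original: pd.DataFrame, result: pd.DataFrame) -> None:
    """Raise AssertionError if a column of `original` has another dtype in `result`."""
    for column, dtype in original.dtypes.items():
        assert result[column].dtype == dtype, (
            f"{column!r}: expected {dtype}, got {result[column].dtype}"
        )


# Filewise-style frame with a categorical column (labels 'a' and 'b').
files = pd.DataFrame(
    {"spk": pd.Categorical(["a"], categories=["a", "b"])},
    index=pd.Index(["f1"], name="file"),
)

# Segmented-style target index: two segments of the same file.
segments = pd.MultiIndex.from_arrays(
    [
        ["f1", "f1"],
        pd.to_timedelta([0, 1], unit="s"),
        pd.to_timedelta([1, 2], unit="s"),
    ],
    names=["file", "start", "end"],
)

# Assumed stand-in for get(index): repeat each file's row once per segment.
expanded = files.reindex(segments.get_level_values("file"))
expanded.index = segments

# Passes only if 'spk' is still categorical with the same categories.
assert_dtypes_preserved(files, expanded)
```

Comparing the dtype objects directly also covers the categories, because two `CategoricalDtype` instances are equal only if their categories and ordering match.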

codecov · 2023-01-17T13:40:27Z

Codecov Report

Merging #346 (4459232) into main (3901a51) will not change coverage.
The diff coverage is 100.0%.

Impacted Files	Coverage Δ
audformat/core/table.py	`100.0% <100.0%> (ø)`


hagenw · 2023-01-17T14:16:10Z

Interestingly, this does not solve #324 completely:

import audformat


db = audformat.Database('db')

db.schemes['spk'] = audformat.Scheme('str', labels=['a', 'b'])
index = audformat.filewise_index(['f1'])
db['files'] = audformat.Table(index)
db['files']['spk'] = audformat.Column(scheme_id='spk')
db['files']['spk'].set(['a'])

db.schemes['label'] = audformat.Scheme('int')
index = audformat.segmented_index(['f1', 'f1'], [0, 1], [1, 2])
db['segments'] = audformat.Table(index)
db['segments']['label'] = audformat.Column(scheme_id='label')
db['segments']['label'].set([0, 1])

With the table's own filewise index, assigning a label that is not in the scheme fails as expected:

>>> df = db['files'].get()
>>> df.spk.cat.categories
Index(['a', 'b'], dtype='object')
>>> df.loc['f1', 'spk'] = 'c'
...
TypeError: Cannot setitem on a Categorical with a new category (c), set the categories first
>>> df
     spk
file    
f1     a

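But when a segmented index is passed to get(), the same assignment succeeds and the column silently loses its categorical dtype: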

>>> df = db['files'].get(db['segments'].index)
>>> df.spk.cat.categories
Index(['a', 'b'], dtype='object')
>>> df.loc['f1', 'spk'] = 'c'
>>> df
                                     spk
file start           end                
f1   0 days 00:00:00 0 days 00:00:01   c
     0 days 00:00:01 0 days 00:00:02   c
>>> df.spk.cat.categories
...
AttributeError: Can only use .cat accessor with a 'category' dtype

But maybe the difference in behavior is now attributed to the fact that the first is a filewise index whereas the second a segmented index?
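Until the difference is understood, user code can guard the assignment itself so that an unknown label fails loudly whatever the index looks like. The following is a minimal sketch in plain pandas; `set_label` is a hypothetical helper, not part of audformat or pandas, and it only handles scalar labels.

```python
import pandas as pd


def set_label(df: pd.DataFrame, key, column: str, value) -> None:
    """Assign `value` at df.loc[key, column], rejecting unknown categories."""
    dtype = df[column].dtype
    if isinstance(dtype, pd.CategoricalDtype) and value not in dtype.categories:
        raise ValueError(
            f"{value!r} is not one of the categories {list(dtype.categories)}"
        )
    df.loc[key, column] = value


# Segmented-style frame with a categorical column (labels 'a' and 'b').
index = pd.MultiIndex.from_arrays(
    [
        ["f1", "f1"],
        pd.to_timedelta([0, 1], unit="s"),
        pd.to_timedelta([1, 2], unit="s"),
    ],
    names=["file", "start", "end"],
)
df = pd.DataFrame(
    {"spk": pd.Categorical(["a", "a"], categories=["a", "b"])},
    index=index,
)

try:
    set_label(df, "f1", "spk", "c")  # 'c' is not a known category
except ValueError as error:
    print(error)
```

Because the check runs before `df.loc` is touched, the column is left unchanged when the label is rejected.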

frankenjoe · 2023-01-17T14:22:21Z

> But maybe the difference in behavior is now attributed to the fact that the first is a filewise index whereas the second a segmented index?

This may be a bug in pandas, as the dtypes seem to be right in both cases:

df = db['files'].get()
df.spk.cat.categories

Index(['a', 'b'], dtype='object')

df = db['files'].get(db['segments'].index)
df.spk.cat.categories

Index(['a', 'b'], dtype='object')

So I guess there is not much we can do about it.

hagenw · 2023-01-17T14:29:18Z

I would then propose that we remove "Closes #324" here and update the issue after this is merged.
Afterwards we can check whether this has already been reported in pandas.

frankenjoe · 2023-01-17T14:31:27Z

Ok, but then we should at least remove the 'bug' label from #324 as I don't see we can do anything else but make sure the categorical dtype is kept.

hagenw · 2023-01-17T14:54:18Z

OK, let's leave the "Closes #324" here, as CategoricalDtype is now indeed preserved. I will open a new issue after this is merged.

TST: add failing test

d62698a

frankenjoe changed the title ~~TST: add failing test~~ WIP: Preserve dtypes when get() is called with an index Jan 17, 2023

frankenjoe added 2 commits January 17, 2023 14:37

apply fix

5b4578a

rename test to test_get_preserve_dtypes()

7d680ba

TST: assert get() also preserves dtypes

4459232

frankenjoe changed the title ~~WIP: Preserve dtypes when get() is called with an index~~ Preserve dtypes when get() is called with an index Jan 17, 2023

frankenjoe requested a review from hagenw January 17, 2023 13:56

hagenw reviewed Jan 17, 2023

audformat/core/table.py (resolved)

hagenw merged commit 2efbc43 into main Jan 17, 2023

hagenw deleted the fix-get-index-dtype branch January 17, 2023 16:04
