BUG: pivot_table with margins=True fails for categorical dtype, #10989 #10993
# here we'll convert all categorical indices to object
def convert_categorical(ind):
    _convert = lambda ind: (ind.astype('object')
                            if ind.dtype.name == 'category' else ind)
This is not quite right; this should end up being a categorical level.
Can you add the test case and I'll take a look? Thanks.
The test case is in the linked issue. Or would you like me to add a unit test to the package as part of this PR?
yes pls
Done
This is too specific.
It needs to be more general and/or pushed down into the index itself.
I should add one thing: I decided that rather than adding a new item to the category, it would be cleaner to simply change categories to objects rather than trying to track down all the corner cases of hierarchical indices with categorical levels.
A much cleaner solution, IMO, would be to add a utility function that does something along the lines of "add a new entry to this index, even if it requires a new category". I imagine there are other places in the package where this sort of thing might happen, and such a utility could be used there as well.
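For illustration only, here is a rough sketch of what such a utility might look like. The name `append_label` is hypothetical and is not part of pandas or of this PR; the sketch assumes a flat (non-hierarchical) index and relies only on `CategoricalIndex.add_categories` and `Index.insert`.

```python
import pandas as pd


def append_label(index, label):
    """Hypothetical helper: append `label` to `index`, widening categories if needed."""
    if isinstance(index, pd.CategoricalIndex) and label not in index.categories:
        # Register the new category first so the insert below stays categorical.
        index = index.add_categories([label])
    # For non-categorical indexes this is an ordinary append.
    return index.insert(len(index), label)


cat_index = pd.CategoricalIndex(["a", "b", "c"], name="z")
with_margin = append_label(cat_index, "All")      # still a CategoricalIndex
plain_margin = append_label(pd.Index(["a", "b"]), "All")
```

A MultiIndex with categorical levels would need more care than this, which is the corner case mentioned above.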
I think we should return a regular Index with object dtypes. What happens if the user has a
@TomAugspurger – I checked – it turns out this is another bug in the current codebase, even if you're not using categories! If one of the index entries is called
In [19]: data = pd.DataFrame({'x': np.arange(99),
                             'y': np.arange(99) // 50,
                             'z': np.arange(99) % 3})
In [20]: data.z = np.array(['Any', 'All', 'None'])[data.z]
In [21]: data.pivot_table('x', 'y', 'z')
Out[21]:
z All Any None
y
0 25.0 24.0 24.5
1 74.5 73.5 74.0
In [22]: data.pivot_table('x', 'y', 'z', margins=True)
Out[22]:
z All Any None
y
0 24.5 24.0 24.5
1 74.0 73.5 74.0
All 49.0 48.0 50.0
@jakevdp can you update according to comments
I think I've already addressed all comments above – any others you have in mind? Any reason you closed this without merging?
I thought I had put the comments in before: this needs a more general solution, as it has too much if/then on type determination.
I guess I'm not entirely clear about what you're wanting as a "more general" solution. Any specific ideas?
So there is something more going on here; this bug report is a symptom of a different issue. Namely,
So this is the same as the index in [10]
So I think that
I suppose we should close this PR then, and leave the issue open. Hacking into the internals of MultiIndex is well beyond my level of comfort with the library.
haha. well, I'll take your tests in any event. So going to leave open for a bit.
Sounds good – thanks!
BUG: pivot table bug with Categorical indexes, #10993
This is a fix for the issue reported in #10989. I suspect this is an example of "fixing the symptom" rather than "fixing the problem", but I think it makes clear what the source of the problem is: to compute margins, the pivot table must add a row and/or column to the result. If the index or column is categorical, a new value cannot be added.
Let me know if you think there are better approaches to this.
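The underlying limitation described above can be reproduced outside of `pivot_table`. This is a minimal sketch on a toy `Categorical`, not the code path the PR touches: a label that is not among the existing categories is rejected, while the same data converted to object dtype accepts it. The exact exception type is assumed to vary between pandas versions, so both likely types are caught.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# Setting a value that is not an existing category is rejected.
try:
    cat[0] = "All"
    assignment_failed = False
except (TypeError, ValueError):
    assignment_failed = True

# Converting to object dtype first (the approach taken in this PR)
# lets the margin label be appended like any other value.
as_object = pd.CategoricalIndex(cat).astype(object)
with_margin = as_object.insert(len(as_object), "All")
```

The trade-off is that the result no longer carries the categorical dtype, which is what the review comments above object to.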