-
Notifications
You must be signed in to change notification settings - Fork 104
Fix: Add compat code for pd.Categorical in pandas>=0.15 #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@josef-pkt this was the only place I found in statsmodels and patsy, where |
@JanSchulz The test error for statsmodels is in grouputils of statsmodels |
Thanks!
|
rebased, fixed comma and repushed |
There are probably other places, where it is good to check for categoricals, but if this is not checked a Series of type categorical should behave like a string/int/... Series apart from sorting behaviour, min/max on unordered cats and if you try to change values to something which is not in levels. I'll add this to the
|
Okay, right -- a categorical containing ints should not be treated like That change can't possibly be correct because you'll need to touch the On Thu, Aug 14, 2014 at 3:31 PM, Jan Schulz notifications@github.com
Nathaniel J. Smith |
BTW: this is used in Categorical to get ints for levels:
I'm not sure how many users patsy has, which do not use pandas... |
(The travis build failure is just my current battles with travis and python 3.2, see mailing list, nothing to do with the patch.) |
current patch, just waiting on the update whether to box the codes or not
|
NB you have an = where you mean ==. Is data.cat a Categorical object? If so then I think it would reduce code if getattr(data, "dtype", None) =="category": On Thu, Aug 14, 2014 at 3:39 PM, Jan Schulz notifications@github.com
Nathaniel J. Smith |
Nope, data.cat is the accessor (only shows the public API), but |
I added the code, but I kept the current way to check, as it basically ended up with the same number of checks. |
@@ -173,6 +173,10 @@ def sniff(self, data): | |||
# second-guess it. | |||
self._levels = tuple(data.levels) | |||
return True | |||
if hasattr(data, "dtype") and data.dtype == "category": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JanSchulz you might want to define something like: is_pandas_cat_support = LooseVersion(pd.__version__) >= 0.15.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patsy (and statsmodels) do not unconditionally import pandas, so this would need 2 additionally checks (has_pandas, the version check, and then still if it is a data with dtype and if that dtype is categorical...
redone... |
return data.codes | ||
else: | ||
return data.labels | ||
if hasattr(data, "dtype") and data.dtype == "category": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return getattr(data,'codes',data.labels)
a bit more concise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that will fail when data.labels is not there(so in current pandas master and after the deprecation period), as the data.labels
is evaluated before calling getattr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only alternative is
try:
return data.codes
except:
return data.labels
[Thats the variant which is not in statsmodels]
But I like explicit checks and not the try-except variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm you r right
nothing special about getattr wrt to short circuiting
I just got bitten by this after returning to an older project, having updated Pandas in the meantime. Any chance to get this merged soonish? Is it possible to help? |
So just to be clear, I'm not supposed to be getting this "datatype not understood" exception, am I? It only happens when I cast things to code.py
With this setup, I either get:
or
surgery.csv:
Versions:
|
This probably needs some more work: labels -> codes and levels -> categories... Shoudl I redo the PR or what's the status? |
Sorry I lost track of this! (Just moved from Edinburgh to Berkeley and have been dropping a lot of balls :-(.) At this point I am completely lost and confused regarding pandas's categorical changes. @JanSchulz: do you know what to do? Please send help! |
pandas renamed pd.Categorical.labels to pd.Categorical.codes and pd.Categorical.levels to pd.Categorical.categories. The 'codes and level' constructor was also removed in favour of the 'values and categories' one, so use the Categories.from_codes(...). It's also now possible to have Categoricals as blocks, so Series can contain Categoricals. Unfortunately numpy.dtypes do not compare to 'category', so use the pandas function 'is_categorical_dtype' for this. Added FIXMEs to all sections which should can be removed when patsy only supports pandas >0.15.
Updated, fixed all nosetest errors in the file, let's see if travis sees some more... |
The above error is probably still present :-( |
Changes Unknown when pulling e636ef0 on JanSchulz:cat_fixes into * on pydata:master*. |
1 similar comment
Changes Unknown when pulling e636ef0 on JanSchulz:cat_fixes into * on pydata:master*. |
Changes Unknown when pulling e636ef0 on JanSchulz:cat_fixes into * on pydata:master*. |
I opend a issue in pandas repo regarding the above problem with this
I haven't looked into the first problem (python3) |
Those interested in this PR should check out #59, which I think (hope?) replaces it. |
close, as #59 replaces this? |
Closing, as this in in #59 |
pandas renamed pd.Categorical.labels to pd.Categorical.codes. It's
also now possible to have Categoricals as blocks, so Series can contain
Categoricals.