-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes np.unique on SparseArray #19651
Conversation
# The fill value as well assuming the ngaps are greater than 0 | ||
|
||
if ngaps_val > 0: | ||
k = kh_get_{dtype}(self.table, fill_value_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is duplicated 3 times in this class. Would be nice to de-dupe this somehow without making it super complicated with string interpolation...
@@ -268,6 +269,16 @@ def test_object_refcount_bug(self): | |||
for i in range(1000): | |||
len(algos.unique(lst)) | |||
|
|||
@pytest.mark.parametrize('fill_value', [0, 1, np.nan, None]) | |||
def test_sparse(self, fill_value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference issue number.
Codecov Report
@@ Coverage Diff @@
## master #19651 +/- ##
==========================================
+ Coverage 91.57% 91.59% +0.01%
==========================================
Files 150 150
Lines 48817 48811 -6
==========================================
+ Hits 44704 44707 +3
+ Misses 4113 4104 -9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you are not just passing the sp_values
+ fill_value to and calling .unique
on that?
@@ -251,25 +251,36 @@ cdef class HashTable: | |||
{{py: | |||
|
|||
# name, dtype, null_condition, float_group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the comment
Hello @hexgnu! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 13, 2018 at 04:44 Hours UTC |
@@ -362,7 +363,14 @@ def unique(values): | |||
htable, _, values, dtype, ndtype = _get_hashtable_algo(values) | |||
|
|||
table = htable(len(values)) | |||
uniques = table.unique(values) | |||
|
|||
if isinstance(values, ABCSparseArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure you saw my comment. this should not be handling unique like this. SparseArray should have a method .unique()
which can directly handle the unique (and then call algorithms.unique on the sparse values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IOW you might be able to get aways with doing a
if hasattr(values, 'unique'):
return values.unique()
...
need to avoid recursion, but here values must be a ndarray or an EA, including Categorical. (and NOT a Series)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm I'll have to think about this... I'm hesitant to do this mainly because this code seems to rely on the fact that it always outputs an ndarray, which is why groupby doesn't work with sparse data.
The problem really is with classes that implement their own unique(). It usually outputs something that isn't ndarray.
Also I don't want to call unique on the class and cast it back to ndarray cause that feels sloppy. I'll think of something :) thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hexgnu have you been following the ExtensionArray stuff? Eventually SparseArray
should be an ExtensionArray
like Categorical
. Eventually things like groupby
, factorize
, etc will all work correctly on EAs.
Also I don't want to call unique on the class and cast it back to ndarray cause that feels sloppy.
SparseArray.unique() should return a SparseArray, just like Categorical.unique returns a Categorical.
can you rebase / update |
@jreback : Based on this discussion here, doesn't this PR require a more substantial overhaul to properly anticipate the EA work from @TomAugspurger ? Or is your suggestion here an acceptable patch? |
Let's hold on this for a bit. I'm making SparseArray an ExtensionArray. |
Hey everybody here. Let me know what you want me to do. We can close this and I can reopen new work when the |
git diff upstream/master -u -- "*.py" | flake8 --diff