PERF: Cythonize Groupby Rank #19481

Merged Feb 10, 2018 (35 commits)
Changes from 1 commit
Commits
396f1b6
Initial working rank with no tiebreaker
WillAyd Jan 29, 2018
c2c2177
Allowed kwargs to pass through to Cython func
WillAyd Jan 31, 2018
529503f
Comprehensive tests for all groupby rank args
WillAyd Jan 31, 2018
c7faa3b
Working avg tiebreak with nan handling
WillAyd Feb 1, 2018
baeb192
Added remaining tiebreakers; fixed int/float dtype mixup
WillAyd Feb 1, 2018
07c8e0f
Added func for obj support
WillAyd Feb 1, 2018
2ba6643
Added pct support
WillAyd Feb 1, 2018
4e54aa5
Added support for sorting
WillAyd Feb 1, 2018
428d32c
Working tests (excl missing data)
WillAyd Feb 1, 2018
902ef3c
Added Timestamps to tests
WillAyd Feb 1, 2018
ecd4b51
Working rank with numeric and missing
WillAyd Feb 5, 2018
e17433d
Added missing obj support
WillAyd Feb 5, 2018
b0ea557
Added support for timestamps mixed with nan
WillAyd Feb 5, 2018
e15b4b2
Added tests for multiple groups
WillAyd Feb 5, 2018
04eb4f1
Fixed bug with First tiebreak across multiple groups
WillAyd Feb 5, 2018
7a4602d
Variable Name Cleanup
WillAyd Feb 5, 2018
7be3bf3
Converted kwargs to positional arguments in Cython layer
WillAyd Feb 6, 2018
ca28350
Lint fixes
WillAyd Feb 6, 2018
913ce94
Created enum for rank tiebreakers
WillAyd Feb 6, 2018
4755941
Fixed build errors; Py <3.5 support
WillAyd Feb 6, 2018
d4a6662
LINT fixes
WillAyd Feb 6, 2018
56e7974
Fixed isnan reference issue on Windows
WillAyd Feb 7, 2018
9d7c3e6
Updated whatsnew
WillAyd Feb 7, 2018
178654d
Added GroupBy object raises tests
WillAyd Feb 7, 2018
f6ae88a
Raise ValueError in group_rank_object
WillAyd Feb 7, 2018
caacef2
Used anonymous func for rank wrapper
WillAyd Feb 7, 2018
a315a92
Removed group_rank_object
WillAyd Feb 8, 2018
a6ca485
Added comments to groupby_helper
WillAyd Feb 8, 2018
fd29d70
Added tests for rank bugs
WillAyd Feb 8, 2018
b9e4719
Fixed issue with ranks not resetting across groups
WillAyd Feb 8, 2018
613384c
Changed types; fixed tiebreaker float casting issue
WillAyd Feb 8, 2018
94a2749
Documentation cleanup
WillAyd Feb 8, 2018
3ee99c0
Removed unused import from groupby.pyx
WillAyd Feb 8, 2018
b430635
Removed npy_isnan import
WillAyd Feb 9, 2018
aa4578d
Added grp_sizes array, broke out pct calc
WillAyd Feb 9, 2018
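For readers unfamiliar with the tiebreakers named in the commits above, here is a minimal plain-Python sketch of per-group ranking with the `average`, `min`, `max`, `first`, and `dense` methods. It is not the Cython code from this PR: the function name `group_rank` and its signature are hypothetical, missing values are not handled, and dividing by the group size for `pct` is an assumption made for illustration.

```python
from collections import defaultdict


def group_rank(keys, values, method="average", pct=False):
    """Rank values within each group (illustrative only, no NaN handling)."""
    ranks = [0.0] * len(values)

    # Collect the row positions belonging to each group key.
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        groups[key].append(pos)

    for positions in groups.values():
        # sorted() is stable, so ties keep their original order ("first").
        order = sorted(positions, key=lambda p: values[p])
        i = 0
        dense = 0
        while i < len(order):
            # Extend j to the last element tied with element i.
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            dense += 1
            for k in range(i, j + 1):
                if method == "average":
                    rank = (i + 1 + j + 1) / 2
                elif method == "min":
                    rank = i + 1
                elif method == "max":
                    rank = j + 1
                elif method == "first":
                    rank = k + 1
                elif method == "dense":
                    rank = dense
                else:
                    raise ValueError("unknown method: {}".format(method))
                ranks[order[k]] = float(rank)
            i = j + 1

        if pct:
            # Assumption: percentile ranks divide by the group size.
            for p in positions:
                ranks[p] /= len(positions)

    return ranks


keys = ["a", "a", "a", "b", "b"]
values = [2, 1, 2, 5, 5]
print(group_rank(keys, values, method="average"))
print(group_rank(keys, values, method="first"))
```

Ranks restart at 1 for every group, which is the behaviour the "ranks not resetting across groups" fix in the list above is concerned with.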
Removed group_rank_object
WillAyd committed Feb 9, 2018
commit a315a92b687442c177787de361bb3b262a75f4d2
8 changes: 0 additions & 8 deletions pandas/_libs/groupby.pyx
@@ -130,14 +130,6 @@ def group_last_object(ndarray[object, ndim=2] out,
                 out[i, j] = resx[i, j]
 
 
-def group_rank_object(ndarray[float64_t, ndim=2] out,
-                      ndarray[object, ndim=2] values,
-                      ndarray[int64_t] labels,
-                      bint is_datetimelike, object ties_method,
-                      bint ascending, bint pct, object na_option):
-    raise ValueError("rank not supported for object dtypes")
-
-
 cdef inline float64_t median_linear(float64_t* a, int n) nogil:
     cdef int i, j, na_count = 0
     cdef float64_t result
3 changes: 1 addition & 2 deletions pandas/tests/groupby/test_groupby.py
@@ -2038,8 +2038,7 @@ def test_rank_args_missing(self, grps, vals, ties_method, ascending,
     def test_rank_object_raises(self, ties_method, ascending, na_option,
                                 pct, vals):
         df = DataFrame({'key': ['foo'] * 5, 'val': vals})
-        with tm.assert_raises_regex(ValueError,
-                                    "rank not supported for object dtypes"):
+        with tm.assert_raises_regex(TypeError, "not callable"):
             df.groupby('key').rank(method=ties_method,
                                    ascending=ascending,
                                    na_option=na_option, pct=pct)
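The test now expects a `TypeError` matching "not callable" instead of the explicit `ValueError`. One plausible explanation, sketched below in plain Python, is that the groupby layer looks up a dtype-specific kernel by name and wraps it in an anonymous function, so a missing kernel only fails when the wrapper is finally called. Everything here (`fake_libgroupby`, `get_kernel`, `rank_wrapper`) is a hypothetical stand-in, not the actual pandas dispatch code.

```python
import types


def _rank_float64(values):
    # Toy kernel: 1-based ranks with ties broken by position.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, pos in enumerate(order, start=1):
        ranks[pos] = float(rank)
    return ranks


# Hypothetical stand-in for the compiled module after this commit:
# a float64 kernel exists, the object kernel has been removed.
fake_libgroupby = types.SimpleNamespace(group_rank_float64=_rank_float64)


def get_kernel(how, dtype_name):
    # Name-based lookup; a missing kernel yields None instead of raising.
    return getattr(fake_libgroupby, "group_{}_{}".format(how, dtype_name), None)


def rank_wrapper(dtype_name):
    kernel = get_kernel("rank", dtype_name)
    # The lambda defers the call, so a None kernel is only noticed on use.
    return lambda values: kernel(values)


print(rank_wrapper("float64")([3.0, 1.0, 2.0]))

try:
    rank_wrapper("object")(["b", "a"])
except TypeError as exc:
    print("TypeError:", exc)
```

Under this assumption the error message is less descriptive than the removed `ValueError`, but no placeholder function has to be kept in `groupby.pyx` just to raise.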