ENH: Support kurtosis (kurt) in DataFrameGroupBy and SeriesGroupBy #60433

snitish · 2024-11-27T17:20:43Z

closes ENH:AttributeError: 'SeriesGroupBy' object has no attribute 'kurtosis' #40139
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

DataFrameGroupBy and SeriesGroupBy currently support mean, std and skew (the first 3 moments) but not kurtosis (the 4th moment). This change adds it. I implemented kurtosis in Cython, in a similar fashion to skewness, and verified that the output of this function matches that of DataFrame.kurt().
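For readers who want to see what "matches DataFrame.kurt()" means numerically, here is a minimal pure-Python sketch of the bias-corrected excess kurtosis (Fisher definition, normal == 0.0) applied per group. It is not the Cython code from this PR; the function names `sample_excess_kurtosis` and `grouped_kurtosis` are hypothetical and exist only for illustration.

```python
import math
from collections import defaultdict


def sample_excess_kurtosis(values):
    """Bias-corrected excess kurtosis (Fisher definition, normal == 0.0)."""
    n = len(values)
    if n < 4:
        return math.nan  # the estimator is undefined below four observations
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    m4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    if m2 == 0:
        return 0.0  # constant data: pandas reports 0.0 for this case
    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    adjustment = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return numerator / denominator - adjustment


def grouped_kurtosis(keys, values):
    """Hypothetical helper: kurtosis per key, dropping NaN like skipna=True."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        if not math.isnan(value):
            groups[key].append(value)
    return {key: sample_excess_kurtosis(vals) for key, vals in groups.items()}


keys = ["a"] * 5 + ["b"] * 4
values = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 1.0, 1.0, 1.0]
result = grouped_kurtosis(keys, values)
print(result)
```

Group "a" is evenly spaced data and group "b" is constant, so the two branches of the estimator are both exercised.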

rhshadrach

Very nice! A few test requests (I think these are not covered yet):

skipna with NA values in the data
Float64 and float64[pyarrow] dtypes
Constant data (e.g. [1, 1, 1, 1])

pandas/_libs/groupby.pyx

pandas/core/groupby/generic.py

pandas/tests/groupby/methods/test_kurt.py

snitish · 2024-12-04T00:12:51Z

Thanks for the review @rhshadrach.

Addressed your comments
Added test case for skipna=False (by default it's true)
Added test case for float64[pyarrow] (we already have one for float64)
Added test case for constant data. Note that the result here is 0.0, consistent with DataFrame.kurt() and Series.kurt()
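The behaviors these test cases target can be reproduced with the existing Series.kurt, applied group by group. The sketch below deliberately avoids calling the new groupby method so that it also runs on pandas versions without this PR; the column names and data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: "a" is evenly spaced, "b" contains one NaN, "c" is constant
df = pd.DataFrame(
    {
        "key": ["a"] * 5 + ["b"] * 5 + ["c"] * 4,
        "val": [1.0, 2.0, 3.0, 4.0, 5.0,
                1.0, 2.0, np.nan, 4.0, 10.0,
                1.0, 1.0, 1.0, 1.0],
    }
)
grouped = df.groupby("key")["val"]

# Reference results computed group by group with Series.kurt
kurt_skipna = grouped.apply(lambda s: s.kurt())              # NaN in "b" is ignored
kurt_strict = grouped.apply(lambda s: s.kurt(skipna=False))  # NaN in "b" propagates

print(kurt_skipna)
print(kurt_strict)
```

With this PR merged, the grouped kurt method is expected to return the same values as these group-by-group results.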

rhshadrach

> Added test case for float64[pyarrow] (we already have one for float64)

The request was for Float64, the NumPy-nullable array. Can just parameterize your arrow test I think:

@pytest.mark.parametrize("dtype", [pytest.param("float64[pyarrow]", marks=td.skip_if_no("pyarrow")), "Float64"])

doc/source/whatsnew/v3.0.0.rst

rhshadrach · 2024-12-15T20:56:03Z

pandas/tests/groupby/methods/test_kurt.py

+    # GH#40139
+    # Test that the groupby kurt method (which uses libgroupby.group_kurt)
+    #  matches the results of operating group-by-group (which uses nanops.nankurt)
+    nrows = 1000


I was initially concerned about runtime, but nrows of 10, 100, and 1000 all run in about the same time on my machine; the bottleneck appears to be O(1) overhead. I don't see O(n) behavior until 100_000.

rhshadrach · 2024-12-15T20:57:41Z

pandas/tests/groupby/methods/test_kurt.py

+    arr = np.random.default_rng(2).standard_normal((nrows, ncols))
+    arr[np.random.default_rng(2).random(nrows) < nan_frac] = np.nan


@mroeschke - I think you reworked the random data generation a while back, want to make sure this agrees with those patterns.
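As a rough illustration of what the two quoted lines do: each call to np.random.default_rng(2) builds a fresh generator from the same seed, so the data and the NaN mask are reproducible from run to run, and the boolean mask blanks out whole rows. In this sketch the values of `ncols` and `nan_frac` are assumptions (the diff excerpt does not show them), and `make_data` is a hypothetical wrapper.

```python
import numpy as np

nrows = 1000
ncols = 4       # assumed value, not shown in the excerpt
nan_frac = 0.2  # assumed value, not shown in the excerpt


def make_data(seed=2):
    """Hypothetical wrapper around the two lines quoted from the test."""
    arr = np.random.default_rng(seed).standard_normal((nrows, ncols))
    # Boolean mask over rows: every selected row becomes all-NaN
    mask = np.random.default_rng(seed).random(nrows) < nan_frac
    arr[mask] = np.nan
    return arr, mask


arr1, mask1 = make_data()
arr2, mask2 = make_data()

# Same seed, same data and same mask on every call
print(np.array_equal(arr1, arr2, equal_nan=True))
```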

rhshadrach · 2025-01-04T21:16:13Z

Apologies @snitish - this fell off my radar. Should be able to get to it today or tomorrow.

rhshadrach

lgtm - will merge in two days unless there are further reviews.

snitish · 2025-01-08T22:33:34Z

Thank you @rhshadrach!

mroeschke · 2025-01-10T17:42:46Z

Thanks @snitish

ENH: Support kurtosis (kurt) in DataFrameGroupBy and SeriesGroupBy

7712840

snitish requested review from rhshadrach and WillAyd as code owners November 27, 2024 17:20

mroeschke added Groupby Reduction Operations sum, mean, min, max, etc. labels Nov 27, 2024

snitish mentioned this pull request Dec 1, 2024

ENH:AttributeError: 'SeriesGroupBy' object has no attribute 'kurtosis' #40139

Closed

rhshadrach requested changes Dec 3, 2024


ENH: Address review comments

290378f

snitish added 4 commits December 3, 2024 16:16

ENH: Fix comments in new test cases

1adbb0c

ENH: Skip pyarrow test case if no pyarrow available

c5df6ec

ENH: Update to intp instead of np.intp

aaacc27

ENH: Change intp to int64

4fc5ca2

snitish requested a review from rhshadrach December 4, 2024 03:05

Merge branch 'main' into kurtosis

87c803b

rhshadrach requested changes Dec 15, 2024


Address review comments

e42a060

rhshadrach approved these changes Jan 8, 2025


rhshadrach added this to the 3.0 milestone Jan 8, 2025

mroeschke approved these changes Jan 10, 2025


mroeschke merged commit a81d52f into pandas-dev:main Jan 10, 2025
50 of 51 checks passed

snitish mentioned this pull request Jan 17, 2025

DOC: Update doc for newly added groupby method kurt #60725

Merged

snitish deleted the kurtosis branch February 6, 2025 19:46
