ENH: Add Float128 support for groupby. #59483

Open
1 of 3 tasks
cemde opened this issue Aug 11, 2024 · 8 comments

Comments

@cemde

cemde commented Aug 11, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I would like to call groupby()[...].mean() on a column of dtype np.float128, but this currently raises a TypeError.

Feature Description

import numpy as np
import pandas as pd

# Set the random seed for reproducibility
np.random.seed(42)

# Generate binary data for columns A and B
data_A = np.random.randint(0, 2, size=1000)
data_B = np.random.randint(0, 2, size=1000)

# Generate lognormal random values for column C with np.float128 precision
data_C = np.random.lognormal(mean=0, sigma=1, size=1000).astype(np.float128)

# Create the DataFrame
df = pd.DataFrame({
    "A": data_A,
    "B": data_B,
    "C": data_C
})

# Group by columns A and B and get the mean of column C
grouped_means = df.groupby(["A", "B"])["C"].mean()

grouped_means

results in this error message:

  File "/homes/cornelius/numpy128.py", line 22, in <module>
    grouped_means = df.groupby(["A", "B"])["C"].mean()
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 2375, in mean
    result = self._cython_agg_general(
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1926, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/internals/base.py", line 336, in grouped_reduce
    res = func(arr)
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1902, in array_func
    result = self.grouper._cython_operation(
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/groupby/ops.py", line 815, in _cython_operation
    return cy_op.cython_operation(
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/groupby/ops.py", line 534, in cython_operation
    return self._cython_op_ndim_compat(
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/groupby/ops.py", line 323, in _cython_op_ndim_compat
    res = self._call_cython_op(
  File "/homes/cornelius/anaconda3/envs/general/lib/python3.10/site-packages/pandas/core/groupby/ops.py", line 403, in _call_cython_op
    func(
  File "groupby.pyx", line 989, in pandas._libs.groupby.__pyx_fused_cpdef
TypeError: No matching signature found

Alternative Solutions

Converting the column to NumPy, aggregating there, and converting the result back to pandas can work, but it is less convenient than staying in pandas.
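
As a rough sketch of that workaround (this is not a pandas API; the helper name grouped_mean_longdouble is made up for illustration, and it assumes the key columns contain no missing values), the per-group sums can be accumulated in NumPy with np.longdouble so the result keeps the extended dtype. np.longdouble is used instead of np.float128 because the latter name is not defined on every platform:

```python
import numpy as np
import pandas as pd


def grouped_mean_longdouble(df, keys, value):
    """Hypothetical helper: group means accumulated in np.longdouble."""
    gb = df.groupby(keys, sort=True)
    # Integer group number for every row, in sorted group order
    codes = gb.ngroup().to_numpy()
    # Group sizes; the index holds the sorted group keys
    counts = gb.size()
    # Accumulate the sums in NumPy, bypassing the Cython groupby kernels
    sums = np.zeros(len(counts), dtype=np.longdouble)
    np.add.at(sums, codes, df[value].to_numpy(dtype=np.longdouble))
    return pd.Series(sums / counts.to_numpy(), index=counts.index, name=value)


# Small illustrative frame (values chosen by hand, not taken from the issue)
demo = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": [0, 0, 0, 1],
    "C": np.array([1.0, 3.0, 5.0, 7.0], dtype=np.longdouble),
})
demo_means = grouped_mean_longdouble(demo, ["A", "B"], "C")
print(demo_means.dtype)
```

The grouping itself is still done by pandas (ngroup and size do not touch the value column); only the reduction is moved to NumPy.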

Additional Context

Environment:
OS: CentOS Linux 8

python 3.10.13
pandas 2.1.1
numpy 1.24.1

cemde added the Enhancement and Needs Triage labels Aug 11, 2024
@rhshadrach
Member

I'm positive here, assuming the increase in wheel size is negligible. From a code standpoint, I do not think it would be much effort to expand to this case.

rhshadrach added the Groupby label and removed the Needs Triage label Aug 12, 2024
@RafaelGreg18

Hello,

Can this issue be assigned to me? My college group and I would like to work on it for a college project.

@rhshadrach
Member

@RafaelGreg18 - sounds good, please see our contributing docs! In particular:

https://pandas.pydata.org/docs/dev/development/contributing.html#finding-an-issue-to-contribute-to

@RafaelGreg18

take

@RafaelGreg18

Using pandas 2.2.3, the snippet provided runs without error for me. This issue seems to be fixed already.

@rhshadrach
Member

@RafaelGreg18 - I'm still seeing this error on main. Can you post the output of

print(grouped_means.dtypes)

after running the example in the OP?

@RafaelGreg18

It outputs float64 (it should be float128, right?). One note: I couldn't get np.float128 to work, only np.longdouble, but I think they are the same thing.

@RafaelGreg18

Just noticed that doing:

data_C = np.random.lognormal(mean=0, sigma=1, size=1000).astype(np.longdouble)

actually creates a float64 array on my machine, not a float128 one. Also, according to NumPy's documentation:

"np.float96 and np.float128 are provided for users who want specific padding. In spite of the names, np.float96 and np.float128 provide only as much precision as np.longdouble, that is, 80 bits on most x86 machines and 64 bits in standard Windows builds."

Any tips on how to recreate the error?
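
One way to check what np.longdouble resolves to on a given machine is sketched below. The helper name describe_longdouble is made up for illustration; it only uses np.finfo, np.dtype, and hasattr. If the storage size is 64 bits, longdouble is just float64 on that platform, and the original TypeError would presumably not be reproducible there:

```python
import numpy as np


def describe_longdouble():
    """Hypothetical helper: summarize the platform's np.longdouble."""
    dtype = np.dtype(np.longdouble)
    info = np.finfo(np.longdouble)
    return {
        "dtype_name": dtype.name,
        # Storage size including any padding bytes
        "storage_bits": dtype.itemsize * 8,
        # Explicit mantissa bits: 52 for float64, 63 for x86 80-bit extended
        "mantissa_bits": int(info.nmant),
        # The np.float128 alias only exists on some platforms
        "has_float128": hasattr(np, "float128"),
        "is_float64": dtype == np.dtype(np.float64),
    }


longdouble_info = describe_longdouble()
for key, val in longdouble_info.items():
    print(f"{key}: {val}")
```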
