Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: _validate_setitem_value fails to raise for PandasArray #51044

Open
jbrockmendel opened this issue Jan 28, 2023 · 22 comments
Open

BUG: _validate_setitem_value fails to raise for PandasArray #51044

jbrockmendel opened this issue Jan 28, 2023 · 22 comments
Assignees
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays. Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Jan 28, 2023

arr = pd.Series(range(5)).array

arr._validate_setitem_value("foo")  # <- fails to raise
arr._validate_setitem_value(1.5)  # <- fails to raise

AFAICT this doesn't affect anything bc we always extract PandasArray before doing anything.

Note that if this did raise on 1.5, it might cause a problem with arr.searchsorted(floats) raising when we wouldn't want it to.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 28, 2023
@topper-123
Copy link
Contributor

Just delete the Pandasarray._validate_setitem_value method, so if something calls this, then an AttributeError is raised?

@topper-123 topper-123 added Indexing Related to indexing on series/frames, not to indexes themselves ExtensionArray Extending pandas with custom dtypes or arrays. and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 18, 2023
@jbrockmendel
Copy link
Member Author

jbrockmendel commented May 19, 2023

it should raise ValueError/TypeError (not sure which off the top of my head), should match DTA/TDA etc behavior

@topper-123
Copy link
Contributor

topper-123 commented May 19, 2023

ah yes, it's set on a parent class as-is, so will have to raise here, of course.

@Jacsep
Copy link

Jacsep commented May 21, 2023

take

@happyvignesh
Copy link

Hi @topper-123, I am a beginner contributor.
May I know what change I need to add for this issue.

@topper-123
Copy link
Contributor

Hey @happyvignesh. Sounds great you want to contribute.

What is needed here is adding a method called _validate_setitem_value to PandasArray. The method should have the same signature as BaseMaskedArray._validate_setitem_value, but should raise a TypeError if the validated value cannot be inserted without changing the dtype.

@happyvignesh
Copy link

Hey @topper-123,
Thanks for your response.
if I understand correctly, I need to add the below snippet in class PandasArray.

`def _validate_setitem_value(self, value):
"""
Check if we have a scalar that we can cast losslessly.

    Raises
    ------
    TypeError
    """
    kind = self.dtype.kind
    # TODO: get this all from np_can_hold_element?
    if kind == "b":
        if lib.is_bool(value):
            return value

    elif kind == "f":
        if lib.is_integer(value) or lib.is_float(value):
            return value

    else:
        if lib.is_integer(value) or (lib.is_float(value) and value.is_integer()):
            return value
        # TODO: unsigned checks

    # Note: without the "str" here, the f-string rendering raises in
    #  py38 builds.
    raise TypeError(f"value '{str(value)}' cannot be inserted without changing the dtype to {self.dtype}")`

Could you please let me know if its fine.
Thanks.

@topper-123
Copy link
Contributor

topper-123 commented Jun 2, 2023

PandasArray can take many numpy dtypes, e.g. pd.array(["a"], dtype=object) can work, also pd.array(["a", 1]) will have dtype str672.

If you can find a way to ensure the new scalar is compatible with whatever numpy dtype is used, I think that is required also. I'm not sure ATM how'd you do that, but maybe you can find a way.

@theyashl
Copy link

Hi @topper-123 , instead of checking kind of data type, can we simply compare the data type itself?

for example:

PandasDtype(type(value)) == self.dtype

As this will compare data type itself of current class and provide value's class.

@dbalagula
Copy link

take

@hvsesha
Copy link

hvsesha commented Oct 30, 2023

if isinstance(value, str):
raise ValueError("String values are not allowed in an integer array.")

This also works .Kindly let me know is this looks good .I made change in num_.py

@hvsesha
Copy link

hvsesha commented Oct 30, 2023

Suppose you are setting integer array and if you are calling below statement
import pandas as pd
arr = pd.Series(range(5)).array

arr.setitem(arr,"foo") # <- fails to raise
print(arr)
arr.setitem(arr,"foo") This is raise exception

I tried and it worked

@kvnwng11
Copy link

kvnwng11 commented Apr 2, 2024

take

@flaviaouyang
Copy link
Contributor

@kvnwng11 are you still working on this issue?

@kvnwng11
Copy link

kvnwng11 commented Apr 16, 2024

@flaviaouyang Thanks for the ping! I am not still working, so I will unassign myself

@kvnwng11 kvnwng11 removed their assignment Apr 16, 2024
@flaviaouyang
Copy link
Contributor

take

@yiorgosynkl
Copy link

is this actively being worked on? @flaviaouyang

@flaviaouyang flaviaouyang removed their assignment Jun 18, 2024
@flaviaouyang
Copy link
Contributor

@yiorgosynkl no, go ahead

@yiorgosynkl
Copy link

take

@jahn96
Copy link
Contributor

jahn96 commented Jul 18, 2024

@yiorgosynkl are you currently working on this?

@maushumee
Copy link
Contributor

@yiorgosynkl is this being worked on?

@maushumee
Copy link
Contributor

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment