BUG: Assignment of pyarrow arrays yield unexpected dtypes #58601

Open
wants to merge 36 commits into base: main
b29fb0f
ArrowDtype type are taken into account in a column assignment
droussea2001 May 1, 2024
bfbd7dc
Add test for pyarrow assignment in column test_assign_pyarrow_columns
droussea2001 May 1, 2024
04b1edc
force dtype cast in _sanitize_column only for pa.lib.Array
droussea2001 May 1, 2024
5d2016d
move test_assign_column_in_dataframe from test_alter_axes.py to test_…
droussea2001 May 2, 2024
61c0697
Merge remote-tracking branch 'upstream/main' into BUG-56994/pyarrow-a…
droussea2001 May 2, 2024
4ed7820
Merge remote-tracking branch 'upstream/main' into BUG-56994/pyarrow-a…
droussea2001 May 6, 2024
86efdfd
test_assign_column_in_dataframe is configurable by the data fixture
droussea2001 May 6, 2024
e7e3a8b
manage optional pyarrow import
droussea2001 May 6, 2024
da3f135
Integrate docstring correction
droussea2001 May 7, 2024
9c53b89
add an xfail to test_assign_column_in_dataframe to manage version wit…
droussea2001 May 7, 2024
f28b026
correct pyarrow version check
droussea2001 May 7, 2024
703a0c6
Correct wrong pyarrow check
droussea2001 May 17, 2024
0f924e1
Remove unnecessary xfail
droussea2001 May 17, 2024
def88e2
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 May 17, 2024
0a914da
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Jun 20, 2024
5aeac95
Move pyarrow check in sanitize_array
droussea2001 Jun 20, 2024
2f07cc0
Move pyarrow check in sanitize_array
droussea2001 Jun 20, 2024
a94023f
Code clean up
droussea2001 Jun 20, 2024
e69265b
Check if dtype has been initialized before
droussea2001 Jun 21, 2024
cc092e3
Merge remote-tracking branch 'upstream/main' into BUG-56994/pyarrow-a…
droussea2001 Jun 21, 2024
12727f0
Try to process arrow dtype before any dtype modification
droussea2001 Jun 21, 2024
2c7912b
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Jun 24, 2024
7c15d07
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Jun 25, 2024
0f1c524
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Jun 26, 2024
e983264
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Jun 26, 2024
bd38f15
Add entry in last new in section Conversion
droussea2001 Jun 27, 2024
96871c1
Replace pyarrow type check by existing lib function
droussea2001 Jun 27, 2024
fadf487
Merge branch 'BUG-56994/pyarrow-assignment-unexpected-dtypes' of http…
droussea2001 Jun 27, 2024
5dadf0e
Merge remote-tracking branch 'upstream/main' into BUG-56994/pyarrow-a…
droussea2001 Jun 27, 2024
747756f
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Jul 28, 2024
fca36ea
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Jul 30, 2024
ad3d736
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Aug 1, 2024
4773051
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Aug 9, 2024
a4dc614
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Aug 24, 2024
4e03be0
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Sep 9, 2024
97c8e56
Merge branch 'main' into BUG-56994/pyarrow-assignment-unexpected-dtypes
droussea2001 Oct 11, 2024
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -646,6 +646,7 @@ Conversion
 - Bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
 - Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
 - Bug in :meth:`Series.reindex` not maintaining ``float32`` type when a ``reindex`` introduces a missing value (:issue:`45857`)
+- Bug in :meth:`sanitize_array` was not taking into account pyarrow arrays. (:issue:`56994`)
 
 Strings
 ^^^^^^^
12 changes: 11 additions & 1 deletion pandas/core/construction.py
@@ -23,6 +23,7 @@
     get_supported_dtype,
     is_supported_dtype,
 )
+from pandas.compat import pa_version_under10p1
 
 from pandas.core.dtypes.base import ExtensionDtype
 from pandas.core.dtypes.cast import (
@@ -40,7 +41,10 @@
     is_object_dtype,
     pandas_dtype,
 )
-from pandas.core.dtypes.dtypes import NumpyEADtype
+from pandas.core.dtypes.dtypes import (
+    ArrowDtype,
+    NumpyEADtype,
+)
 from pandas.core.dtypes.generic import (
     ABCDataFrame,
     ABCExtensionArray,
@@ -51,6 +55,9 @@
 
 import pandas.core.common as com
 
+if not pa_version_under10p1:
+    pass
+
 if TYPE_CHECKING:
     from collections.abc import Sequence
 
@@ -549,6 +556,9 @@ def sanitize_array(
     np.ndarray or ExtensionArray
     """
     original_dtype = dtype
+    if not pa_version_under10p1 and lib.is_pyarrow_array(data) and dtype is None:
+        dtype = ArrowDtype(data.type)
+
     if isinstance(data, ma.MaskedArray):
         data = sanitize_masked_array(data)
 
8 changes: 8 additions & 0 deletions pandas/tests/extension/test_arrow.py
@@ -1085,6 +1085,14 @@ def test_comp_masked_numpy(self, masked_dtype, comparison_op):
         expected = pd.Series(exp, dtype=ArrowDtype(pa.bool_()))
         tm.assert_series_equal(result, expected)
 
+    def test_assign_column_in_dataframe(self, data):
+        df = pd.DataFrame(data=data, columns=["A"], dtype=data.dtype)
+        df["B"] = pa.array(data, type=data.dtype.pyarrow_dtype)
+        result = df.dtypes
+        expected = pd.Series({"A": data.dtype, "B": data.dtype})
+
+        tm.assert_series_equal(result, expected)
+
 
 class TestLogicalOps:
     """Various Series and DataFrame logical ops methods."""