PDEP6 implementation pt 2: eablock.setitem, eablock.putmask #53405

Merged on Jul 31, 2023 (111 commits).

Commits
ccfcb8d
pt1
Dec 28, 2022
30dde08
fixup test collection
Dec 29, 2022
d2dfa3c
fixup warnings
Dec 29, 2022
e4ea519
add comments
Dec 30, 2022
e160d7f
fixup warnings
Dec 30, 2022
f7971f0
fixup test_indexing
Dec 30, 2022
373a30e
fixup test_set_value
Dec 30, 2022
26ab49d
fixup test_where
Dec 30, 2022
cc7cc5a
fixup test_asof
Dec 30, 2022
0362481
add one more explicit upcast
Dec 30, 2022
120aee7
fixup test_update
Dec 30, 2022
7972c08
fixup test_constructors
Dec 30, 2022
a16dcb8
fixup test_stack_unstack
Dec 30, 2022
25dc26f
catch warnings in test_update_dtypes
Jan 7, 2023
d45b312
fixup all test_update
Jan 7, 2023
e4ed811
start fixing up setitem
Jan 7, 2023
ace5e05
finish fixing up test_setitem
Jan 7, 2023
34a6194
more fixups
Jan 7, 2023
0cfefa4
catch numpy-dev warning
Jan 8, 2023
150fa9a
fixup some more
Jan 8, 2023
1cc3f50
fixup test_indexing
Jan 8, 2023
3f15670
fixup test_function
Jan 13, 2023
d0de3f1
fixup test_multi;
Jan 13, 2023
dc6c60b
fixup test_base
Jan 13, 2023
e33563a
fixup test_impl
Jan 13, 2023
37e0520
fixup multiindex/test_setitem
Jan 13, 2023
2dd85ab
fixup test_scalar
Jan 13, 2023
24ca7c2
fixup test_loc
Jan 13, 2023
3aed02f
fixup test_iloc
Jan 13, 2023
25f0693
fixup test_at
Jan 13, 2023
0e5fb73
fixup test_groupby
Jan 13, 2023
9f56342
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Jan 13, 2023
9b9e975
fixup some doc warnings
Jan 13, 2023
2f980c6
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Jan 26, 2023
044227f
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Mar 17, 2023
741b37d
post-merge fixup
Mar 17, 2023
c536d8a
change dtype in doctest
Mar 17, 2023
142992e
fixup doctest
Mar 17, 2023
4551855
explicit cast in test
Mar 17, 2023
f6e34ef
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Mar 20, 2023
01b0c72
fixup test for COW
Mar 20, 2023
d9c2225
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 12, 2023
6676bad
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 14, 2023
df740e6
fixup COW
Apr 14, 2023
9a54956
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 24, 2023
8386032
catch warnings in testsetitemcastingequivalents
Apr 24, 2023
39decb2
wip
Apr 24, 2023
7549888
fixup setitem test int key!
Apr 24, 2023
527fa2d
getting there!
Apr 24, 2023
caa35c3
fixup test_setitem
Apr 24, 2023
cc1542d
getting there
Apr 24, 2023
6636a37
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 25, 2023
492d443
fixup remaining warnings
Apr 25, 2023
eb8dd0f
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 25, 2023
90a64ab
fix test_update
Apr 25, 2023
272b735
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 26, 2023
b3f6b93
fixup some failing test
Apr 26, 2023
b8532cc
one more
Apr 26, 2023
81bba3c
simplify
Apr 26, 2023
87922b5
simplify and remove some false-positives
Apr 26, 2023
f314dd1
clean up
Apr 27, 2023
6b5bc73
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 27, 2023
8cad201
remove final filterwarnings
Apr 27, 2023
d3b12ab
undo unrelated change
Apr 27, 2023
adc0022
fixup raises_chained_assignment_error
Apr 27, 2023
d5bdfcf
remove another filterwarnings
Apr 27, 2023
37836e5
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 27, 2023
1a23fe7
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
Apr 27, 2023
dfa8ed2
fixup interchange test
Apr 27, 2023
3efe0a5
better parametrisation
Apr 27, 2023
cc95eec
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
May 4, 2023
4ba259d
okwarning => codeblock
May 4, 2023
c21aa4d
okwarning => codeblock in v1.3.0
May 4, 2023
05ffc27
one more codeblock
May 4, 2023
6612690
avoid upcast
May 4, 2023
d1aba37
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
May 4, 2023
adaf4e4
post-merge fixup
May 4, 2023
f194434
docs fixup;
May 4, 2023
6dc7fd1
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
May 5, 2023
82ecca2
post-merge fixup
May 5, 2023
a9d5891
remove more upcasts
May 5, 2023
0ccb541
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli May 8, 2023
aec0c87
adapt test from EA types
MarcoGorelli May 8, 2023
72e5609
move test to series/indexing
MarcoGorelli May 8, 2023
1455263
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli May 12, 2023
d16eea6
add tests about warnings
MarcoGorelli May 12, 2023
25198d4
fixup tests
MarcoGorelli May 12, 2023
ba6daa5
add dataframe tests too
MarcoGorelli May 12, 2023
9b15bc2
fixup tests
MarcoGorelli May 12, 2023
ebd1a50
simplify
MarcoGorelli May 12, 2023
a835281
try-fix docs build
MarcoGorelli May 12, 2023
908430b
Merge branch 'main' into pt1-deprecate-ban-upcasting
MarcoGorelli May 17, 2023
9802696
Merge remote-tracking branch 'upstream/main' into pt1-deprecate-ban-u…
MarcoGorelli May 22, 2023
5ee4ebf
post-merge fixup
MarcoGorelli May 22, 2023
bb37b09
raise assertionerror if self.dtype equals new_dtype
MarcoGorelli May 22, 2023
f46f7e3
Merge branch 'pt1-deprecate-ban-upcasting' of github.com:MarcoGorelli…
MarcoGorelli May 22, 2023
7df15e9
add todo for test case which should warn
MarcoGorelli May 22, 2023
f323d2a
add more todos
MarcoGorelli May 22, 2023
5f5a6a5
post-merge fixup
MarcoGorelli May 22, 2023
0fb017b
Merge branch 'main' into pt1-deprecate-ban-upcasting
MarcoGorelli May 26, 2023
6e92144
fixup setitem
MarcoGorelli May 26, 2023
2d0e953
Merge remote-tracking branch 'upstream/main' into pt2-pdep6-eablock
MarcoGorelli Jun 23, 2023
e0267b6
fixup
MarcoGorelli Jun 23, 2023
e93d940
Merge remote-tracking branch 'upstream/main' into pt2-pdep6-eablock
MarcoGorelli Jun 25, 2023
352e9a6
wip fixup
MarcoGorelli Jun 25, 2023
ff0e0dc
wip fixup
MarcoGorelli Jun 25, 2023
4f4d3f9
another fixup
MarcoGorelli Jun 25, 2023
01be57d
Merge remote-tracking branch 'upstream/main' into pt2-pdep6-eablock
MarcoGorelli Jul 24, 2023
f85d7b7
Merge remote-tracking branch 'upstream/main' into pt2-pdep6-eablock
MarcoGorelli Jul 28, 2023
56d77cd
add whatsnew
MarcoGorelli Jul 28, 2023
656950d
list examples of operations
MarcoGorelli Jul 28, 2023
11 changes: 8 additions & 3 deletions doc/source/whatsnew/v0.21.0.rst
@@ -784,10 +784,15 @@ non-datetime-like item being assigned (:issue:`14145`).

 These now coerce to ``object`` dtype.

-.. ipython:: python
+.. code-block:: python

-    s[1] = 1
-    s
+    In [1]: s[1] = 1
+
+    In [2]: s
+    Out[2]:
+    0    2011-01-01 00:00:00
+    1                      1
+    dtype: object

 - Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
 - Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
97 changes: 94 additions & 3 deletions doc/source/whatsnew/v2.1.0.rst
@@ -295,8 +295,99 @@ Other API changes
 .. ---------------------------------------------------------------------------
 .. _whatsnew_210.deprecations:

-Deprecate parsing datetimes with mixed time zones
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Deprecations
+~~~~~~~~~~~~
+
+Deprecated silent upcasting in setitem-like Series operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Setitem-like operations on Series (or DataFrame columns) which silently upcast the dtype are
+deprecated and show a warning. Examples of affected operations are:
+
+- ``ser.fillna('foo', inplace=True)``
+- ``ser.where(ser.isna(), 'foo', inplace=True)``
+- ``ser.iloc[indexer] = 'foo'``
+- ``ser.loc[indexer] = 'foo'``
+- ``df.iloc[indexer, 0] = 'foo'``
+- ``df.loc[indexer, 'a'] = 'foo'``
+- ``ser[indexer] = 'foo'``
+
+where ``ser`` is a :class:`Series`, ``df`` is a :class:`DataFrame`, and ``indexer``
+could be a slice, a mask, a single value, a list or array of values, or any other
+allowed indexer.
+
+In a future version, these will raise an error and you should cast to a common dtype first.
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+    In [1]: ser = pd.Series([1, 2, 3])
+
+    In [2]: ser
+    Out[2]:
+    0    1
+    1    2
+    2    3
+    dtype: int64
+
+    In [3]: ser[0] = 'not an int64'
+
+    In [4]: ser
+    Out[4]:
+    0    not an int64
+    1               2
+    2               3
+    dtype: object
+
+*New behavior*:
+
+.. code-block:: ipython
+
+    In [1]: ser = pd.Series([1, 2, 3])
+
+    In [2]: ser
+    Out[2]:
+    0    1
+    1    2
+    2    3
+    dtype: int64
+
+    In [3]: ser[0] = 'not an int64'
+    FutureWarning:
+      Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas.
+      Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
+
+    In [4]: ser
+    Out[4]:
+    0    not an int64
+    1               2
+    2               3
+    dtype: object
+
+To retain the current behaviour, in the case above you could cast ``ser`` to ``object`` dtype first:
+
+.. ipython:: python
+
+    ser = pd.Series([1, 2, 3])
+    ser = ser.astype('object')
+    ser[0] = 'not an int64'
+    ser
+
+Depending on the use-case, it might be more appropriate to cast to a different dtype.
+In the following, for example, we cast to ``float64``:
+
+.. ipython:: python
+
+    ser = pd.Series([1, 2, 3])
+    ser = ser.astype('float64')
+    ser[0] = 1.1
+    ser
+
+For further reading, please see https://pandas.pydata.org/pdeps/0006-ban-upcasting.html.
+
+Deprecated parsing datetimes with mixed time zones
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Parsing datetimes with mixed time zones is deprecated and shows a warning unless user passes ``utc=True`` to :func:`to_datetime` (:issue:`50887`)

Review comment (Member) on the added line "Setitem-like operations on Series (or DataFrame columns) which silently upcast the dtype are":
Worth listing what methods are setitem-like?

@@ -341,7 +432,7 @@ and ``datetime.datetime.strptime``:
     pd.Series(data).apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z'))

 Other Deprecations
-~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^
 - Deprecated 'broadcast_axis' keyword in :meth:`Series.align` and :meth:`DataFrame.align`, upcast before calling ``align`` with ``left = DataFrame({col: left for col in right.columns}, index=right.index)`` (:issue:`51856`)
 - Deprecated 'downcast' keyword in :meth:`Index.fillna` (:issue:`53956`)
 - Deprecated 'fill_method' and 'limit' keywords in :meth:`DataFrame.pct_change`, :meth:`Series.pct_change`, :meth:`DataFrameGroupBy.pct_change`, and :meth:`SeriesGroupBy.pct_change`, explicitly call ``ffill`` or ``bfill`` before calling ``pct_change`` instead (:issue:`53491`)
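The cast-first advice in the whatsnew entry for silent upcasting can be wrapped in a small helper. The following is a minimal sketch, not part of this PR: `set_with_cast` is a hypothetical name, and it assumes the caller already knows which dtype can hold the new value. Because it never relies on the silent upcast, it should behave the same whether an incompatible set warns or raises.

```python
import pandas as pd


def set_with_cast(ser: pd.Series, indexer, value, dtype) -> pd.Series:
    """Return a copy of ``ser`` cast to ``dtype`` with ``value`` set at ``indexer``.

    Hypothetical helper: casting explicitly first means the assignment
    itself never needs to upcast.
    """
    out = ser.astype(dtype)  # astype returns a new Series
    out.loc[indexer] = value
    return out


ints = pd.Series([1, 2, 3])
as_float = set_with_cast(ints, 0, 1.1, "float64")
as_object = set_with_cast(ints, 0, "not an int64", "object")
```

The original `ints` Series keeps its `int64` dtype, which makes the choice of target dtype visible at the call site instead of being decided implicitly by the assigned value.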
10 changes: 5 additions & 5 deletions pandas/core/internals/blocks.py
@@ -461,7 +461,7 @@ def coerce_to_target_dtype(self, other, warn_on_upcast: bool = False) -> Block:
                 FutureWarning,
                 stacklevel=find_stack_level(),
             )
-        if self.dtype == new_dtype:
+        if self.values.dtype == new_dtype:
             raise AssertionError(
                 f"Did not expect new dtype {new_dtype} to equal self.dtype "
                 f"{self.values.dtype}. Please report a bug at "
@@ -1723,11 +1723,11 @@ def setitem(self, indexer, value, using_cow: bool = False):

             if isinstance(self.dtype, IntervalDtype):
                 # see TestSetitemFloatIntervalWithIntIntervalValues
-                nb = self.coerce_to_target_dtype(orig_value)
+                nb = self.coerce_to_target_dtype(orig_value, warn_on_upcast=True)
                 return nb.setitem(orig_indexer, orig_value)

             elif isinstance(self, NDArrayBackedExtensionBlock):
-                nb = self.coerce_to_target_dtype(orig_value)
+                nb = self.coerce_to_target_dtype(orig_value, warn_on_upcast=True)
                 return nb.setitem(orig_indexer, orig_value)

             else:
@@ -1841,13 +1841,13 @@ def putmask(self, mask, new, using_cow: bool = False) -> list[Block]:
                 if isinstance(self.dtype, IntervalDtype):
                     # Discussion about what we want to support in the general
                     # case GH#39584
-                    blk = self.coerce_to_target_dtype(orig_new)
+                    blk = self.coerce_to_target_dtype(orig_new, warn_on_upcast=True)
                     return blk.putmask(orig_mask, orig_new)

                 elif isinstance(self, NDArrayBackedExtensionBlock):
                     # NB: not (yet) the same as
                     # isinstance(values, NDArrayBackedExtensionArray)
-                    blk = self.coerce_to_target_dtype(orig_new)
+                    blk = self.coerce_to_target_dtype(orig_new, warn_on_upcast=True)
                     return blk.putmask(orig_mask, orig_new)

                 else:
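The pattern in the `blocks.py` hunks (try the set, and on failure coerce with `warn_on_upcast=True`, then retry on the new block) can be illustrated with a toy model. This is a sketch under stated assumptions, not pandas internals: `ToyBlock`, `coerce_to_target_kind`, and the use of Python types in place of dtypes are all hypothetical.

```python
import warnings


class ToyBlock:
    """Toy stand-in for a block whose values all share one Python type."""

    def __init__(self, values, kind):
        self.values = list(values)
        self.kind = kind  # e.g. int, or object for "anything"

    def coerce_to_target_kind(self, other, warn_on_upcast=False):
        # Pick a kind that can hold both the existing values and `other`.
        new_kind = self.kind if isinstance(other, self.kind) else object
        if warn_on_upcast:
            warnings.warn(
                f"Setting an item of incompatible kind: {other!r}",
                FutureWarning,
                stacklevel=2,
            )
        if new_kind is self.kind:
            # Same idea as the guard in the diff: coercion must change the kind.
            raise AssertionError("coercion did not change the kind")
        return ToyBlock(self.values, new_kind)

    def setitem(self, index, value):
        if isinstance(value, self.kind):
            self.values[index] = value
            return self
        # Fallback path: upcast with a warning, then retry on the new block.
        nb = self.coerce_to_target_kind(value, warn_on_upcast=True)
        return nb.setitem(index, value)


blk = ToyBlock([1, 2, 3], int)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_blk = blk.setitem(0, "not an int")
```

In this sketch the warning is emitted only on the fallback path, so a compatible set stays silent, and the original block is left untouched because the retry happens on the coerced copy.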
10 changes: 8 additions & 2 deletions pandas/tests/frame/indexing/test_indexing.py
@@ -831,7 +831,10 @@ def test_setitem_single_column_mixed_datetime(self):
         tm.assert_series_equal(result, expected)

         # GH#16674 iNaT is treated as an integer when given by the user
-        df.loc["b", "timestamp"] = iNaT
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.loc["b", "timestamp"] = iNaT
         assert not isna(df.loc["b", "timestamp"])
         assert df["timestamp"].dtype == np.object_
         assert df.loc["b", "timestamp"] == iNaT
@@ -862,7 +865,10 @@ def test_setitem_mixed_datetime(self):
         df = DataFrame(0, columns=list("ab"), index=range(6))
         df["b"] = pd.NaT
         df.loc[0, "b"] = datetime(2012, 1, 1)
-        df.loc[1, "b"] = 1
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.loc[1, "b"] = 1
         df.loc[[2, 3], "b"] = "x", "y"
         A = np.array(
             [
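The test changes above wrap each incompatible set in `tm.assert_produces_warning(FutureWarning, match=...)`. For readers unfamiliar with that helper, the idea can be sketched with the standard library alone. This is not the pandas implementation: `expect_warning` and `legacy_set` are hypothetical names, and the sketch assumes `match` is a regular expression searched within the warning text, as with the `match` argument of `pytest.warns`.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category, match=None):
    """Fail unless the body emits a `category` warning whose text matches."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        yield caught
    hits = [
        w
        for w in caught
        if issubclass(w.category, category)
        and (match is None or re.search(match, str(w.message)))
    ]
    if not hits:
        raise AssertionError(f"no {category.__name__} matching {match!r}")


def legacy_set():
    # Stand-in for an operation that triggers the deprecation warning.
    warnings.warn(
        "Setting an item of incompatible dtype is deprecated", FutureWarning
    )


with expect_warning(FutureWarning, match="incompatible dtype"):
    legacy_set()
```

A body that emits no matching warning makes the context manager raise `AssertionError` on exit, which is what turns a missing deprecation warning into a test failure.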
25 changes: 20 additions & 5 deletions pandas/tests/frame/indexing/test_where.py
@@ -735,7 +735,10 @@ def test_where_interval_fullop_downcast(self, frame_or_series):
         tm.assert_equal(res, other.astype(np.int64))

         # unlike where, Block.putmask does not downcast
-        obj.mask(obj.notna(), other, inplace=True)
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            obj.mask(obj.notna(), other, inplace=True)
         tm.assert_equal(obj, other.astype(object))

     @pytest.mark.parametrize(
@@ -775,7 +778,10 @@ def test_where_datetimelike_noop(self, dtype):
         tm.assert_frame_equal(res5, expected)

         # unlike where, Block.putmask does not downcast
-        df.mask(~mask2, 4, inplace=True)
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.mask(~mask2, 4, inplace=True)
         tm.assert_frame_equal(df, expected.astype(object))


@@ -930,7 +936,10 @@ def test_where_period_invalid_na(frame_or_series, as_cat, request):
     result = obj.mask(mask, tdnat)
     tm.assert_equal(result, expected)

-    obj.mask(mask, tdnat, inplace=True)
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        obj.mask(mask, tdnat, inplace=True)
     tm.assert_equal(obj, expected)


@@ -1006,7 +1015,10 @@ def test_where_dt64_2d():

     # setting all of one column, none of the other
     expected = DataFrame({"A": other[:, 0], "B": dta[:, 1]})
-    _check_where_equivalences(df, mask, other, expected)
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        _check_where_equivalences(df, mask, other, expected)

     # setting part of one column, none of the other
     mask[1, 0] = True
@@ -1016,7 +1028,10 @@ def test_where_dt64_2d():
             "B": dta[:, 1],
         }
     )
-    _check_where_equivalences(df, mask, other, expected)
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        _check_where_equivalences(df, mask, other, expected)

     # setting nothing in either column
     mask[:] = True
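In the `test_where.py` hunks, only the `inplace=True` calls to `mask` are wrapped in the warning check, because they write into the existing object through `Block.putmask`. A minimal sketch of two ways a caller might avoid the in-place upcast follows. The data and variable names are illustrative assumptions, and whether the non-inplace call stays warning-free may depend on the pandas version.

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(["2011-01-01", "2011-01-02"]))
cond = pd.Series([True, False])

# Option 1: non-inplace mask returns a new object; `ser` is not modified.
result = ser.mask(cond, 4)

# Option 2: cast to a dtype that can hold the new value, then mask in place.
as_object = ser.astype(object)
as_object.mask(cond, 4, inplace=True)
```

Both options make the dtype change explicit in the calling code: the first by binding a new name, the second by an explicit `astype` before the in-place write.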
3 changes: 1 addition & 2 deletions pandas/tests/frame/test_constructors.py
@@ -2549,8 +2549,7 @@ def check_views(c_only: bool = False):
         check_views()

         # TODO: most of the rest of this test belongs in indexing tests
-        # TODO: 'm' and 'M' should warn
-        if lib.is_np_dtype(df.dtypes.iloc[0], "fciuOmM"):
+        if lib.is_np_dtype(df.dtypes.iloc[0], "fciuO"):
             warn = None
         else:
             warn = FutureWarning
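The change above drops `m` and `M` from the kind string, so timedelta64 and datetime64 columns now fall into the branch that expects a `FutureWarning`. The kind codes are NumPy's `dtype.kind` characters. The sketch below re-implements the check in a few lines; `is_np_dtype_kind` is a hypothetical stand-in that is assumed, not confirmed, to mirror what `lib.is_np_dtype` does.

```python
import numpy as np


def is_np_dtype_kind(dtype, kinds):
    """Return True if `dtype` is a NumPy dtype whose kind code is in `kinds`."""
    return isinstance(dtype, np.dtype) and dtype.kind in kinds


# float, complex, signed int, unsigned int, object
no_warn_kinds = "fciuO"

names = ["float64", "int64", "uint8", "object", "datetime64[ns]", "timedelta64[ns]"]
checks = {name: is_np_dtype_kind(np.dtype(name), no_warn_kinds) for name in names}
```

With this kind string, the datetime64 (`M`) and timedelta64 (`m`) entries of `checks` are `False`, matching the removal of the TODO comment in the hunk.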
3 changes: 2 additions & 1 deletion pandas/tests/indexing/test_at.py
@@ -23,7 +23,8 @@
 def test_at_timezone():
     # https://github.com/pandas-dev/pandas/issues/33544
     result = DataFrame({"foo": [datetime(2000, 1, 1)]})
-    result.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc)
+    with tm.assert_produces_warning(FutureWarning, match="incompatible dtype"):
+        result.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc)
     expected = DataFrame(
         {"foo": [datetime(2000, 1, 2, tzinfo=timezone.utc)]}, dtype=object
     )
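`test_at_timezone` sets a timezone-aware value into a column built from a naive `datetime`, and the expected frame has `object` dtype. Plain Python treats the two kinds of value as incompatible too, which gives some intuition for why the set counts as an upcast. The sketch below uses only the standard library; the variable names are illustrative.

```python
from datetime import datetime, timezone

naive = datetime(2000, 1, 1)
aware = datetime(2000, 1, 2, tzinfo=timezone.utc)

try:
    naive < aware
    comparable = True
except TypeError:
    # Python refuses to order naive and aware datetimes.
    comparable = False

# An explicit conversion makes the two values compatible again.
naive_as_utc = naive.replace(tzinfo=timezone.utc)
```

Attaching a timezone explicitly, as in the last line, is the plain-Python analogue of casting to a compatible dtype before assigning.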
3 changes: 2 additions & 1 deletion pandas/tests/indexing/test_loc.py
@@ -1449,7 +1449,8 @@ def test_loc_setitem_datetime_coercion(self):
         df.loc[0:1, "c"] = np.datetime64("2008-08-08")
         assert Timestamp("2008-08-08") == df.loc[0, "c"]
         assert Timestamp("2008-08-08") == df.loc[1, "c"]
-        df.loc[2, "c"] = date(2005, 5, 5)
+        with tm.assert_produces_warning(FutureWarning, match="incompatible dtype"):
+            df.loc[2, "c"] = date(2005, 5, 5)
         assert Timestamp("2005-05-05").date() == df.loc[2, "c"]

     @pytest.mark.parametrize("idxer", ["var", ["var"]])
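In `test_loc_setitem_datetime_coercion`, the value that now triggers the warning is a `datetime.date`, and the assertion after it shows the stored value is still a `date` rather than a timestamp. A short standard-library sketch of the distinction, with illustrative variable names:

```python
from datetime import date, datetime

d = date(2005, 5, 5)

# A date is not a datetime (the subclass relation goes the other way).
is_datetime = isinstance(d, datetime)

# Converting explicitly yields a datetime at midnight on the same day.
as_datetime = datetime.combine(d, datetime.min.time())
```

Converting the `date` to a datetime-like value before assigning is one way a caller could state the intended type; whether that keeps a datetime64 column's dtype in a given pandas version is an assumption to verify.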
15 changes: 10 additions & 5 deletions pandas/tests/internals/test_internals.py
@@ -1312,17 +1312,20 @@ def test_interval_can_hold_element(self, dtype, element):
         # `elem` to not have the same length as `arr`
         ii2 = IntervalIndex.from_breaks(arr[:-1], closed="neither")
         elem = element(ii2)
-        self.check_series_setitem(elem, ii, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, ii, False)
         assert not blk._can_hold_element(elem)

         ii3 = IntervalIndex.from_breaks([Timestamp(1), Timestamp(3), Timestamp(4)])
         elem = element(ii3)
-        self.check_series_setitem(elem, ii, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, ii, False)
         assert not blk._can_hold_element(elem)

         ii4 = IntervalIndex.from_breaks([Timedelta(1), Timedelta(3), Timedelta(4)])
         elem = element(ii4)
-        self.check_series_setitem(elem, ii, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, ii, False)
         assert not blk._can_hold_element(elem)

     def test_period_can_hold_element_emptylist(self):
@@ -1341,11 +1344,13 @@ def test_period_can_hold_element(self, element):
         # `elem` to not have the same length as `arr`
         pi2 = pi.asfreq("D")[:-1]
         elem = element(pi2)
-        self.check_series_setitem(elem, pi, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, pi, False)

         dti = pi.to_timestamp("S")[:-1]
         elem = element(dti)
-        self.check_series_setitem(elem, pi, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, pi, False)

     def check_can_hold_element(self, obj, elem, inplace: bool):
         blk = obj._mgr.blocks[0]