BUG: Subtraction fails with matching index of different name and type #57524

Open
2 of 3 tasks
Bjoern-Rapp opened this issue Feb 20, 2024 · 6 comments
Labels: Bug, ExtensionArray, Numeric Operations

Bjoern-Rapp commented Feb 20, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

s1 = pd.Series(
    [23, 22, 21], index=pd.Index(["a", "b", "c"], name="index a"), dtype=pd.Int64Dtype()
)
s2 = pd.Series(
    [21, 22, 23],
    index=pd.Index(["a", "b", "c"], name="index b", dtype=pd.StringDtype()),
    dtype=pd.Int64Dtype(),
)

s1 - s2

Issue Description

This example, with two Series of a nullable integer data type whose indexes have different names and differing data types (one a nullable string dtype, the other object dtype), fails with an error from pandas._libs.algos.ensure_platform_int():


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[170], line 10
      1 s1 = pd.Series(
      2     [23, 22, 21], index=pd.Index(["a", "b", "c"], name="index a"), dtype=pd.Int64Dtype()
      3 )
      4 s2 = pd.Series(
      5     [21, 22, 23],
      6     index=pd.Index(["a", "b", "c"], name="index b", dtype=pd.StringDtype()),
      7     dtype=pd.Int64Dtype(),
      8 )
---> 10 s1 - s2

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/ops/common.py:76, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     72             return NotImplemented
     74 other = item_from_zerodim(other)
---> 76 return method(self, other)

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/arraylike.py:194, in OpsMixin.__sub__(self, other)
    192 @unpack_zerodim_and_defer("__sub__")
    193 def __sub__(self, other):
--> 194     return self._arith_method(other, operator.sub)

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/series.py:5814, in Series._arith_method(self, other, op)
   5813 def _arith_method(self, other, op):
-> 5814     self, other = self._align_for_op(other)
   5815     return base.IndexOpsMixin._arith_method(self, other, op)

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/series.py:5844, in Series._align_for_op(self, right, align_asobject)
   5841             left = left.astype(object)
   5842             right = right.astype(object)
-> 5844         left, right = left.align(right, copy=False)
   5846 return left, right

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/generic.py:10095, in NDFrame.align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
  10082     left, _right, join_index = self._align_frame(
  10083         other,
  10084         join=join,
   (...)
  10091         fill_axis=fill_axis,
  10092     )
  10094 elif isinstance(other, ABCSeries):
> 10095     left, _right, join_index = self._align_series(
  10096         other,
  10097         join=join,
  10098         axis=axis,
  10099         level=level,
  10100         copy=copy,
  10101         fill_value=fill_value,
  10102         method=method,
  10103         limit=limit,
  10104         fill_axis=fill_axis,
  10105     )
  10106 else:  # pragma: no cover
  10107     raise TypeError(f"unsupported type: {type(other)}")

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/generic.py:10224, in NDFrame._align_series(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
  10221         new_mgr = self._mgr.reindex_indexer(join_index, lidx, axis=1, copy=copy)
  10222         left = self._constructor_from_mgr(new_mgr, axes=new_mgr.axes)
> 10224     right = other._reindex_indexer(join_index, ridx, copy)
  10226 else:
  10227     # one has > 1 ndim
  10228     fdata = self._mgr

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/series.py:4779, in Series._reindex_indexer(self, new_index, indexer, copy)
   4776         return self.copy(deep=copy)
   4777     return self
-> 4779 new_values = algorithms.take_nd(
   4780     self._values, indexer, allow_fill=True, fill_value=None
   4781 )
   4782 return self._constructor(new_values, index=new_index, copy=False)

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/array_algos/take.py:115, in take_nd(arr, indexer, axis, fill_value, allow_fill)
    110         arr = cast("NDArrayBackedExtensionArray", arr)
    111         return arr.take(
    112             indexer, fill_value=fill_value, allow_fill=allow_fill, axis=axis
    113         )
--> 115     return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
    117 arr = np.asarray(arr)
    118 return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/arrays/masked.py:905, in BaseMaskedArray.take(self, indexer, allow_fill, fill_value, axis)
    894 def take(
    895     self,
    896     indexer,
   (...)
    902     # we always fill with 1 internally
    903     # to avoid upcasting
    904     data_fill_value = self._internal_fill_value if isna(fill_value) else fill_value
--> 905     result = take(
    906         self._data,
    907         indexer,
    908         fill_value=data_fill_value,
    909         allow_fill=allow_fill,
    910         axis=axis,
    911     )
    913     mask = take(
    914         self._mask, indexer, fill_value=True, allow_fill=allow_fill, axis=axis
    915     )
    917     # if we are filling
    918     # we only fill where the indexer is null
    919     # not existing missing values
    920     # TODO(jreback) what if we have a non-na float as a fill value?

File ~/beftett/.venv/lib/python3.10/site-packages/pandas/core/algorithms.py:1309, in take(arr, indices, axis, allow_fill, fill_value)
   1306 if not is_array_like(arr):
   1307     arr = np.asarray(arr)
-> 1309 indices = ensure_platform_int(indices)
   1311 if allow_fill:
   1312     # Pandas style, -1 means NA
   1313     validate_indices(indices, arr.shape[axis])

File pandas/_libs/algos_common_helper.pxi:22, in pandas._libs.algos.ensure_platform_int()

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

If any one of these three conditions (nullable integer values, different index names, different index dtypes) is removed, the subtraction succeeds.
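Because removing any one of the three conditions avoids the error, a possible workaround (a sketch, not an official fix) is to cast the string-dtype index of s2 to object dtype before subtracting, so both indexes share a dtype. This relies on the report's observation that matching index dtypes alone is sufficient; the variable name s2_object_index is illustrative only.

```python
import pandas as pd

# Same inputs as the reproducible example above.
s1 = pd.Series(
    [23, 22, 21], index=pd.Index(["a", "b", "c"], name="index a"), dtype=pd.Int64Dtype()
)
s2 = pd.Series(
    [21, 22, 23],
    index=pd.Index(["a", "b", "c"], name="index b", dtype=pd.StringDtype()),
    dtype=pd.Int64Dtype(),
)

# Workaround sketch (assumption from the report: matching index dtypes alone
# avoids the error). Only the index dtype changes; labels and the name
# "index b" are kept as they were.
s2_object_index = s2.set_axis(s2.index.astype(object))

result = s1 - s2_object_index
print(result)
```

The values stay nullable integers; only the index of the right-hand operand is converted.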

Expected Behavior

The subtraction should succeed.

Installed Versions

commit: 2a953cf
python: 3.10.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.133+
Version : #1 SMP Sat Dec 30 11:18:04 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.3
numpy : 1.24.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 69.0.3
pip : 23.2.1
Cython : None
pytest : 7.4.4
hypothesis : 6.91.0
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.0.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.19.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.12.2
gcsfs : 2023.12.2post1
matplotlib : 3.8.2
numba : 0.57.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 13.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.24
tables : None
tabulate : None
xarray : 2023.8.0
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None

Bjoern-Rapp added the Bug and Needs Triage labels on Feb 20, 2024
Bjoern-Rapp changed the title from "BUG: " to "BUG: Subtraction fails with matching index of different name and type" on Feb 20, 2024
rhshadrach (Member) commented

Thanks for the report. Might be related to #45564 - cc @phofl.

pandas/pandas/core/series.py

Lines 4637 to 4646 in 9530851

# Note: new_index is None iff indexer is None
# if not None, indexer is np.intp
if indexer is None and (
    new_index is None or new_index.names == self.index.names
):
    return self.copy(deep=False)
new_values = algorithms.take_nd(
    self._values, indexer, allow_fill=True, fill_value=None
)
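The branch quoted above can be modeled in plain Python to show which inputs reach take_nd with indexer=None. This is a simplified, hypothetical model: reaches_take_with_none is not a pandas function, and lists of names stand in for Index objects.

```python
from typing import Optional, Sequence


def reaches_take_with_none(
    indexer: Optional[Sequence[int]],
    new_index_names: Optional[Sequence[Optional[str]]],
    self_index_names: Sequence[Optional[str]],
) -> bool:
    """Hypothetical model of the early-return check quoted from Series.

    Returns True when the quoted code would fall through to take_nd while
    indexer is still None (the path that fails for extension arrays).
    """
    if indexer is None and (
        new_index_names is None or list(new_index_names) == list(self_index_names)
    ):
        # Early return: the values are copied and take_nd is never called.
        return False
    # Otherwise take_nd runs; report whether it receives indexer=None.
    return indexer is None


# Assumed shape of the reported case: the join left positions unchanged
# (indexer is None), but the joined index names differ from s2's index names.
print(reaches_take_with_none(None, [None], ["index b"]))  # True
# Matching names take the early-return path instead.
print(reaches_take_with_none(None, ["index b"], ["index b"]))  # False
```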

Prior to #45564, we would not call take_nd when indexer is None. Now we might be passing indexer=None to take_nd. For NumPy arrays, we end up in _take_nd_ndarray, which has the lines:

if indexer is None:
    indexer = np.arange(arr.shape[axis], dtype=np.intp)
    dtype, fill_value = arr.dtype, arr.dtype.type()

but no such logic exists for extension arrays (EAs). Perhaps it should?
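One possible shape for the missing EA-side handling, sketched under assumptions: a hypothetical wrapper (take_with_identity_default is not a pandas function) that substitutes an identity indexer, mirroring the NumPy-path lines quoted above, before delegating to the extension array's public take method. It is not necessarily how pandas resolves this issue.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def take_with_identity_default(arr: ExtensionArray, indexer=None):
    """Hypothetical helper: treat indexer=None as "keep every position"."""
    if indexer is None:
        # Mirror the NumPy branch: build an identity indexer of platform ints.
        indexer = np.arange(len(arr), dtype=np.intp)
    # allow_fill=True follows the pandas convention that -1 marks a missing value.
    return arr.take(indexer, allow_fill=True, fill_value=None)


values = pd.array([21, 22, 23], dtype="Int64")
same = take_with_identity_default(values)  # no indexer: identity take
shuffled = take_with_identity_default(values, np.array([2, -1, 0]))
print(same)
print(shuffled)
```

The second call shows that an explicit indexer still behaves as before, with -1 producing a missing value.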

rhshadrach added the Algos, ExtensionArray, and Numeric Operations labels and removed the Needs Triage and Algos labels on Feb 25, 2024
Adasjo commented Feb 26, 2024

take

@CasperKristiansson

take

@VictorOsterman

take

Tobiky commented Feb 26, 2024

take

@Lindefor

take
