REGR: AssertionError when subtracting Timestamp-valued DataFrames with non-identical column index #31623

Closed
@DomKennedy

Description

import pandas as pd

df = pd.DataFrame(
    {
        "foo": [pd.Timestamp("2019"), pd.Timestamp("2020")],
        "bar": [pd.Timestamp("2018"), pd.Timestamp("2021")],
    }
)

df2 = df[["foo"]]

print(df - df2)

Problem description

The above snippet raises the following exception:

Traceback (most recent call last):
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 149, in na_arithmetic_op
    result = expressions.evaluate(op, str_rep, left, right)
  File ".venv/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 208, in evaluate
    return _evaluate(op, op_str, a, b)
  File ".venv/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 70, in _evaluate_standard
    return op(a, b)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/common.py", line 64, in new_method
    return method(self, other)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 500, in wrapper
    result = arithmetic_op(lvalues, rvalues, op, str_rep)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 192, in arithmetic_op
    res_values = dispatch_to_extension_op(op, lvalues, rvalues)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/dispatch.py", line 125, in dispatch_to_extension_op
    res_values = op(left, right)
  File ".venv/lib/python3.6/site-packages/pandas/core/arrays/datetimelike.py", line 1390, in __rsub__
    f"cannot subtract {type(self).__name__} from {type(other).__name__}"
TypeError: cannot subtract DatetimeArray from ndarray

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandas_bug.py", line 36, in <module>
    print(df2 - df)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 703, in f
    new_data = left._combine_frame(right, pass_op, fill_value)
  File ".venv/lib/python3.6/site-packages/pandas/core/frame.py", line 5297, in _combine_frame
    new_data = ops.dispatch_to_series(self, other, _arith_op)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 416, in dispatch_to_series
    new_data = expressions.evaluate(column_op, str_rep, left, right)
  File ".venv/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 208, in evaluate
    return _evaluate(op, op_str, a, b)
  File ".venv/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 70, in _evaluate_standard
    return op(a, b)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 385, in column_op
    return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))}
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 385, in <dictcomp>
    return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))}
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 121, in na_op
    return na_arithmetic_op(x, y, op, str_rep)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 151, in na_arithmetic_op
    result = masked_arith_op(left, right, op)
  File ".venv/lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 75, in masked_arith_op
    assert isinstance(x, np.ndarray), type(x)

This is a 1.0.0 regression; in 0.25.3, the operation succeeds and the unmatched bar column is filled with NaN in the output.

The same error occurs with:

  • Any combination of incompatible columns (strict subset, strict superset, overlapping, disjoint)
  • Calling the subtract method instead of using the subtraction operator
  • Timezone-aware Timestamps as well as timezone-naive

It does not seem to occur with:

  • Mismatches on the row index; transposing the dataframes in the above example prevents the error from occurring.
  • pd.Series objects with mismatched indexes (e.g. calling the above on the first row of each dataframe works fine)
  • Other dtypes; bool, float, and int seem to work fine. Similarly, if the dataframes are explicitly cast to dtype object, the operation succeeds.
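The last observation suggests a stopgap on affected versions. A minimal sketch, assuming the object-dtype path behaves as reported above; `result_obj` is a name introduced here for illustration, and the resulting columns hold Python objects rather than native timedelta values:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "foo": [pd.Timestamp("2019"), pd.Timestamp("2020")],
        "bar": [pd.Timestamp("2018"), pd.Timestamp("2021")],
    }
)
df2 = df[["foo"]]

# Cast both operands to object dtype before subtracting; per the
# report, the operation then succeeds despite the mismatched columns.
result_obj = df.astype(object) - df2.astype(object)

print(result_obj)
```

The cost is losing the native datetime/timedelta dtypes, so this is probably only suitable as a temporary measure.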

Expected Output

   bar    foo
0  NaN 0 days
1  NaN 0 days
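One way to obtain this output without triggering the error is to subtract only the shared columns and then reindex to the full column set. This is a sketch of a possible workaround, not necessarily how pandas computes the result internally; `common`, `all_columns`, and `expected` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "foo": [pd.Timestamp("2019"), pd.Timestamp("2020")],
        "bar": [pd.Timestamp("2018"), pd.Timestamp("2021")],
    }
)
df2 = df[["foo"]]

# Columns present in both frames, and the full set of columns.
common = df.columns.intersection(df2.columns)
all_columns = df.columns.union(df2.columns)

# Subtract on the shared columns only (the column indexes now match),
# then reindex so unmatched columns appear as missing values.
expected = (df[common] - df2[common]).reindex(columns=all_columns)

print(expected)
```

Because both operands have identical columns at the point of subtraction, this avoids the column-mismatch case described in the report.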

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-74-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Metadata

    Labels

    • Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
    • Numeric Operations: Arithmetic, Comparison, and Logical operations
    • Regression: Functionality that used to work in a prior pandas version
