Wrong result for equality evaluation with numpy 2.3.0 #515

Open
@jorisvandenbossche

I am investigating a regression in the pandas test suite involving numexpr evaluation: some tests seem to have started failing with the recent release of numpy 2.3.0.

I have been able to trim it down to the following reproducible example. Unfortunately it does not yet eliminate pandas entirely, but the final code giving the wrong result involves only numpy and numexpr; pandas is used only to create the test data:

import numpy as np
import numexpr as ne

# creating the test data through pandas
from pandas import DataFrame
arr = np.random.default_rng(2).integers(1, 100, size=(10001, 4)) 
df = DataFrame(arr, columns=list("ABCD"))
other = df.copy() + 1
# extracting the numpy arrays
a = df._mgr.blocks[0].values
b = other._mgr.blocks[0].values

# trying to create the data with just numpy -> not yet reproducing it
# arr = np.random.default_rng(2).integers(1, 100, size=(10001, 4))
# a = arr.T
# b = a.copy() + 1

# equality using numpy
expected = a == b

# equality using numexpr
result = ne.evaluate("b == a", casting="safe")

print(f"numpy: {np.__version__}")
print(f"numexpr: {ne.__version__}")
# given "b = a + 1", we expect all False, i.e. a sum of 0 
print(f"numpy eq: {expected.sum()}")
print(f"numexpr eq: {result.sum()}")

I can consistently reproduce the wrong output with numpy 2.3.0, and get correct results with the previous numpy 2.2. Both cases use numexpr 2.10, so this does not depend on the just-released numexpr version.

$ mamba create -n test-py311-np22 python=3.11 numpy=2.2 pandas=2.2 numexpr=2.10
$ mamba create -n test-py311-np23 python=3.11 numpy=2.3 pandas=2.2 numexpr=2.10
$ mamba run -n test-py311-np22 python test_numexpr_eq_bug.py 
numpy: 2.2.6
numexpr: 2.10.2
numpy eq: 0
numexpr eq: 0
$ mamba run -n test-py311-np23 python test_numexpr_eq_bug.py 
numpy: 2.3.0
numexpr: 2.10.2
numpy eq: 0
numexpr eq: 51   # <--- the equality is giving True for some values
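
To narrow down where the spurious `True` values fall, it may help to compare the two boolean results position by position. The following is a minimal sketch, not part of the original reproducer: `mismatch_report` is a hypothetical helper name, it assumes 2D arrays of equal shape, and the demo uses synthetic data rather than the arrays above.

```python
import numpy as np


def mismatch_report(expected, result):
    """Summarize where two same-shaped 2D boolean arrays disagree.

    Hypothetical helper for illustration; not part of numpy or numexpr.
    """
    expected = np.asarray(expected)
    result = np.asarray(result)
    if expected.shape != result.shape:
        raise ValueError("expected and result must have the same shape")
    # argwhere returns an (n_mismatches, ndim) array of indices in C order
    idx = np.argwhere(expected != result)
    return {
        "count": int(idx.shape[0]),
        "rows": np.unique(idx[:, 0]).tolist(),
        "cols": np.unique(idx[:, -1]).tolist(),
        "first": [tuple(int(i) for i in row) for row in idx[:5]],
    }


# Synthetic demo data (assumption: stands in for `expected` and `result`)
exp = np.zeros((4, 6), dtype=bool)
res = exp.copy()
res[1, 5] = True
res[3, 0] = True

report = mismatch_report(exp, res)
print(report)
```

In the reproducer, calling `mismatch_report(expected, result)` would show whether the wrong values cluster in particular rows or columns, for example near the end of a block.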

The arrays a and b have a different order, so I was thinking that might trigger the issue. But when I try to recreate the test data directly with numpy, I can't reproduce the issue. I will look further into what exactly pandas does with the arrays while creating the dataframes.
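
One way to compare what pandas hands over with what the numpy-only attempt produces is to print the memory layout of each array. This is only a sketch under stated assumptions: `describe_layout` is a hypothetical helper, and the demo arrays are small synthetic ones built the same way as the commented-out numpy-only variant, not the pandas block values.

```python
import numpy as np


def describe_layout(x):
    """Collect the layout properties of an ndarray that could plausibly
    matter here (hypothetical helper for illustration)."""
    return {
        "shape": x.shape,
        "dtype": str(x.dtype),
        "strides": x.strides,
        "c_contiguous": bool(x.flags.c_contiguous),
        "f_contiguous": bool(x.flags.f_contiguous),
        "owndata": bool(x.flags.owndata),
        "aligned": bool(x.flags.aligned),
    }


# Synthetic demo, mirroring the numpy-only construction
arr = np.arange(12, dtype=np.int64).reshape(3, 4)
a = arr.T          # transposed view: F-contiguous, does not own its data
b = a.copy() + 1   # new array: C-contiguous, owns its data

layout_a = describe_layout(a)
layout_b = describe_layout(b)
for name, layout in (("a", layout_a), ("b", layout_b)):
    print(name, layout)
```

Running the same helper on `df._mgr.blocks[0].values` and `other._mgr.blocks[0].values` and diffing the output against the numpy-only arrays should show whether strides, contiguity, or ownership differ between the two setups.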
