[Python] Test failures on 32-bit x86 #40153

@mgorny

Description

@mgorny

Describe the bug, including details regarding any error messages, version, and platform.

When running the test suite on 32-bit x86, I'm getting the following test failures:

FAILED tests/test_array.py::test_dictionary_to_numpy - TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
FAILED tests/test_io.py::test_python_file_large_seeks - assert 5 == ((2 ** 32) + 5)
FAILED tests/test_io.py::test_memory_map_large_seeks - OSError: Read out of bounds (offset = 4294967301, size = 5) in file of size 10
FAILED tests/test_pandas.py::TestConvertStructTypes::test_from_numpy_nested - AssertionError: assert 8 == 12
FAILED tests/test_schema.py::test_schema_sizeof - assert 28 > 30
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_string - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_large_string - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_string_with_missing - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_categorical - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_empty_dataframe - OverflowError: Python int too large to convert to C ssize_t
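Note: these failures appear to share one root cause. On 32-bit x86, C `ssize_t` and NumPy's default index type (`np.intp`) are 32-bit, so 64-bit offsets, pointers, and index arrays used by the tests no longer fit. A minimal, architecture-independent sketch of the limits involved:

```python
import ctypes
import numpy as np

# np.intp mirrors C ssize_t: 4 bytes on 32-bit x86, 8 bytes on x86-64.
print("sizeof(ssize_t) =", ctypes.sizeof(ctypes.c_ssize_t))
print("np.intp itemsize =", np.dtype(np.intp).itemsize)

# A value such as 2**32 + 5 (used by the seek tests) only fits
# when ssize_t is 64-bit wide.
fits = (2 ** 32 + 5) < 2 ** (8 * ctypes.sizeof(ctypes.c_ssize_t) - 1)
print("2**32 + 5 fits in ssize_t:", fits)
```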
Tracebacks
============================================================== FAILURES ===============================================================
______________________________________________________ test_dictionary_to_numpy _______________________________________________________

obj = array([13.7, 11. ]), method = 'take', args = (array([0, 1, 1, 0], dtype=int64),)
kwds = {'axis': None, 'mode': 'raise', 'out': None}, bound = <built-in method take of numpy.ndarray object at 0xeaca6ad0>

    def _wrapfunc(obj, method, *args, **kwds):
        bound = getattr(obj, method, None)
        if bound is None:
            return _wrapit(obj, method, *args, **kwds)
    
        try:
>           return bound(*args, **kwds)
E           TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

args       = (array([0, 1, 1, 0], dtype=int64),)
bound      = <built-in method take of numpy.ndarray object at 0xeaca6ad0>
kwds       = {'axis': None, 'mode': 'raise', 'out': None}
method     = 'take'
obj        = array([13.7, 11. ])

/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: TypeError

During handling of the above exception, another exception occurred:

    def test_dictionary_to_numpy():
        expected = pa.array(
            ["foo", "bar", None, "foo"]
        ).to_numpy(zero_copy_only=False)
        a = pa.DictionaryArray.from_arrays(
            pa.array([0, 1, None, 0]),
            pa.array(['foo', 'bar'])
        )
        np.testing.assert_array_equal(a.to_numpy(zero_copy_only=False),
                                      expected)
    
        with pytest.raises(pa.ArrowInvalid):
            # If this would be changed to no longer raise in the future,
            # ensure to test the actual result because, currently, to_numpy takes
            # for granted that when zero_copy_only=True there will be no nulls
            # (it's the decoding of the DictionaryArray that handles the nulls and
            # this is only activated with zero_copy_only=False)
            a.to_numpy(zero_copy_only=True)
    
        anonulls = pa.DictionaryArray.from_arrays(
            pa.array([0, 1, 1, 0]),
            pa.array(['foo', 'bar'])
        )
        expected = pa.array(
            ["foo", "bar", "bar", "foo"]
        ).to_numpy(zero_copy_only=False)
        np.testing.assert_array_equal(anonulls.to_numpy(zero_copy_only=False),
                                      expected)
    
        with pytest.raises(pa.ArrowInvalid):
            anonulls.to_numpy(zero_copy_only=True)
    
        afloat = pa.DictionaryArray.from_arrays(
            pa.array([0, 1, 1, 0]),
            pa.array([13.7, 11.0])
        )
        expected = pa.array([13.7, 11.0, 11.0, 13.7]).to_numpy()
>       np.testing.assert_array_equal(afloat.to_numpy(zero_copy_only=True),
                                      expected)

a          = <pyarrow.lib.DictionaryArray object at 0xeafe6ed0>

-- dictionary:
  [
    "foo",
    "bar"
  ]
-- indices:
  [
    0,
    1,
    null,
    0
  ]
afloat     = <pyarrow.lib.DictionaryArray object at 0xeafe6fb0>

-- dictionary:
  [
    13.7,
    11
  ]
-- indices:
  [
    0,
    1,
    1,
    0
  ]
anonulls   = <pyarrow.lib.DictionaryArray object at 0xeafe6e60>

-- dictionary:
  [
    "foo",
    "bar"
  ]
-- indices:
  [
    0,
    1,
    1,
    0
  ]
expected   = array([13.7, 11. , 11. , 13.7])

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_array.py:823: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/array.pxi:1590: in pyarrow.lib.Array.to_numpy
    ???
/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:192: in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
        a          = array([13.7, 11. ])
        axis       = None
        indices    = array([0, 1, 1, 0], dtype=int64)
        mode       = 'raise'
        out        = None
/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:68: in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
        args       = (array([0, 1, 1, 0], dtype=int64),)
        bound      = <built-in method take of numpy.ndarray object at 0xeaca6ad0>
        kwds       = {'axis': None, 'mode': 'raise', 'out': None}
        method     = 'take'
        obj        = array([13.7, 11. ])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = array([13.7, 11. ]), method = 'take', args = (array([0, 1, 1, 0], dtype=int64),)
kwds = {'axis': None, 'mode': 'raise', 'out': None}, wrap = <built-in method __array_wrap__ of numpy.ndarray object at 0xeaca6ad0>

    def _wrapit(obj, method, *args, **kwds):
        try:
            wrap = obj.__array_wrap__
        except AttributeError:
            wrap = None
>       result = getattr(asarray(obj), method)(*args, **kwds)
E       TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

args       = (array([0, 1, 1, 0], dtype=int64),)
kwds       = {'axis': None, 'mode': 'raise', 'out': None}
method     = 'take'
obj        = array([13.7, 11. ])
wrap       = <built-in method __array_wrap__ of numpy.ndarray object at 0xeaca6ad0>

/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:45: TypeError
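Note: the `TypeError` above is NumPy's 'safe' casting rule at work. On 32-bit platforms `np.intp` is `int32`, so `take()` with `int64` indices is rejected regardless of the actual values. A platform-independent demonstration (the explicit `astype` workaround is only an illustration, not necessarily the right fix for pyarrow):

```python
import numpy as np

# int64 -> int32 can lose data, so 'safe' casting rejects it outright.
print(np.can_cast(np.int64, np.int32, casting="safe"))  # False
print(np.can_cast(np.int32, np.int64, casting="safe"))  # True

# Downcasting the indices to the platform index type avoids the error.
values = np.array([13.7, 11.0])
idx64 = np.array([0, 1, 1, 0], dtype=np.int64)
print(values.take(idx64.astype(np.intp)))
```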
____________________________________________________ test_python_file_large_seeks _____________________________________________________

    def test_python_file_large_seeks():
        def factory(filename):
            return pa.PythonFile(open(filename, 'rb'))
    
>       check_large_seeks(factory)

factory    = <function test_python_file_large_seeks.<locals>.factory at 0xe13b6de8>

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:262: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

file_factory = <function test_python_file_large_seeks.<locals>.factory at 0xe13b6de8>

    def check_large_seeks(file_factory):
        if sys.platform in ('win32', 'darwin'):
            pytest.skip("need sparse file support")
        try:
            filename = tempfile.mktemp(prefix='test_io')
            with open(filename, 'wb') as f:
                f.truncate(2 ** 32 + 10)
                f.seek(2 ** 32 + 5)
                f.write(b'mark\n')
            with file_factory(filename) as f:
>               assert f.seek(2 ** 32 + 5) == 2 ** 32 + 5
E               assert 5 == ((2 ** 32) + 5)
E                +  where 5 = <bound method NativeFile.seek of <pyarrow.PythonFile closed=False own_file=False is_seekable=True is_writable=False is_readable=True>>(((2 ** 32) + 5))
E                +    where <bound method NativeFile.seek of <pyarrow.PythonFile closed=False own_file=False is_seekable=True is_writable=False is_readable=True>> = <pyarrow.PythonFile closed=False own_file=False is_seekable=True is_writable=False is_readable=True>.seek

f          = <pyarrow.PythonFile closed=True own_file=False is_seekable=True is_writable=False is_readable=True>
file_factory = <function test_python_file_large_seeks.<locals>.factory at 0xe13b6de8>
filename   = '/var/tmp/portage/dev-python/pyarrow-15.0.0/temp/test_ioj_p6zuld'

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:49: AssertionError
_____________________________________________________ test_memory_map_large_seeks _____________________________________________________

    def test_memory_map_large_seeks():
>       check_large_seeks(pa.memory_map)


../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:1140: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:51: in check_large_seeks
    assert f.read(5) == b'mark\n'
        f          = <pyarrow.MemoryMappedFile closed=True own_file=False is_seekable=True is_writable=False is_readable=True>
        file_factory = <cyfunction memory_map at 0xf228e778>
        filename   = '/var/tmp/portage/dev-python/pyarrow-15.0.0/temp/test_iozl2wxbou'
pyarrow/io.pxi:409: in pyarrow.lib.NativeFile.read
    ???
pyarrow/error.pxi:154: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: Read out of bounds (offset = 4294967301, size = 5) in file of size 10


pyarrow/error.pxi:91: OSError
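Note: the numbers in both seek failures are the telltale signature of 64-bit offsets being truncated to 32 bits somewhere in the I/O path. The returned seek position (5), the out-of-bounds offset (4294967301), and the reported file size (10) are all consistent with that interpretation:

```python
# What the test writes vs. what a 32-bit truncation produces:
seek_offset = 2 ** 32 + 5
file_size = 2 ** 32 + 10

print(seek_offset % 2 ** 32)   # what PythonFile.seek() returned
print(file_size % 2 ** 32)     # the "file of size 10" in the OSError
print(seek_offset)             # the out-of-bounds read offset 4294967301
```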
____________________________________________ TestConvertStructTypes.test_from_numpy_nested ____________________________________________

self = <pyarrow.tests.test_pandas.TestConvertStructTypes object at 0xeb535d90>

    def test_from_numpy_nested(self):
        # Note: an object field inside a struct
        dt = np.dtype([('x', np.dtype([('xx', np.int8),
                                       ('yy', np.bool_)])),
                       ('y', np.int16),
                       ('z', np.object_)])
        # Note: itemsize is not a multiple of sizeof(object)
>       assert dt.itemsize == 12
E       AssertionError: assert 8 == 12
E        +  where 8 = dtype([('x', [('xx', 'i1'), ('yy', '?')]), ('y', '<i2'), ('z', 'O')]).itemsize

dt         = dtype([('x', [('xx', 'i1'), ('yy', '?')]), ('y', '<i2'), ('z', 'O')])
self       = <pyarrow.tests.test_pandas.TestConvertStructTypes object at 0xeb535d90>

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py:2604: AssertionError
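Note: the hard-coded `itemsize == 12` assumes 8-byte object pointers. Structured dtypes are packed by default, so the itemsize is the plain sum of field sizes, and the `np.object_` field contributes `sizeof(PyObject*)`: 12 bytes on 64-bit, 8 on 32-bit. A pointer-relative form of the assertion holds on both:

```python
import numpy as np

dt = np.dtype([('x', np.dtype([('xx', np.int8), ('yy', np.bool_)])),
               ('y', np.int16),
               ('z', np.object_)])

# 1 (int8) + 1 (bool) + 2 (int16) + sizeof(PyObject*), no padding.
expected = 1 + 1 + 2 + np.dtype(np.object_).itemsize
print(dt.itemsize, expected)
```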
_________________________________________________________ test_schema_sizeof __________________________________________________________

    def test_schema_sizeof():
        schema = pa.schema([
            pa.field('foo', pa.int32()),
            pa.field('bar', pa.string()),
        ])
    
>       assert sys.getsizeof(schema) > 30
E       assert 28 > 30
E        +  where 28 = <built-in function getsizeof>(foo: int32\nbar: string)
E        +    where <built-in function getsizeof> = sys.getsizeof

schema     = foo: int32
bar: string

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_schema.py:684: AssertionError
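Note: `sys.getsizeof()` results scale with pointer width, so the fixed `> 30` threshold bakes in a 64-bit layout assumption; the same Schema wrapper measures only 28 bytes here. A quick way to see the platform dependence:

```python
import sys

# 32 on a 32-bit Python build, 64 on a 64-bit build.
print(sys.maxsize.bit_length() + 1)

# Even a bare object's reported size depends on pointer width.
print(sys.getsizeof(object()))
```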
____________________________________________________ test_pandas_roundtrip_string _____________________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_string():
        # See https://github.com/pandas-dev/pandas/issues/50554
        if Version(pd.__version__) < Version("1.6"):
            pytest.skip("Column.size() bug in pandas")
    
        arr = ["a", "", "c"]
        table = pa.table({"a": pa.array(arr)})
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
    
        pandas_df = pandas_from_dataframe(table)
>       result = pi.from_dataframe(pandas_df)

arr        = ['a', '', 'c']
pandas_df  =    a
0  a
1   
2  c
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
a: string
----
a: [["a","","c"]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:159: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =    a
0  a
1   
2  c
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda1c61f0>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda1c61f0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1c65b0>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda1c61f0>
        dtype      = <DtypeKind.STRING: 21>
        name       = 'a'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523528, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1530035680, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1529980112, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1c65b0>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523528, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1530035680, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1529980112, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 2, 'ptr': 3879523528, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 3
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 32, 'ptr': 1530035680, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 3, 'ptr': 1529980112, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
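Note: this `OverflowError` (and the identical ones in the remaining interchange tests) comes from passing the buffer pointer to `pa.foreign_buffer`, which takes it as a C `ssize_t`. On 32-bit x86 that is a signed 32-bit integer, and addresses in the upper half of the 4 GiB address space exceed its maximum:

```python
# Data buffer pointer taken from the traceback above.
ptr = 3879523528

print(ptr > 2 ** 31 - 1)  # True: does not fit in a signed 32-bit ssize_t
print(ptr < 2 ** 32)      # True: it is a valid unsigned 32-bit address
```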
_________________________________________________ test_pandas_roundtrip_large_string __________________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_large_string():
        # See https://github.com/pandas-dev/pandas/issues/50554
        if Version(pd.__version__) < Version("1.6"):
            pytest.skip("Column.size() bug in pandas")
    
        arr = ["a", "", "c"]
        table = pa.table({"a_large": pa.array(arr, type=pa.large_string())})
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
    
        if Version(pd.__version__) >= Version("2.0.1"):
            pandas_df = pandas_from_dataframe(table)
>           result = pi.from_dataframe(pandas_df)

arr        = ['a', '', 'c']
pandas_df  =   a_large
0       a
1        
2       c
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
a_large: large_string
----
a_large: [["a","","c"]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:189: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =   a_large
0       a
1        
2       c
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda103a10>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda103a10>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1033d0>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda103a10>
        dtype      = <DtypeKind.STRING: 21>
        name       = 'a_large'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879522800, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1480303312, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1478277616, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1033d0>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879522800, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1480303312, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1478277616, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 2, 'ptr': 3879522800, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 3
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 32, 'ptr': 1480303312, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 3, 'ptr': 1478277616, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
______________________________________________ test_pandas_roundtrip_string_with_missing ______________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_string_with_missing():
        # See https://github.com/pandas-dev/pandas/issues/50554
        if Version(pd.__version__) < Version("1.6"):
            pytest.skip("Column.size() bug in pandas")
    
        arr = ["a", "", "c", None]
        table = pa.table({"a": pa.array(arr),
                          "a_large": pa.array(arr, type=pa.large_string())})
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
    
        if Version(pd.__version__) >= Version("2.0.2"):
            pandas_df = pandas_from_dataframe(table)
>           result = pi.from_dataframe(pandas_df)

arr        = ['a', '', 'c', None]
pandas_df  =      a a_large
0    a       a
1             
2    c       c
3  NaN     NaN
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
a: string
a_large: large_string
----
a: [["a","","c",null]]
a_large: [["a","","c",null]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:227: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =      a a_large
0    a       a
1             
2    c       c
3  NaN     NaN
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda15b850>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda15b850>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda103210>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda15b850>
        dtype      = <DtypeKind.STRING: 21>
        name       = 'a'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523744, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 40, 'ptr': 1469510752, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 4, 'ptr': 1475420176, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda103210>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523744, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 40, 'ptr': 1469510752, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 4, 'ptr': 1475420176, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 2, 'ptr': 3879523744, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 4
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 40, 'ptr': 1469510752, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 4, 'ptr': 1475420176, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
__________________________________________________ test_pandas_roundtrip_categorical __________________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_categorical():
        if Version(pd.__version__) < Version("2.0.2"):
            pytest.skip("Bitmasks not supported in pandas interchange implementation")
    
        arr = ["Mon", "Tue", "Mon", "Wed", "Mon", "Thu", "Fri", "Sat", None]
        table = pa.table(
            {"weekday": pa.array(arr).dictionary_encode()}
        )
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
        pandas_df = pandas_from_dataframe(table)
>       result = pi.from_dataframe(pandas_df)

arr        = ['Mon', 'Tue', 'Mon', 'Wed', 'Mon', 'Thu', 'Fri', 'Sat', None]
pandas_df  =   weekday
0     Mon
1     Tue
2     Mon
3     Wed
4     Mon
5     Thu
6     Fri
7     Sat
8     NaN
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
weekday: dictionary<values=string, indices=int32, ordered=0>
----
weekday: [  -- dictionary:
["Mon","Tue","Wed","Thu","Fri","Sat"]  -- indices:
[0,1,0,2,0,3,4,5,null]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:257: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =   weekday
0     Mon
1     Tue
2     Mon
3     Wed
4     Mon
5     Thu
6     Fri
7     Sat
8     NaN
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xd9e217f0>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xd9e217f0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:186: in protocol_df_chunk_to_pyarrow
    columns[name] = categorical_column_to_dictionary(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda180550>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xd9e217f0>
        dtype      = <DtypeKind.CATEGORICAL: 23>
        name       = 'weekday'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:293: in categorical_column_to_dictionary
    dictionary = column_to_array(cat_column)
        allow_copy = True
        cat_column = <pandas.core.interchange.column.PandasColumn object at 0xda1801d0>
        categorical = {'categories': <pandas.core.interchange.column.PandasColumn object at 0xda1801d0>,
 'is_dictionary': True,
 'is_ordered': False}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda180550>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 18, 'ptr': 3659006432, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 56, 'ptr': 1466456352, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 6, 'ptr': 1477427216, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1801d0>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 18, 'ptr': 3659006432, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 56, 'ptr': 1466456352, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 6, 'ptr': 1477427216, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 18, 'ptr': 3659006432, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 6
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 56, 'ptr': 1466456352, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 6, 'ptr': 1477427216, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
________________________________________________________ test_empty_dataframe _________________________________________________________

    def test_empty_dataframe():
        schema = pa.schema([('col1', pa.int8())])
        df = pa.table([[]], schema=schema)
        dfi = df.__dataframe__()
>       assert pi.from_dataframe(dfi) == df

df         = pyarrow.Table
col1: int8
----
col1: [[]]
dfi        = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd98381d0>
schema     = col1: int8

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:522: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd98381d0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:140: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(df)
        allow_copy = True
        batches    = []
        df         = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd96e41b0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pyarrow.interchange.column._PyArrowColumn object at 0xd96a6650>
        columns    = {}
        df         = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd96e41b0>
        dtype      = <DtypeKind.INT: 0>
        name       = 'col1'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PyArrowBuffer({'bufsize': 0, 'ptr': 4122363392, 'device': 'CPU'}),
          (<DtypeKind.INT: 0>, 8, 'c', '=')),
 'offsets': None,
 'validity': None}
        col        = <pyarrow.interchange.column._PyArrowColumn object at 0xd96a6650>
        data_type  = (<DtypeKind.INT: 0>, 8, 'c', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.INT: 0>, 8, 'c', '=')
        allow_copy = True
        buffers    = {'data': (PyArrowBuffer({'bufsize': 0, 'ptr': 4122363392, 'device': 'CPU'}),
          (<DtypeKind.INT: 0>, 8, 'c', '=')),
 'offsets': None,
 'validity': None}
        data_buff  = PyArrowBuffer({'bufsize': 0, 'ptr': 4122363392, 'device': 'CPU'})
        data_type  = (<DtypeKind.INT: 0>, 8, 'c', '=')
        describe_null = (<ColumnNullType.NON_NULLABLE: 0>, None)
        length     = 0
        offset     = 0
        offset_buff = None
        validity_buff = None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
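The common thread in these tracebacks appears to be that `pa.foreign_buffer(data_buff.ptr, ...)` receives a raw pointer value (e.g. 3659006432 or 4122363392) that exceeds `SSIZE_MAX` on a 32-bit build, where C `ssize_t` is only 32 bits wide. A minimal sketch of that overflow, assuming the pointer is converted through a signed 32-bit integer (the concrete value below is taken from the last traceback; `ctypes.c_int32` is used here only to model a 32-bit `ssize_t` on any host):

```python
import ctypes

# 'ptr' from the PyArrowBuffer in the test_empty_dataframe traceback above.
ptr = 4122363392

# The address fits in an unsigned 32-bit word but exceeds SSIZE_MAX (2**31 - 1)
# on a 32-bit platform, so converting it to a signed ssize_t must fail or wrap.
assert ptr > 2**31 - 1
assert ptr <= 2**32 - 1

# Model the 32-bit signed conversion: the high bit is set, so the value
# reinterprets as a negative number under two's complement.
wrapped = ctypes.c_int32(ptr & 0xFFFFFFFF).value
assert wrapped < 0
```

CPython itself refuses the lossy conversion and raises `OverflowError: Python int too large to convert to C ssize_t`, which matches the error seen at `pyarrow/io.pxi:1990`.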

Full build & test log (2.5M): pyarrow.txt

This is Arrow 15.0.0 on Gentoo, running in an x86 systemd-nspawn container. I used -O2 -march=pentium-m -mfpmath=sse -pipe as compiler flags to rule out i387-specific issues.

>>> pyarrow.show_info()
pyarrow version info
--------------------
Package kind              : not indicated
Arrow C++ library version : 15.0.0  
Arrow C++ compiler        : GNU 13.2.1
Arrow C++ compiler flags  : -O2 -march=pentium-m -mfpmath=sse -pipe
Arrow C++ git revision    :         
Arrow C++ git description :         
Arrow C++ build type      : relwithdebinfo

Platform:
  OS / Arch           : Linux x86_64
  SIMD Level          : avx2    
  Detected SIMD Level : avx2    

Memory:
  Default backend     : system  
  Bytes allocated     : 0 bytes 
  Max memory          : 0 bytes 
  Supported Backends  : system  

Optional modules:
  csv                 : Enabled 
  cuda                : -       
  dataset             : Enabled 
  feather             : Enabled 
  flight              : -       
  fs                  : Enabled 
  gandiva             : -       
  json                : Enabled 
  orc                 : -       
  parquet             : Enabled 

Filesystems:
  GcsFileSystem       : -       
  HadoopFileSystem    : Enabled 
  S3FileSystem        : -       

Compression Codecs:
  brotli              : Enabled 
  bz2                 : Enabled 
  gzip                : Enabled 
  lz4_frame           : Enabled 
  lz4                 : Enabled 
  snappy              : Enabled 
  zstd                : Enabled 

Some of these failures might be caused by problems inside pandas. I'm going to file a bug about the test failures there in a minute and will link it here afterwards.

Component(s)

Python
