[Python] Segfault when reading CSV inside Flight server #28374

@asfimport

Description

Using pyarrow.csv.read_csv inside a Flight server results in a segfault. This did not happen in pyarrow 3.0.0.

We noticed the issue when the CI build of a library we're developing failed.

The attached CSV and Python server/client script (crash.py) demonstrate the problem.

  • Run the server with python crash.py server.

  • Run the client with python crash.py client. The server segfaults with 'Segmentation fault (core dumped)'.

The crash does not happen when just reading the CSV (python crash.py).

This is the stacktrace generated by coredumpctl debug of a debug build of commit 2746266:
{code:java}
#0  0x00007f9275cffedc in __gnu_cxx::__atomic_add (__val=1, __mem=0x10) at /usr/include/c++/10.2.0/ext/atomicity.h:55
#1  __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x10) at /usr/include/c++/10.2.0/ext/atomicity.h:96
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x8)
    at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:142
#3  0x00007f9275cfe0a5 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (this=0x7f92735a2778,
    __r=...) at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:740
#4  0x00007f9275cfd01f in std::__shared_ptr<arrow::StopSourceImpl, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (
    this=0x7f92735a2770) at /usr/include/c++/10.2.0/bits/shared_ptr_base.h:1181
#5  0x00007f9275cfd045 in std::shared_ptr<arrow::StopSourceImpl>::shared_ptr (this=0x7f92735a2770)
    at /usr/include/c++/10.2.0/bits/shared_ptr.h:149
#6  0x00007f9275cfd06b in arrow::StopToken::StopToken (this=0x7f92735a2770)
    at /home/jeroen/dev/python/apache-arrow/dist/include/arrow/util/cancel.h:57
#7  0x00007f9275ce96f7 in __pyx_pf_7pyarrow_4_csv_read_csv (__pyx_self=0x0, __pyx_v_input_file=0x7f929e9f28b0,
    __pyx_v_read_options=0x7f929f49ee80 <_Py_NoneStruct>, __pyx_v_parse_options=0x7f929f49ee80 <_Py_NoneStruct>,
    __pyx_v_convert_options=0x7f929f49ee80 <_Py_NoneStruct>, __pyx_v_memory_pool=0x7f929f49ee80 <_Py_NoneStruct>)
    at /home/jeroen/dev/python/apache-arrow/arrow/python/build/temp.linux-x86_64-3.8/_csv.cpp:14208
#8  0x00007f9275ce8b92 in __pyx_pw_7pyarrow_4_csv_1read_csv (__pyx_self=0x0, __pyx_args=0x7f929ea64be0, __pyx_kwds=0x0)
    at /home/jeroen/dev/python/apache-arrow/arrow/python/build/temp.linux-x86_64-3.8/_csv.cpp:14036
#9  0x00007f929f22cf98 in ?? () from /usr/lib/libpython3.8.so.1.0
#10 0x00007f929f22d5f8 in _PyObject_MakeTpCall () from /usr/lib/libpython3.8.so.1.0
{code}

Based on my limited understanding of the code, it looks like the error is here:
[https://github.com/apache/arrow/blob/master/python/pyarrow/_csv.pyx#L799]
{code:java}
with SignalStopHandler() as stop_handler:
    io_context = CIOContext(
        maybe_unbox_memory_pool(memory_pool),
        (<StopToken> stop_handler.stop_token).stop_token)
{code}

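Here stop_token is null because the SignalStopHandler had an empty list of signals on creation (https://github.com/apache/arrow/blob/master/python/pyarrow/error.pxi#L191). In the code below, _signals stays empty, and _stop_token is therefore never created, whenever the handler is constructed off the main thread or no custom SIGINT/SIGTERM handler is installed: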

{code:java}
        if (signal_handlers_enabled and
                threading.current_thread() is threading.main_thread()):
            self._signals = [
                sig for sig in (signal.SIGINT, signal.SIGTERM)
                if signal.getsignal(sig) not in (signal.SIG_DFL,
                                                 signal.SIG_IGN, None)]
        if not self._signals.empty():
            self._stop_token = StopToken()
            self._stop_token.init(GetResultValue(
                SetSignalStopSource()).token())
            self._enabled = True
{code}
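
To make the condition above easier to check without pyarrow, here is a minimal pure-Python sketch of the same selection logic. It is only a model, not the pyarrow implementation: ModelStopHandler and build_in_worker are hypothetical names, None stands in for the null stop token, and it assumes (my guess, not confirmed above) that Flight request handlers run on a thread other than the main thread.

```python
import signal
import threading


class ModelStopHandler:
    """Hypothetical stand-in that mirrors the quoted selection logic."""

    def __init__(self, signal_handlers_enabled=True):
        self.signals = []
        self.stop_token = None  # models the never-created stop token
        self.enabled = False

        # Signals are only considered on the main thread.
        if (signal_handlers_enabled and
                threading.current_thread() is threading.main_thread()):
            self.signals = [
                sig for sig in (signal.SIGINT, signal.SIGTERM)
                if signal.getsignal(sig) not in (signal.SIG_DFL,
                                                 signal.SIG_IGN, None)]

        # The token is only created when at least one signal was selected.
        if self.signals:
            self.stop_token = object()
            self.enabled = True


def build_in_worker():
    """Construct a ModelStopHandler on a non-main thread and return it."""
    result = []
    worker = threading.Thread(target=lambda: result.append(ModelStopHandler()))
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    handler = build_in_worker()
    print("signals:", handler.signals)
    print("stop_token is None:", handler.stop_token is None)
```

In this model, a handler built on a worker thread always ends up with no signals and no token, which matches the null dereference at frame #6 of the stacktrace if read_csv is reached from such a thread.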

Environment: Arch Linux 5.11.16-arch1-1
Originally found on GitHub Actions Ubuntu 20.04.2
Python 3.8 and Python 3.9
Reporter: Jeroen Hoekx
Assignee: David Li / @lidavidm

Note: This issue was originally created as ARROW-12622. Please see the migration documentation for further details.
