ARROW-4582: [Python/C++] Acquire the GIL on Py_INCREF
Minimal reproducing example:
```
import dask
import numpy as np
import pandas as pd
import pyarrow as pa


def segfault_me(df):
    # Convert the shared DataFrame to an Arrow table on a dask worker thread.
    pa.Table.from_pandas(df, nthreads=1)


while True:  # loop until the crash reproduces
    df = pd.DataFrame(
        {"P": np.arange(0, 10), "L": np.arange(0, 10), "TARGET": np.arange(10, 20)}
    )
    # dask's default scheduler for delayed objects is a thread pool, so these
    # five conversions of the *same* df run concurrently in one process.
    dask.compute([dask.delayed(segfault_me)(df) for _ in range(5)])
```
Segfaults are more likely when running under AddressSanitizer or on an otherwise slow system with many cores. It is important that the same df is always passed to all of the functions.
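The issue was that multiple threads incremented the reference count of the underlying NumPy array at the same time without holding the GIL. Py_INCREF is a plain, non-atomic increment, so concurrent increments could be lost and the count ended up lower than the true number of references. The matching decrements happened while holding the GIL, so the count could reach zero too early and the array was sometimes destroyed while still in use.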
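To make the lost-update mechanism concrete, here is a minimal pure-Python model of it. It is an illustration under stated assumptions, not Arrow or CPython code: FakeArray and racy_incref_interleaved are hypothetical names. Instead of using real threads, the sketch writes out the interleaving of two unsynchronized load/add/store sequences step by step, so the lost increment happens deterministically.

```python
# Hypothetical model (not CPython/Arrow code) of two threads racing on a
# non-atomic reference count. The interleaving is written out explicitly
# so that the lost update always happens.


class FakeArray:
    """Stand-in for a NumPy array: a reference count plus a 'freed' flag."""

    def __init__(self):
        self.refcnt = 1  # the caller's own reference
        self.freed = False

    def decref(self):
        # Models Py_DECREF done while holding the GIL: always serialized.
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


def racy_incref_interleaved(obj):
    """Two 'threads' each do load / add / store without the GIL.

    Both load the old value before either stores, so one increment is lost.
    """
    a = obj.refcnt      # thread A loads 1
    b = obj.refcnt      # thread B loads 1
    obj.refcnt = a + 1  # thread A stores 2
    obj.refcnt = b + 1  # thread B stores 2, overwriting A's increment


arr = FakeArray()             # refcnt == 1: the caller holds one reference
racy_incref_interleaved(arr)  # threads A and B each take a reference
print(arr.refcnt)             # -> 2, although three references exist

arr.decref()                  # thread A releases (with the GIL)
arr.decref()                  # thread B releases (with the GIL)
print(arr.freed)              # -> True: freed while the caller still holds it
```

Three references exist (the caller's plus one per thread), but the count records only two. The two serialized releases therefore drive it to zero and free the object while the caller still uses it.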
Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>
Closes #3655 from xhochy/ARROW-4582 and squashes the following commits:
7f9838d <Korn, Uwe> docker-compose run clang-format
3d6e5ee <Korn, Uwe> ARROW-4582: Acquire the GIL on Py_INCREF
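The fix named in the title serializes the increment: the GIL is acquired before Py_INCREF is called. The sketch below mirrors that idea in pure Python and uses a threading.Lock as a stand-in for the GIL. GuardedRefCount and worker are hypothetical names. In C or C++ extension code, a thread that does not hold the GIL typically acquires it with CPython's PyGILState_Ensure() and releases it with PyGILState_Release(). This sketch does not claim to reproduce Arrow's actual helper.

```python
import threading


class GuardedRefCount:
    """Hypothetical refcount whose updates are serialized by one lock."""

    def __init__(self):
        self.refcnt = 1                # the caller's own reference
        self._lock = threading.Lock()  # stands in for the GIL

    def incref(self):
        with self._lock:               # "acquire the GIL" before incrementing
            self.refcnt += 1

    def decref(self):
        with self._lock:               # decrements take the same lock
            self.refcnt -= 1


def worker(obj, n):
    # Each worker takes and then releases n references, like repeated
    # conversions of the same DataFrame.
    for _ in range(n):
        obj.incref()
    for _ in range(n):
        obj.decref()


obj = GuardedRefCount()
threads = [threading.Thread(target=worker, args=(obj, 10_000)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(obj.refcnt)  # -> 1: only the caller's original reference remains
```

Every increment and decrement takes the same lock, so no update is lost and the count returns to exactly 1 once all workers finish.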