Deadlock involving multiprocessing/pandas read_excel #517

Closed
@swt2c

I'm seeing an apparent deadlock when reading Excel files with pandas in a multiprocessing worker, but only if the main process has first attempted to load a CSV file. I've reduced my code to the following:

import multiprocessing
import pandas as pd

def read_file(path):
    print('Before read_excel')
    df = pd.read_excel(path)
    print('After read_excel')
    return df

try:
    df = pd.read_csv('gs://<invalid_path_to_csv_file>')
except FileNotFoundError:
    pass
file = 'gs://<valid_path_to_xlsx_file>'
files = [file]
with multiprocessing.Pool(1) as pool:
    dfs = pool.map(read_file, files)

The subprocess hangs in the pd.read_excel() call. When I attach to it with GDB, it appears to be stuck trying to acquire a lock in fsspec:

#20 0x000000000054b302 in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x1bd4c80, for file <path_removed>/env/lib/python3.7/site-packages/fsspec/asyn.py, line 68, in sync (loop=<_UnixSelectorEventLoop(_timer_cancelled_count=0, _closed=False, _stopping=False, _ready=<collections.deque at remote 0x7f01fb966360>, _scheduled=[<TimerHandle at remote 0x7f01fa8b3150>], _default_executor=<ThreadPoolExecutor(_max_workers=20, _work_queue=<_queue.SimpleQueue at remote 0x7f01fb135fb0>, _threads={<Thread(_target=<function at remote 0x7f01fb1534d0>, _name='ThreadPoolExecutor-0_0', _args=(<weakref at remote 0x7f01fb152650>, <_queue.SimpleQueue at remote 0x7f01fb135fb0>, None, ()), _kwargs={}, _daemonic=True, _ident=139646475032320, _tstate_lock=None, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f01fd11d300>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f01fd11d300>, release=<built-in method release of _thread.lock object at remote 0x7f01fd11d300>, _waiters=<collections.deque at remote 0x7f01fb966520>) at remote 0x7f01f...(truncated)) at ../Python/ceval.c:547
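
For readers unfamiliar with this class of hang, the following is a minimal sketch of one general fork hazard that could produce a symptom like the one above. It is not fsspec's code, and it is only an assumption that something similar happens here: a helper thread in the parent holds a lock at the moment of `os.fork()`, the child inherits the lock in its locked state but not the thread that would release it, and so the child can never acquire it. The function name `child_can_acquire_after_fork` is hypothetical, and the sketch assumes a Unix-like system where `os.fork()` is available.

```python
import os
import threading


def child_can_acquire_after_fork(timeout=1.0):
    """Fork while a helper thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def hold_lock():
        with lock:
            held.set()       # tell the main thread the lock is now held
            release.wait()   # keep holding until the parent says otherwise

    holder = threading.Thread(target=hold_lock, daemon=True)
    holder.start()
    held.wait()              # make sure the lock is held before forking

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the holder thread was not copied, but the lock's locked state was.
        os.close(read_fd)
        got_it = lock.acquire(timeout=timeout)
        os.write(write_fd, b"1" if got_it else b"0")
        os._exit(0)

    # Parent: collect the child's answer, then clean up the helper thread.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    release.set()
    holder.join()
    return answer == b"1"


if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire_after_fork())
```

In the sketch the child gives up after `timeout` seconds. A blocking acquire without a timeout would instead wait forever, which is how such a hang would appear from the outside.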

My requirements:

aiohttp==3.7.3
async-timeout==3.0.1
attrs==20.3.0
cachetools==4.2.0
certifi==2020.12.5
cffi==1.14.4
chardet==3.0.4
decorator==4.4.2
et-xmlfile==1.0.1
fsspec==0.8.5
gcsfs==0.7.1
google-api-core==1.24.1
google-auth==1.24.0
google-auth-oauthlib==0.4.2
google-cloud-core==1.5.0
google-cloud-storage==1.35.0
google-crc32c==1.1.0
google-resumable-media==1.2.0
googleapis-common-protos==1.52.0
idna==2.10
jdcal==1.4.1
multidict==5.1.0
numpy==1.19.4
oauthlib==3.1.0
openpyxl==3.0.5
pandas==1.2.0
protobuf==3.14.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
python-dateutil==2.8.1
pytz==2020.5
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.6
six==1.15.0
typing-extensions==3.7.4.3
urllib3==1.26.2
yarl==1.6.3
