Closed
Description
I'm experiencing an apparent deadlock when reading Excel files with pandas in a subprocess, but only after first attempting to load a CSV file in the main process. I've reduced my code to the following:
import multiprocessing
import pandas as pd

def read_file(path):
    print('Before read_excel')
    df = pd.read_excel(path)
    print('After read excel')
    return df

try:
    df = pd.read_csv('gs://<invalid_path_to_csv_file>')
except FileNotFoundError:
    pass

file = 'gs://<valid_path_to_xlsx_file>'
files = [file]
with multiprocessing.Pool(1) as pool:
    dfs = pool.map(read_file, files)
The subprocess will hang in the pd.read_excel() call. If I attach to it with GDB, it seems to be stuck trying to acquire a lock in fsspec:
#20 0x000000000054b302 in PyEval_EvalFrameEx (throwflag=0,
f=Frame 0x1bd4c80, for file <path_removed>/env/lib/python3.7/site-packages/fsspec/asyn.py, line 68, in sync (loop=<_UnixSelectorEventLoop(_timer_cancelled_count=0, _closed=False, _stopping=False, _ready=<collections.deque at remote 0x7f01fb966360>, _scheduled=[<TimerHandle at remote 0x7f01fa8b3150>], _default_executor=<ThreadPoolExecutor(_max_workers=20, _work_queue=<_queue.SimpleQueue at remote 0x7f01fb135fb0>, _threads={<Thread(_target=<function at remote 0x7f01fb1534d0>, _name='ThreadPoolExecutor-0_0', _args=(<weakref at remote 0x7f01fb152650>, <_queue.SimpleQueue at remote 0x7f01fb135fb0>, None, ()), _kwargs={}, _daemonic=True, _ident=139646475032320, _tstate_lock=None, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7f01fd11d300>, acquire=<built-in method acquire of _thread.lock object at remote 0x7f01fd11d300>, release=<built-in method release of _thread.lock object at remote 0x7f01fd11d300>, _waiters=<collections.deque at remote 0x7f01fb966520>) at remote 0x7f01f...(truncated)) at ../Python/ceval.c:547
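The frame above is inside fsspec's sync() helper and references an event loop object. One possible explanation, which I have not confirmed against the fsspec source, is that the failed read_csv() call starts an event loop in a background thread in the main process, and the forked worker inherits the loop object but not the thread that drives it. The sketch below reproduces that general pattern without pandas, fsspec, or GCS. It is a simplified stand-in, not fsspec's code: start_loop_thread, answer, run_on_loop, and run_in_forked_child are hypothetical names, and the timeout exists only so the demonstration terminates instead of hanging. It relies on os.fork(), so it is POSIX-only.

```python
import asyncio
import concurrent.futures
import os
import threading


def start_loop_thread():
    """Run a new event loop forever in a daemon thread and return the loop."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def answer():
    return 42


def run_on_loop(loop, timeout):
    """Submit a coroutine to the loop; return None if nothing ran it in time."""
    future = asyncio.run_coroutine_threadsafe(answer(), loop)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None


def run_in_forked_child(loop, timeout=1.0):
    """Fork, call run_on_loop in the child, and return its result as text."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so the inherited
        # loop object has no thread left to execute scheduled coroutines.
        try:
            os.write(write_fd, repr(run_on_loop(loop, timeout)).encode())
        finally:
            os._exit(0)
    # Parent: collect the child's report and reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data.decode()


loop = start_loop_thread()
parent_result = run_on_loop(loop, timeout=5.0)  # the loop thread is alive here
child_result = run_in_forked_child(loop)        # the loop thread is missing here
print(parent_result, child_result)
```

In the parent the coroutine completes normally, while in the forked child the wait runs into the timeout. With an untimed wait, as in the hang reported here, the child would block indefinitely.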
My requirements:
aiohttp==3.7.3
async-timeout==3.0.1
attrs==20.3.0
cachetools==4.2.0
certifi==2020.12.5
cffi==1.14.4
chardet==3.0.4
decorator==4.4.2
et-xmlfile==1.0.1
fsspec==0.8.5
gcsfs==0.7.1
google-api-core==1.24.1
google-auth==1.24.0
google-auth-oauthlib==0.4.2
google-cloud-core==1.5.0
google-cloud-storage==1.35.0
google-crc32c==1.1.0
google-resumable-media==1.2.0
googleapis-common-protos==1.52.0
idna==2.10
jdcal==1.4.1
multidict==5.1.0
numpy==1.19.4
oauthlib==3.1.0
openpyxl==3.0.5
pandas==1.2.0
protobuf==3.14.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
python-dateutil==2.8.1
pytz==2020.5
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.6
six==1.15.0
typing-extensions==3.7.4.3
urllib3==1.26.2
yarl==1.6.3
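A workaround that may be worth trying with the versions listed above, assuming the hang comes from state the worker inherits across fork(), is to create the pool from a "spawn" context so each worker starts in a fresh interpreter. This is an untested sketch rather than a confirmed fix: read_all and ctx are hypothetical names, and read_file is a trimmed version of the function in the report.

```python
import multiprocessing

# "spawn" starts each worker in a new interpreter instead of forking,
# so nothing created by the earlier read_csv() call is inherited.
ctx = multiprocessing.get_context("spawn")


def read_file(path):
    # Imported inside the worker so the module-level code stays cheap
    # when a spawned process re-imports this file.
    import pandas as pd
    return pd.read_excel(path)


def read_all(paths, processes=1):
    """Read every path in a pool of spawned worker processes."""
    with ctx.Pool(processes) as pool:
        return pool.map(read_file, paths)
```

With "spawn", the calling script has to wrap its entry point in an `if __name__ == "__main__":` guard, for example calling `read_all(['gs://<valid_path_to_xlsx_file>'])` from inside that guard, because each worker re-imports the main module on startup.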