Description
What happened:
I tried to send a ridiculously large numpy array to a dask cluster over a TLS connection. This resulted in an OverflowError.
2022-06-09 20:24:23,931 - distributed.batched - ERROR - Error in batched write
Traceback (most recent call last):
File "/srv/conda/envs/saturn/lib/python3.9/site-packages/distributed/batched.py", line 94, in _background_send
nbytes = yield self.comm.write(
File "/srv/conda/envs/saturn/lib/python3.9/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/srv/conda/envs/saturn/lib/python3.9/site-packages/distributed/comm/tcp.py", line 314, in write
stream.write(b"")
File "/srv/conda/envs/saturn/lib/python3.9/site-packages/tornado/iostream.py", line 544, in write
self._handle_write()
File "/srv/conda/envs/saturn/lib/python3.9/site-packages/tornado/iostream.py", line 1486, in _handle_write
super()._handle_write()
File "/srv/conda/envs/saturn/lib/python3.9/site-packages/tornado/iostream.py", line 971, in _handle_write
num_bytes = self.write_to_fd(self._write_buffer.peek(size))
File "/srv/conda/envs/saturn/lib/python3.9/site-packages/tornado/iostream.py", line 1568, in write_to_fd
return self.socket.send(data) # type: ignore
File "/srv/conda/envs/saturn/lib/python3.9/ssl.py", line 1173, in send
return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
What you expected to happen:
No errors.
Minimal Complete Verifiable Example:
import numpy as np
from dask.distributed import Client
arr = np.random.random(300000000)
c = Client(...) # assuming you have a cluster that supports TLS
print(c.scheduler)
def func(x):
    return x.sum()
fut = c.submit(func, arr)
print('submitted')
result = fut.result()
print('done')

Anything else we need to know?:
There was a related fix in #5141; however, that fix works by passing frame_split_size into distributed.protocol.core.dumps, and frame_split_size is only applied if msgpack encounters a type it does not understand. The numpy array has already become a bytearray by the time it reaches dumps, so frame_split_size is never used and we end up with a single 2.4GB message.
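For reference, the arithmetic on the MCVE array (the 2147483647 figure is the limit from the traceback; the rest is just the size of 300 million float64 values):

import numpy as np

arr = np.random.random(300000000)   # 300 million float64 values, 8 bytes each
print(arr.nbytes)                   # 2400000000 bytes, i.e. ~2.4 GB
print(arr.nbytes > 2147483647)      # True: a single frame bigger than OpenSSL will write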
Tornado actually has a fix on master that resolves this issue, but it is not in the latest Tornado release, which is quite old.
https://github.com/tornadoweb/tornado/blob/master/tornado/iostream.py#L1565
When I monkey patch write_to_fd, the problem is resolved.
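Roughly, the monkey patch looks like the following (a minimal sketch: it clamps each SSL write to 1 GiB, similar in spirit to the fix on Tornado master; the exact clamp size is arbitrary as long as it stays below 2147483647 bytes):

from tornado.iostream import SSLIOStream

_orig_write_to_fd = SSLIOStream.write_to_fd

def _patched_write_to_fd(self, data):
    # OpenSSL refuses single writes larger than ~2 GiB, so clamp each write;
    # Tornado's write loop keeps the remainder buffered and sends it on later calls.
    return _orig_write_to_fd(self, memoryview(data)[: 1 << 30])

SSLIOStream.write_to_fd = _patched_write_to_fd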
Environment:
- Dask version: 2022.5
- Python version: 3.9
- Operating System: Linux
- Install method (conda, pip, source): pip