Description
I recently created a GCP cluster with adaptive scaling and 0 workers. About 2 hours later I ran a task, and new workers failed to launch. This is the error that was thrown:
Creating worker instance
Task exception was never retrieved
future: <Task finished name='Task-533' coro=<_wrap_awaitable() done, defined at /home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/asyncio/tasks.py:685> exception=BrokenPipeError(32, 'Broken pipe')>
Traceback (most recent call last):
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/asyncio/tasks.py", line 692, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/distributed/deploy/spec.py", line 71, in _
    await self.start()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 321, in start
    await self.start_worker()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 325, in start_worker
    self.internal_ip, self.external_ip = await self.create_vm()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 188, in create_vm
    self.cluster.compute.instances()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 900, in execute
    resp, content = _retry_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 204, in _retry_request
    raise exception
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 177, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google_auth_httplib2.py", line 189, in request
    self.credentials.before_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/auth/credentials.py", line 133, in before_request
    self.refresh(request)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/credentials.py", line 200, in refresh
    access_token, refresh_token, expiry, grant_response = _client.refresh_grant(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/_client.py", line 248, in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/_client.py", line 105, in _token_endpoint_request
    response = request(method="POST", url=token_uri, headers=headers, body=body)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google_auth_httplib2.py", line 116, in __call__
    response, data = self.http.request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1985, in request
    (response, content) = self._request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1650, in _request
    (response, content) = self._conn_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1558, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1049, in _send_output
    self.send(chunk)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 971, in send
    self.sock.sendall(data)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/ssl.py", line 1173, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
I haven't had any issues like this with adaptive clusters when I don't let them idle for as long.
I'm working off of commit https://github.com/dask/dask-cloudprovider/tree/35deeb415e061ca90973fd24e56b1b7a6f54bc16.
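Since the failure only shows up after a long idle period, my guess (unconfirmed) is that the cached HTTP connection goes stale and the first write on it raises BrokenPipeError. Below is a minimal sketch of a possible workaround under that assumption: rebuild the client and retry when the error occurs. The names call_with_fresh_client, make_client, do_request, and FakeClient are hypothetical and are not part of dask-cloudprovider or googleapiclient; FakeClient only simulates a connection that is stale the first time it is used.

```python
import time


def call_with_fresh_client(make_client, do_request, retries=2, delay=0.0):
    """Run do_request(client); on BrokenPipeError, rebuild the client and retry.

    make_client: zero-argument callable returning a new client (hypothetical).
    do_request:  callable taking the client and performing the API call.
    """
    client = make_client()
    for attempt in range(retries + 1):
        try:
            return do_request(client)
        except BrokenPipeError:
            if attempt == retries:
                raise  # out of retries, surface the original error
            time.sleep(delay)
            client = make_client()  # discard the possibly stale connection


class FakeClient:
    """Stand-in for an API client; only the first instance is 'stale'."""

    instances_built = 0

    def __init__(self):
        FakeClient.instances_built += 1
        self.stale = FakeClient.instances_built == 1

    def create_vm(self):
        if self.stale:
            raise BrokenPipeError(32, "Broken pipe")
        return "vm-created"


result = call_with_fresh_client(FakeClient, lambda c: c.create_vm())
print(result, FakeClient.instances_built)
```

In the real code path the equivalent would presumably be rebuilding the compute client before the call in create_vm, but I haven't verified that this is the right place for the fix.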