Unable to create new workers in adaptive GCP cluster #179

Closed
@eric-czech

I recently created a GCP cluster with adaptive scaling and 0 workers. About 2 hours later I submitted a task, and the new workers failed to launch. This is the error that was thrown:

Creating worker instance
Task exception was never retrieved
future: <Task finished name='Task-533' coro=<_wrap_awaitable() done, defined at /home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/asyncio/tasks.py:685> exception=BrokenPipeError(32, 'Broken pipe')>
Traceback (most recent call last):
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/asyncio/tasks.py", line 692, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/distributed/deploy/spec.py", line 71, in _
    await self.start()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 321, in start
    await self.start_worker()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 325, in start_worker
    self.internal_ip, self.external_ip = await self.create_vm()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 188, in create_vm
    self.cluster.compute.instances()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 900, in execute
    resp, content = _retry_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 204, in _retry_request
    raise exception
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 177, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google_auth_httplib2.py", line 189, in request
    self.credentials.before_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/auth/credentials.py", line 133, in before_request
    self.refresh(request)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/credentials.py", line 200, in refresh
    access_token, refresh_token, expiry, grant_response = _client.refresh_grant(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/_client.py", line 248, in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/_client.py", line 105, in _token_endpoint_request
    response = request(method="POST", url=token_uri, headers=headers, body=body)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google_auth_httplib2.py", line 116, in __call__
    response, data = self.http.request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1985, in request
    (response, content) = self._request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1650, in _request
    (response, content) = self._conn_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1558, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1049, in _send_output
    self.send(chunk)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 971, in send
    self.sock.sendall(data)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/ssl.py", line 1173, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe

I haven't had any issues with adaptive clusters like this when I don't let them idle for as long.

I'm working off of commit https://github.com/dask/dask-cloudprovider/tree/35deeb415e061ca90973fd24e56b1b7a6f54bc16.
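
The traceback ends in a `BrokenPipeError` raised while sending the token refresh request, which would be consistent with a connection that was dropped while the cluster sat idle. That is a guess, not a confirmed cause. As a possible workaround, here is a minimal sketch of retrying a call with a freshly built client. `call_with_reconnect`, `make_client`, and `call` are hypothetical names for illustration and are not part of dask-cloudprovider or googleapiclient.

```python
import time


def call_with_reconnect(make_client, call, retries=2, delay=0.0):
    """Run call(client); if the connection was dropped, rebuild the client and retry.

    make_client and call are hypothetical callables supplied by the caller.
    """
    client = make_client()
    for attempt in range(retries + 1):
        try:
            return call(client)
        except (BrokenPipeError, ConnectionResetError):
            if attempt == retries:
                raise  # out of retries: surface the original error
            time.sleep(delay)
            client = make_client()  # a fresh client should open a fresh socket
```

In the GCP case, `make_client` would presumably rebuild the compute API client and `call` would issue the instance creation request. I have not tested this against dask-cloudprovider.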

Labels: bug (Something isn't working), provider/gcp/vm (Cluster provider for GCP Instances)
