Unable to create new workers in adaptive GCP cluster #179

Closed
eric-czech opened this issue Nov 19, 2020 · 1 comment · Fixed by #182
Labels: bug (Something isn't working), provider/gcp/vm (Cluster provider for GCP Instances)

Comments

@eric-czech
Contributor

I recently created a GCP cluster with adaptive scaling and 0 workers. After about 2 hours I went to run a task, and new workers failed to launch. This is the error that was thrown:

Creating worker instance
Task exception was never retrieved
future: <Task finished name='Task-533' coro=<_wrap_awaitable() done, defined at /home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/asyncio/tasks.py:685> exception=BrokenPipeError(32, 'Broken pipe')>
Traceback (most recent call last):
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/asyncio/tasks.py", line 692, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/distributed/deploy/spec.py", line 71, in _
    await self.start()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 321, in start
    await self.start_worker()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 325, in start_worker
    self.internal_ip, self.external_ip = await self.create_vm()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/dask_cloudprovider/gcp/instances.py", line 188, in create_vm
    self.cluster.compute.instances()
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 900, in execute
    resp, content = _retry_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 204, in _retry_request
    raise exception
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/googleapiclient/http.py", line 177, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google_auth_httplib2.py", line 189, in request
    self.credentials.before_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/auth/credentials.py", line 133, in before_request
    self.refresh(request)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/credentials.py", line 200, in refresh
    access_token, refresh_token, expiry, grant_response = _client.refresh_grant(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/_client.py", line 248, in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google/oauth2/_client.py", line 105, in _token_endpoint_request
    response = request(method="POST", url=token_uri, headers=headers, body=body)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/google_auth_httplib2.py", line 116, in __call__
    response, data = self.http.request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1985, in request
    (response, content) = self._request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1650, in _request
    (response, content) = self._conn_request(
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/site-packages/httplib2/__init__.py", line 1558, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 1049, in _send_output
    self.send(chunk)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/http/client.py", line 971, in send
    self.sock.sendall(data)
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/ssl.py", line 1204, in sendall
    v = self.send(byte_view[count:])
  File "/home/eczech/miniconda3/envs/cloudprovider/lib/python3.8/ssl.py", line 1173, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe

I haven't had any issues like this with adaptive clusters when I don't let them idle for as long.

I'm working off of commit https://github.com/dask/dask-cloudprovider/tree/35deeb415e061ca90973fd24e56b1b7a6f54bc16.

@jacobtomlinson added the bug and provider/gcp/vm labels on Nov 20, 2020
@jacobtomlinson
Member

Thanks for raising this @eric-czech!

It looks like the auth tokens expired during those two hours and need to be refreshed. I'm surprised the googleapiclient library doesn't do this for you, but I guess it just doesn't.
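
For illustration only, one generic way to cope with a client that goes stale while the cluster idles is to drop it and rebuild it when a connection error surfaces, then retry the call. The traceback above suggests the failure occurred while the token refresh request was being sent, presumably over a connection that had been closed in the meantime. The sketch below is not the change merged in #182. `RefreshingClient`, `factory`, and `call` are hypothetical names, and the idea that the factory would wrap something like `googleapiclient.discovery.build("compute", "v1")` is an assumption.

```python
from typing import Callable, Generic, Optional, TypeVar

Client = TypeVar("Client")
Result = TypeVar("Result")


class RefreshingClient(Generic[Client]):
    """Hypothetical helper: caches a client and rebuilds it after connection errors."""

    def __init__(self, factory: Callable[[], Client]) -> None:
        # factory is assumed to build a fresh API client (and a fresh connection),
        # e.g. a function wrapping googleapiclient.discovery.build("compute", "v1").
        self._factory = factory
        self._client: Optional[Client] = None

    def call(self, operation: Callable[[Client], Result], retries: int = 1) -> Result:
        """Run operation(client); on ConnectionError, rebuild the client and retry."""
        if retries < 0:
            raise ValueError("retries must be >= 0")
        for attempt in range(retries + 1):
            if self._client is None:
                self._client = self._factory()
            try:
                return operation(self._client)
            except ConnectionError:
                # BrokenPipeError is a subclass of ConnectionError.
                # Drop the possibly stale client so the next use rebuilds it.
                self._client = None
                if attempt == retries:
                    raise
        raise AssertionError("unreachable")
```

A caller would route each API request through `call`, for example `wrapper.call(lambda compute: compute.instances())`, so that a single broken pipe after a long idle period triggers one rebuild instead of failing the worker launch. Whether rebuilding the client is sufficient in a given deployment depends on how credentials are loaded, which this sketch does not address.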
