[autoscaler] Creating instance profile fails on exception handling #3533

Closed
richardliaw opened this issue Dec 13, 2018 · 6 comments

@richardliaw
Contributor

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac, Linux
  • Ray installed from (source or binary):
  • Ray version: 0.6
  • Python version: 3.6
  • Exact command to reproduce:
    ray up example-full.yaml on an account without the proper instance profile name.

Describe the problem

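I think the exception handling is just out of date: the except clause in _get_instance_profile refers to botocore.errorfactory.NoSuchEntityException, which is not an attribute of that module, so the handler itself fails with an AttributeError (see the traceback below).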

Source code / logs

╭─ ~/Research/ec2/clustercfgs 
╰─ aws iam remove-role-from-instance-profile --role-name ray-autoscaler-v1  --instance-profile-name ray-autoscaler-v1
╭─ ~/Research/ec2/clustercfgs 
╰─ aws iam delete-instance-profile --instance-profile-name ray-autoscaler-v1
╭─ ~/Research/ec2/clustercfgs 
╰─ ray up example-full.yaml
Traceback (most recent call last):
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/aws/config.py", line 284, in _get_instance_profile
    profile.load()
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/botocore/client.py", line 317, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/botocore/client.py", line 615, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchEntityException: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile ray-autoscaler-v1 cannot be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rliaw/miniconda3/envs/ray/bin/ray", line 11, in <module>
    sys.exit(main())
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/scripts/scripts.py", line 690, in main
    return cli()
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/scripts/scripts.py", line 470, in create_or_update
    no_restart, restart_only, yes, cluster_name)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 42, in create_or_update_cluster
    config = _bootstrap_config(config)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/commands.py", line 64, in _bootstrap_config
    resolved_config = bootstrap_config(config)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/aws/config.py", line 43, in bootstrap_aws
    config = _configure_iam_role(config)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/aws/config.py", line 62, in _configure_iam_role
    profile = _get_instance_profile(DEFAULT_RAY_INSTANCE_PROFILE, config)
  File "/Users/rliaw/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/aws/config.py", line 286, in _get_instance_profile
    except botocore.errorfactory.NoSuchEntityException:
AttributeError: module 'botocore.errorfactory' has no attribute 'NoSuchEntityException'
@richardliaw richardliaw assigned ericl and richardliaw and unassigned ericl Dec 13, 2018
richardliaw added a commit that referenced this issue Dec 14, 2018
Unfortunately Boto generates error classes dynamically, so this catches
the expected error and raises the error if it is the wrong class.

Closes #3533.
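
For readers hitting the same traceback, the approach the commit message describes can be sketched roughly as follows. This is a minimal illustration, not the code from the actual commit: the helper name `load_or_none` and its `missing_code` parameter are hypothetical, and the stand-in `ClientError` class exists only so the snippet runs where botocore is not installed. The idea is to catch the statically importable `botocore.exceptions.ClientError`, inspect the error code in its `response` dict, and re-raise anything that is not the expected `NoSuchEntity` error.

```python
try:
    from botocore.exceptions import ClientError
except ImportError:
    # Stand-in with the same constructor shape as botocore's ClientError,
    # defined only so this sketch runs without botocore installed.
    class ClientError(Exception):
        def __init__(self, error_response, operation_name):
            super().__init__(operation_name)
            self.response = error_response
            self.operation_name = operation_name


def load_or_none(resource, missing_code="NoSuchEntity"):
    """Load a boto3-style resource; return None if AWS says it is missing.

    `resource` is assumed to expose a zero-argument `load()` method, as the
    `profile` object in the traceback does. Hypothetical helper.
    """
    try:
        resource.load()
        return resource
    except ClientError as exc:
        # Error classes such as NoSuchEntityException are generated at
        # runtime, so match on the error code instead of the class name.
        code = exc.response.get("Error", {}).get("Code")
        if code == missing_code:
            return None
        # Any other client error (e.g. AccessDenied) is unexpected here.
        raise
```

Matching on the error code keeps the `except` clause independent of where botocore happens to attach its generated exception classes, which is what broke the original handler.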
@eric-valente

I'm getting this error with a fresh install of Ray, the YAML file in this repo, properly configured boto, and running ray up on the YAML.

Is there a way to proceed?

No handlers could be found for logger "ray.worker"
Traceback (most recent call last):
  File "/usr/local/bin/ray", line 11, in <module>
    sys.exit(main())
  File "/Library/Python/2.7/site-packages/ray/scripts/scripts.py", line 690, in main
    return cli()
  File "/Library/Python/2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Python/2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/ray/scripts/scripts.py", line 470, in create_or_update
    no_restart, restart_only, yes, cluster_name)
  File "/Library/Python/2.7/site-packages/ray/autoscaler/commands.py", line 42, in create_or_update_cluster
    config = _bootstrap_config(config)
  File "/Library/Python/2.7/site-packages/ray/autoscaler/commands.py", line 64, in _bootstrap_config
    resolved_config = bootstrap_config(config)
  File "/Library/Python/2.7/site-packages/ray/autoscaler/aws/config.py", line 43, in bootstrap_aws
    config = _configure_iam_role(config)
  File "/Library/Python/2.7/site-packages/ray/autoscaler/aws/config.py", line 62, in _configure_iam_role
    profile = _get_instance_profile(DEFAULT_RAY_INSTANCE_PROFILE, config)
  File "/Library/Python/2.7/site-packages/ray/autoscaler/aws/config.py", line 286, in _get_instance_profile
    except botocore.errorfactory.NoSuchEntityException:
AttributeError: 'module' object has no attribute 'NoSuchEntityException'

@richardliaw
Contributor Author

Actually this can be fixed as of today by installing the latest snapshot of master - https://ray.readthedocs.io/en/latest/installation.html

@eric-valente

Ah got it, new error (thanks for quick reply!):

Maybe the AMI ID changed for the Deep Learning image?

botocore.exceptions.ClientError: An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-3b6bce43]' does not exist

@richardliaw
Contributor Author

richardliaw commented Dec 15, 2018 via email

@eric-valente

Ah, I had switched the region to us-east-1 in my YAML, and the AMI ID is different there. Thanks man!

@richardliaw
Contributor Author

richardliaw commented Dec 15, 2018 via email
