[Azure] KeyError when launching Azure cluster #1408

Closed
Michaelvll opened this issue Nov 13, 2022 · 4 comments
Labels: bug, Stale

Comments

Michaelvll commented Nov 13, 2022

The following error occurs occasionally. It seems to happen more often when test_cancel_azure and test_azure_start_stop run together.

  File "/home/ubuntu/skypilot/sky/skylet/providers/azure/node_provider.py", line 362, in _get_cached_node
    return self._get_node(node_id=node_id)
  File "/home/ubuntu/skypilot/sky/skylet/providers/azure/node_provider.py", line 357, in _get_node
    return self.cached_nodes[node_id]
KeyError: 'ray-test-cancel-azure-3446f302-2b-head-138eaa0b0'

@suquark Do you think there is a race condition around the following method? Should we hold a file lock across multiple ray up invocations?

@synchronized
def _get_filtered_nodes(self, tag_filters):
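
To illustrate the question above: if @synchronized is a thread-level lock (an assumption, since its definition is not shown here), it only serializes calls within one process, so two separate ray up processes could still interleave. A minimal sketch of a cross-process lock using the standard-library fcntl.flock (Unix only) follows. The helper name cluster_file_lock and the lock file location are hypothetical and not part of SkyPilot.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def cluster_file_lock(cluster_name, lock_dir=None):
    """Hold an exclusive advisory lock for one cluster (hypothetical helper)."""
    lock_dir = lock_dir or tempfile.gettempdir()
    lock_path = os.path.join(lock_dir, f'{cluster_name}.lock')
    # Open (or create) the lock file; the lock is tied to this open file.
    with open(lock_path, 'w') as lock_file:
        # Blocks until every other holder of the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield lock_path
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Any code that reads or rebuilds the node cache would run inside `with cluster_file_lock(cluster_name):`. Whether this would actually resolve the KeyError depends on where the cache goes stale, which this thread does not establish.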

Michaelvll added the bug label Nov 17, 2022
Michaelvll commented

This has been happening much more often recently. Any idea what might be causing it, @suquark? :)

Michaelvll commented

It seems the problem happens when the test runs sky start on a cluster that has just been stopped with sky stop.
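
If that hypothesis holds, one possible mitigation is to refresh the node cache on a miss instead of indexing it directly. Below is a minimal sketch, assuming the cache is filled by a listing call. NodeCache and list_nodes are hypothetical names for illustration and are not the actual node_provider.py code.

```python
import threading


class NodeCache:
    """Hypothetical node cache that re-lists nodes once on a cache miss."""

    def __init__(self, list_nodes):
        # list_nodes: callable returning {node_id: node_info} from the cloud.
        self._list_nodes = list_nodes
        self._lock = threading.RLock()
        self._cached_nodes = {}

    def refresh(self):
        with self._lock:
            self._cached_nodes = dict(self._list_nodes())

    def get_node(self, node_id):
        with self._lock:
            if node_id not in self._cached_nodes:
                # The cache may be stale (e.g. cluster restarted); re-list once.
                self.refresh()
            node = self._cached_nodes.get(node_id)
            if node is None:
                raise KeyError(f'Node {node_id!r} not found after refresh')
            return node
```

A lookup for a node that appeared after the last listing then triggers one refresh rather than an immediate KeyError. A node that is truly gone still raises, with a clearer message.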

github-actions bot commented Jun 5, 2023

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the Stale label Jun 5, 2023
Michaelvll commented

This has not been observed for a while. Closing for now.
