[GCP/Spot] Skypilot GCP user credentials expire on controller with SSO #2738
Comments
I've also run into this issue but could not resolve it.
I think only service accounts are supported; the docs try to say this but are a little unclear: https://skypilot.readthedocs.io/en/latest/cloud-setup/cloud-auth.html#gcp
I ran into this issue too. I got around it by forcing a service account with a bash wrapper script, but that's very brittle: when it is accidentally misconfigured, it falls back to user credentials, and when they expire, all the bucket mounts die.
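One way to make that kind of wrapper fail loudly rather than fall back to user credentials is to inspect the credentials file before launching. This is only a sketch, not what the wrapper above actually does: the helper names `credential_type` and `require_service_account` are hypothetical, and it assumes the wrapper knows the path of the JSON credentials file it intends to use (for example, the one named by `GOOGLE_APPLICATION_CREDENTIALS`). It relies on the `type` field that Google writes into these files: `service_account` for a service account key, `authorized_user` for user credentials.

```python
import json


def credential_type(path):
    """Return the "type" field of a Google credentials JSON file (or None)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("type")


def require_service_account(path):
    """Return `path` if it holds a service account key; raise otherwise.

    Hypothetical guard: raising here stops the launch instead of letting it
    continue with user credentials that will expire later.
    """
    kind = credential_type(path)
    if kind != "service_account":
        raise RuntimeError(
            f"{path} has credential type {kind!r}; expected 'service_account'"
        )
    return path
```

A launch script could call `require_service_account(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])` as its first step, so that a misconfiguration shows up at launch time rather than when the bucket mounts die.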
We should definitely fix this. The issue is related to the organization's reauthentication policy set up by cloud admins: https://support.google.com/cloudidentity/answer/9368756?hl=en# (Our dev accounts likely don't have this set, so we never ran into this problem.) To solve this, we should probably make the spot controller use a long-lived service account so it doesn't need to reauth. I just tried the following on a new Google Cloud email/project, and verified that launching a VM works:
Let us know if the above works for you. @ethansiegl @kuza55 @zxexz
Yep. I had actually already followed those docs - it does work - most of the time. It's hard to track down exactly why, but whenever I [...] I have a "workaround" for now: in our launch wrapper script I make sure to do a [...]
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Hi,
I have been trying to use SkyPilot's spot scheduling.
I have run into issues where the controller becomes unresponsive and the ~/.sky/skylet.log file is filled with this error:
google.auth.exceptions.RefreshError: Reauthentication is needed. Please run `gcloud auth application-default login` to reauthenticate.
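To check whether a controller has hit this state, the log can be scanned for that exception name. The following is a rough sketch under a few assumptions: the function names are hypothetical, the log lives at `~/.sky/skylet.log` on the controller as described above, and any line containing `google.auth.exceptions.RefreshError` counts as a reauthentication failure.

```python
from pathlib import Path

# Exception name taken from the error message quoted above.
REAUTH_MARKER = "google.auth.exceptions.RefreshError"


def count_reauth_errors(log_text):
    """Count the log lines that mention the RefreshError exception."""
    return sum(1 for line in log_text.splitlines() if REAUTH_MARKER in line)


def controller_needs_reauth(log_path="~/.sky/skylet.log"):
    """Return True if the log at `log_path` contains at least one RefreshError.

    A missing log file is treated as "no error seen".
    """
    path = Path(log_path).expanduser()
    if not path.exists():
        return False
    return count_reauth_errors(path.read_text(errors="replace")) > 0
```

Running `controller_needs_reauth()` on the controller gives a quick yes/no answer before digging through the full log.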
My understanding of the issue is that user credentials and access tokens derived from them expire in relatively short time windows, though I am poking at my setup to see if I am doing something weird: https://stackoverflow.com/questions/69229759/longer-lasting-user-credentials-with-gcloud-auth-prevent-expiration
I have seen suggestions to create a service account and interact with GCP through it, which will likely address this issue, but that feels at least like a documentation bug.
Alternatively, using the service account that SkyPilot creates seems like it would make sense to me.