
Improve status check handling for GKE Autopilot clusters #6011

Open
briandealwis opened this issue Jun 11, 2021 · 4 comments

Comments

@briandealwis
Member

Can we improve the status check reporting when deploying to a GKE Autopilot cluster, by informing the user that the cluster/node is being scaled up to accommodate the new job?

```
Waiting for deployments to stabilize...
 - deployment/leeroy-app: 0/3 nodes are available: 1 Insufficient memory, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1623436397}, that the pod didn't tolerate, 2 Insufficient cpu.
    - pod/leeroy-app-c469448b5-wb2db: 0/3 nodes are available: 1 Insufficient memory, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1623436397}, that the pod didn't tolerate, 2 Insufficient cpu.
 - deployment/leeroy-web: 0/3 nodes are available: 1 Insufficient memory, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1623436397}, that the pod didn't tolerate, 2 Insufficient cpu.
    - pod/leeroy-web-99d978f66-9dr2j: 0/3 nodes are available: 1 Insufficient memory, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1623436397}, that the pod didn't tolerate, 2 Insufficient cpu.
[large pause]
 - deployment/leeroy-web is ready. [1/2 deployment(s) still pending]
 - deployment/leeroy-app is ready.
```

If a pod has not been scheduled, we could check its events for a `TriggeredScaleUp` event.
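A minimal Go sketch of that event check, assuming a hypothetical trimmed-down `Event` type in place of client-go's `corev1.Event` so that it runs with the standard library alone. `scaleUpStatus` is a hypothetical helper, not part of Skaffold; the event messages in `main` are made-up placeholders, and only the `TriggeredScaleUp` reason comes from this thread.

```go
package main

import "fmt"

// Event is a hypothetical, trimmed-down stand-in for a Kubernetes core/v1
// Event; only the fields this sketch needs are included.
type Event struct {
	InvolvedObjectKind string
	InvolvedObjectName string
	Reason             string
	Message            string
}

// scaleUpStatus scans the events recorded for an unscheduled pod. If the
// cluster autoscaler has emitted a TriggeredScaleUp event for that pod, it
// returns a friendlier status line and true; otherwise it returns "", false.
func scaleUpStatus(pod string, events []Event) (string, bool) {
	for _, e := range events {
		// Ignore events that belong to other objects.
		if e.InvolvedObjectKind != "Pod" || e.InvolvedObjectName != pod {
			continue
		}
		if e.Reason == "TriggeredScaleUp" {
			return fmt.Sprintf("pod/%s: waiting for the cluster to scale up (%s)", pod, e.Message), true
		}
	}
	return "", false
}

func main() {
	// Placeholder events; real messages come from the scheduler and autoscaler.
	events := []Event{
		{"Pod", "leeroy-app-c469448b5-wb2db", "FailedScheduling", "0/3 nodes are available: 2 Insufficient cpu."},
		{"Pod", "leeroy-app-c469448b5-wb2db", "TriggeredScaleUp", "pod triggered scale-up"},
	}
	if msg, ok := scaleUpStatus("leeroy-app-c469448b5-wb2db", events); ok {
		// Prints: pod/leeroy-app-c469448b5-wb2db: waiting for the cluster to scale up (pod triggered scale-up)
		fmt.Println(msg)
	}
}
```

A status checker could print this line instead of the raw `0/3 nodes are available` message whenever the check succeeds, and fall back to the scheduler's message otherwise.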

@tejal29
Member

tejal29 commented Jun 11, 2021

Sounds like a nice feature. Is there a way we could detect whether it's an Autopilot cluster?

@tejal29 tejal29 added the planning/Q4-21 Q4 2021 planning label Jun 11, 2021
@briandealwis
Member Author

I've been told that the pod events should show cluster autoscaling or node auto-provisioning events. Both are briefly described here:

https://cloud.google.com/architecture/best-practices-for-running-cost-effective-kubernetes-applications-on-gke

@ValentinFunk

Is there a way to use this with Autopilot clusters yet? My deployments always fail (something about the pod being unschedulable), even when I see a TriggeredScaleUp event.

@aaron-prindle aaron-prindle added this to the v2.2.0 milestone Dec 5, 2022
@aaron-prindle aaron-prindle self-assigned this Jan 23, 2023
@aaron-prindle aaron-prindle modified the milestones: v2.2.0, v2.3.0 Mar 7, 2023
@aaron-prindle aaron-prindle removed their assignment Mar 8, 2023
@aaron-prindle aaron-prindle modified the milestones: v2.3.0, v2.4.0 Apr 3, 2023
@aaron-prindle aaron-prindle self-assigned this Apr 3, 2023
@aaron-prindle aaron-prindle modified the milestones: v2.4.0, v2.5.0 Apr 17, 2023
@aaron-prindle aaron-prindle modified the milestones: v2.4.0, v2.5.0 Apr 26, 2023
@aaron-prindle aaron-prindle added the priority/p1 High impact feature/bug. label May 22, 2023
@aaron-prindle aaron-prindle added the platform/gcp Issues relating specifically to GCP label May 25, 2023
@ericzzzzzzz
Contributor

If this issue occurs on an Autopilot cluster, please use the `--tolerate-failures-until-deadline` flag:

```go
{
	Name:          "tolerate-failures-until-deadline",
	Usage:         "Configures `status-check` to tolerate failures until Skaffold's statusCheckDeadline duration or the deployments progressDeadlineSeconds Otherwise deployment failures skaffold encounters will immediately fail the deployment. Defaults to 'false'",
	Value:         &opts.TolerateFailuresStatusCheck,
	DefValue:      false,
	FlagAddMethod: "BoolVar",
	DefinedOn:     []string{"dev", "debug", "deploy", "run", "apply"},
	IsEnum:        true,
},
```

@ericzzzzzzz ericzzzzzzz removed the priority/p1 High impact feature/bug. label Aug 21, 2023
Projects
None yet
Development

No branches or pull requests

6 participants