
azure machine-controller webhook timeout #1857

Open
@dharapvj


Lately, we have been seeing continuous failures when rolling out new MachineDeployments (MDs) in Azure environments.

The error is always the machine-controller webhook timing out. It shows up in KubeOne as well as in KKP user clusters.

Some Azure API (mostly the one that lists VM sizes) seems to have become very slow, or we need better filters in our API call.

Here are the logs from an MD in a KKP user cluster:

failed to create machine deployment: Internal error occurred: failed calling webhook "machine-controller.kubermatic.io-machinedeployments": failed to call webhook: Post "https://machine-controller-webhook.cluster-XXXXX.svc.cluster.local./machinedeployments?timeout=10s": context deadline exceeded
{
  "error": {
    "code": 500,
    "message": "failed to create machine deployment: admission webhook \"machine-controller.kubermatic.io-machinedeployments\" denied the request: validation failed: failed to get VM SKU: failed to list available SKUs: compute.ResourceSkusClient#List: Failure responding to request: StatusCode=200 -- Original Error: Error occurred reading http.Response#Body - Error = 'context canceled'"
  }
}

I have seen that if I increase the webhook timeout to 30s, the situation improves a bit.

But in general, since a webhook can have a timeout of at most 30s, we should consider caching the list of VM sizes (SKUs) to speed things up.

Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale)