azure machine-controller webhook timeout

Lately, we have been seeing continuous failures when rolling out new MachineDeployments (MDs) in Azure environments.

The error is always the machine-controller-webhook timing out. It is seen in KubeOne clusters as well as in KKP user clusters.


Some Azure API (most likely the one listing VM sizes/SKUs) has become very slow, or we need better filters in our API call.

Here are the logs from an MD in a KKP user cluster:
```
failed to create machine deployment: Internal error occurred: failed calling webhook "machine-controller.kubermatic.io-machinedeployments": failed to call webhook: Post "https://machine-controller-webhook.cluster-XXXXX.svc.cluster.local./machinedeployments?timeout=10s": context deadline exceeded
```

```
{
  "error": {
    "code": 500,
    "message": "failed to create machine deployment: admission webhook \"machine-controller.kubermatic.io-machinedeployments\" denied the request: validation failed: failed to get VM SKU: failed to list available SKUs: compute.ResourceSkusClient#List: Failure responding to request: StatusCode=200 -- Original Error: Error occurred reading http.Response#Body - Error = 'context canceled'"
  }
}
```

I have seen that if I increase the webhook timeout to 30s, the situation improves a bit.

But in general, since a webhook timeout can be at most 30s, we should consider caching the list of VM SKUs to speed things up.

azure machine-controller webhook timeout #1857

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

azure machine-controller webhook timeout #1857

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions