[BUG] Intermittent Node Authorizer Forbidden Errors #4727

Open
@phil-fileread

Description

Describe the bug
We are running the latest version of the GitHub Actions ARC runners in our AKS cluster (v1.29.1, free tier, 10.5.6.0/24 service CIDR).
Node pools: Standard_d8ads_v6 / AKSUbuntu-2204gen2containerd-202412.10.0

When the ARC runner controller spins up a container to run a GitHub Actions workflow, we often see the job fail with ECONNREFUSED 10.5.6.1:443. This typically happens in the "Initialize container" or "Stop containers" pre/post workflow tasks.

At this point, it is not clear what triggers this behaviour.
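
One way to narrow down when the refusals happen is to run a small TCP probe from a pod on an affected node and record each failed connect with a timestamp. The sketch below is only an illustration: it assumes 10.5.6.1 is the ClusterIP of the default `kubernetes` Service (the first address in the 10.5.6.0/24 service CIDR), and the function names, interval, and attempt count are hypothetical choices, not part of ARC or AKS.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=2.0):
    """Return None if a TCP connect succeeds, otherwise the error as a string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        # ECONNREFUSED surfaces here as ConnectionRefusedError (an OSError subclass).
        return f"{type(exc).__name__}: {exc}"


def watch(host="10.5.6.1", port=443, interval=1.0, attempts=60):
    """Probe repeatedly and collect (UTC timestamp, error) for every failure."""
    failures = []
    for _ in range(attempts):
        err = probe(host, port)
        if err is not None:
            stamp = datetime.now(timezone.utc).isoformat()
            failures.append((stamp, err))
            print(stamp, err, flush=True)
        time.sleep(interval)
    return failures


if __name__ == "__main__":
    watch()
```

The failure timestamps could then be lined up against the NODE DENY and kubelet log entries to see whether the two symptoms coincide.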

That said, we see the following logs via diagnostic settings:

I0101 16:13:23.562959 1 node_authorizer.go:205] "NODE DENY" err="node 'aks-cicd-26340822-vmss000006' cannot get unknown secret arc-runners/arc-runner-set-redacted-lr4hc-runner-fqw7h"
We also see the following in the systemd journal for the kubelet:

Note that the following logs include an additional error on a resource unrelated to the GHA ARC runners, which suggests this is not necessarily scoped to GHA runners and is therefore a broader AKS issue.

Jan 01 02:31:24 aks-cicd-26340822-vmss000000 kubelet[2812]: E0101 02:31:24.056768 2812 reflector.go:147] object-"kube-system"/"metrics-server-config": Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "metrics-server-config" is forbidden: User "system:node:aks-cicd-26340822-vmss000000" cannot list resource "configmaps" in API group "" in the namespace "kube-system": no relationship found between node 'aks-cicd-26340822-vmss000000' and this object

Jan 01 16:19:51 aks-cicd-26340822-vmss000003 kubelet[2935]: E0101 16:19:51.737343 2935 reflector.go:147] object-"arc-runners"/"arc-runner-set-redacted-lr4hc-runner-czzj2": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "arc-runner-set-redacted-lr4hc-runner-czzj2" is forbidden: User "system:node:aks-cicd-26340822-vmss000003" cannot list resource "secrets" in API group "" in the namespace "arc-runners": no relationship found between node 'aks-cicd-26340822-vmss000003' and this object
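
To see how widespread these denials are, the kubelet journal lines can be grouped by node, namespace, and resource. The following is a minimal sketch that assumes the message format matches the two lines quoted above. The regular expression and the `summarize_denials` helper are illustrative names, not part of any Kubernetes tooling.

```python
import re
import sys
from collections import Counter

# Matches the "no relationship found" denial text from the kubelet lines above.
DENY_RE = re.compile(
    r'User "system:node:(?P<node>[^"]+)" cannot (?P<verb>\w+) resource '
    r'"(?P<resource>[^"]+)" in API group "[^"]*" in the namespace '
    r'"(?P<namespace>[^"]+)": no relationship found'
)


def summarize_denials(lines):
    """Count denials keyed by (node, namespace, resource)."""
    counts = Counter()
    for line in lines:
        match = DENY_RE.search(line)
        if match:
            key = (match["node"], match["namespace"], match["resource"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Reads journal text from stdin and prints one tab-separated row per group.
    for (node, namespace, resource), n in summarize_denials(sys.stdin).most_common():
        print(f"{n}\t{node}\t{namespace}/{resource}")
```

For example, if the script is saved under a name such as `summarize_denials.py`, the output of `journalctl -u kubelet --no-pager` on a node could be piped into it.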

Of note, since cluster initialization we have not adjusted any cluster settings related to the node authorizer or to the additional permissions the node seems to receive via RBAC.

To Reproduce
We cannot seem to reliably reproduce this issue.

Expected behavior
The runner pods should start and complete without these errors.

Environment (please complete the following information):
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4
AKS free tier
10.5.6.0/24 service CIDR
Node pools: Standard_d8ads_v6 / AKSUbuntu-2204gen2containerd-202412.10.0
