Windows pod deployment in ContainerCreating state with Flannel #7215

Closed
mdrahman-suse opened this issue Nov 7, 2024 · 3 comments · Fixed by flannel-io/flannel#2102
Labels
kind/bug Something isn't working


@mdrahman-suse
Contributor

mdrahman-suse commented Nov 7, 2024

Environmental Info:
RKE2 Version: v1.28.15+dev.e0119c8f and above
Flannel version: latest (from #7090)

rke2 version v1.28.15+dev.e0119c8f (e0119c8fd26396e74f2da72d3c9836e8ae8278de)
go version go1.22.8 X:boringcrypto

Node(s) CPU architecture, OS, and Version:

Linux  5.15.0-1019-aws #23-Ubuntu SMP Wed Aug 17 18:33:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"

## Windows
OsName                                     : Microsoft Windows Server 2019 Datacenter
OsType                                     : WINNT
OsOperatingSystemSKU                       : DatacenterServerEdition
OsVersion                                  : 10.0.17763

Cluster Configuration:

1 Linux server, 1 Linux agent and 1 Windows agent
winapp.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: windows-app-deployment
spec:
  selector:
    matchLabels:
      app: windows-app
  replicas: 2
  template:
    metadata:
      labels:
        app: windows-app
    spec:
      containers:
        - name: windows-app
          image: mbuilsuse/pstools:v0.2.0
          ports:
            - containerPort: 3000
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - windows
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: windows-app-svc
  name: windows-app-svc
  namespace: default
spec:
  type: NodePort
  ports:
    - port: 3000
      nodePort: 30096
      name: http
  selector:
    app: windows-app

Describe the bug:

The Windows app deployment gets stuck in the ContainerCreating state when the RKE2 cluster is created with a Windows agent and cni: flannel. The following error is observed in the describe pod output:

plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: The system cannot find the path specified.

Steps To Reproduce:

  • Install RKE2 on 1 Linux server, 1 Linux agent and 1 Windows agent
  • Ensure the cluster is up
  • Deploy winapp.yaml

Expected behavior:

  • Pods come up and reach the Running state after deployment

Actual behavior:

  • Pods stay in the ContainerCreating state
$ kgp | grep windows
default       windows-app-deployment-6964ff4fb8-dzlm4                               0/1     ContainerCreating   0          29m
default       windows-app-deployment-6964ff4fb8-tsbb6                               0/1     ContainerCreating   0          29m

$ k describe -n default pod/windows-app-deployment-6964ff4fb8-dzlm4
Name:             windows-app-deployment-6964ff4fb8-dzlm4
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-ac1f04cf/<ip>
Start Time:       Wed, 06 Nov 2024 23:36:02 +0000
Labels:           app=windows-app
                  pod-template-hash=6964ff4fb8
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/windows-app-deployment-6964ff4fb8
...
Events:
  Type     Reason                  Age                     From               Message
  ----     ------                  ----                    ----               -------
  Normal   Scheduled               10m                     default-scheduler  Successfully assigned default/windows-app-deployment-6964ff4fb8-dzlm4 to ip-ac1f04cf
  Warning  FailedCreatePodSandBox  9m42s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "408269bf5768250f67ef9fdde6683ba1a89c71a2696054adc461959cd5251eed": plugin type="flannel" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: The system cannot find the path specified.

Additional context / logs:

  • An error appears in flanneld.log on the Windows node. It is unclear whether it is related, since flannel starts successfully on Windows and the agent node joins the cluster:
NAME            STATUS   ROLES                       AGE     VERSION
server1         Ready    <none>                      11m     v1.31.2+rke2r1
agent1          Ready    control-plane,etcd,master   13m     v1.31.2+rke2r1
ip-ac1f07ff     Ready    <none>                      5m40s   v1.31.2
  • On the Windows node:
PS C:\Users\Administrator> cat c:\var\lib\rancher\rke2\agent\logs\flanneld.log
I1106 23:35:51.875875    4232 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile:c:\var\lib\rancher\rke2\agent\flannel.kubeconfig iface:[172.31.4.207] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:false netConfPath:c:\var\lib\rancher\rke2\agent\flanneld-net-conf.json setNodeNetworkUnavailable:true}
I1106 23:35:51.910614    4232 kube.go:469] Starting kube subnet manager
I1106 23:35:51.910614    4232 kube.go:139] Waiting 10m0s for node controller to sync
I1106 23:35:51.922926    4232 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.0.0/24]
I1106 23:35:51.922926    4232 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.1.0/24]
I1106 23:35:52.911443    4232 kube.go:146] Node controller sync successful
I1106 23:35:52.911443    4232 main.go:231] Created subnet manager: Kubernetes Subnet Manager - ip-ac1f04cf
I1106 23:35:52.911443    4232 main.go:234] Installing signal handlers
I1106 23:35:52.911443    4232 main.go:466] Found network config - Backend type: vxlan
E1106 23:35:52.911443    4232 main.go:267] Failed to check br_netfilter: CreateFile /proc/sys/net/bridge/bridge-nf-call-iptables: The system cannot find the path specified.
  • kube-proxy.log is filled with repeated errors like:
E1107 00:48:44.162122    3380 proxier.go:221] "Unable to find HNS Network specified, please check network name and CNI deployment" err="Network name \"flannel.4096\" not found" hnsNetworkName="flannel.4096"
  • Flannel and CNI plugin versions on Windows:
PS C:\Users\Administrator> C:\var\lib\rancher\rke2\data\v1.31.2*\bin\flannel.exe -version
CNI Plugin flannel version v1.6.0-flannel1 (windows/amd64) commit 3389866d built on 2024-10-21T08:01:32Z
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0
PS C:\Users\Administrator> C:\var\lib\rancher\rke2\data\v1.31.2*\bin\flanneld.exe -version
v0.26.0
@mdrahman-suse mdrahman-suse added the kind/bug Something isn't working label Nov 7, 2024
@mdrahman-suse mdrahman-suse added this to the 2024-11 Release Cycle milestone Nov 7, 2024
@mdrahman-suse mdrahman-suse changed the title Failing pod deployment on Windows agent with Flannel Failing Windows pod deployment on Windows agent with Flannel Nov 7, 2024
@mdrahman-suse mdrahman-suse changed the title Failing Windows pod deployment on Windows agent with Flannel Windows pod deployment in CreatingContainer state with Flannel Nov 7, 2024
@mdrahman-suse mdrahman-suse changed the title Windows pod deployment in CreatingContainer state with Flannel Windows pod deployment in ContainerCreating state with Flannel Nov 7, 2024
@brandond
Member

brandond commented Nov 7, 2024

@rbrtbnfgl this change seems problematic. It needs to be OS-specific and not run on Windows, as kernel modules of course cannot be checked for or loaded on Windows.
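A minimal Go sketch of the OS-specific guard described above, assuming a runtime check on the target OS name. checkBrNetfilter is a hypothetical helper, not flannel's code, and the actual fix in flannel-io/flannel#2102 may be structured differently (for example, per-OS source files selected by build constraints). The sysctl path is the one flanneld failed to open in the Windows log.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// brNetfilterPath is the sysctl file flanneld tried to open in the Windows
// log; it exists only on Linux hosts with the br_netfilter module loaded.
const brNetfilterPath = "/proc/sys/net/bridge/bridge-nf-call-iptables"

// checkBrNetfilter is a hypothetical helper, not flannel's actual code.
// It skips the check on Windows, where kernel modules cannot be inspected,
// and otherwise returns an error if the sysctl file is missing.
func checkBrNetfilter(goos, path string) error {
	if goos == "windows" {
		return nil // no kernel modules to check on Windows
	}
	if _, err := os.Stat(path); err != nil {
		return fmt.Errorf("failed to check br_netfilter: %w", err)
	}
	return nil
}

func main() {
	// Exercise the Windows branch and the branch for the current host OS.
	for _, goos := range []string{"windows", runtime.GOOS} {
		if err := checkBrNetfilter(goos, brNetfilterPath); err != nil {
			fmt.Printf("%s: %v\n", goos, err)
			continue
		}
		fmt.Printf("%s: check passed or skipped\n", goos)
	}
}
```

Passing the OS name and path as parameters, instead of reading runtime.GOOS inside the helper, keeps both branches testable from a Linux CI host.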

@rbrtbnfgl
Contributor

My bad. That's why flannel also needs automated tests on Windows. I'll fix it today and make a new release.

@mdrahman-suse
Contributor Author

Validated on release-1.31 with commit

Ref: #7233 (comment)

$ rke2 -v
rke2 version v1.31.2+dev.5b9a8f82 (5b9a8f82e455d9e2da999c8e3f6b93174812175a)
go version go1.22.8 X:boringcrypto

$ kgn
NAME                                          STATUS   ROLES                       AGE   VERSION
ip-172-31-0-93.us-east-2.compute.internal     Ready    control-plane,etcd,master   25m   v1.31.2+rke2r1
ip-172-31-10-197.us-east-2.compute.internal   Ready    <none>                      23m   v1.31.2+rke2r1
ip-ac1f047d                                   Ready    <none>                      18m   v1.31.2

$ kgp
NAMESPACE     NAME                                                                 READY   STATUS      RESTARTS   AGE
kube-system   cloud-controller-manager-ip-172-31-0-93.us-east-2.compute.internal   1/1     Running     0          25m
kube-system   etcd-ip-172-31-0-93.us-east-2.compute.internal                       1/1     Running     0          24m
kube-system   helm-install-rke2-coredns-5v8vk                                      0/1     Completed   0          25m
kube-system   helm-install-rke2-flannel-5szps                                      0/1     Completed   0          25m
kube-system   helm-install-rke2-ingress-nginx-vbmb2                                0/1     Completed   0          25m
kube-system   helm-install-rke2-metrics-server-86vlj                               0/1     Completed   0          25m
kube-system   helm-install-rke2-snapshot-controller-crd-2b4tq                      0/1     Completed   0          25m
kube-system   helm-install-rke2-snapshot-controller-xwlz9                          0/1     Completed   0          25m
kube-system   helm-install-rke2-snapshot-validation-webhook-wqb9r                  0/1     Completed   0          25m
kube-system   kube-apiserver-ip-172-31-0-93.us-east-2.compute.internal             1/1     Running     0          24m
kube-system   kube-controller-manager-ip-172-31-0-93.us-east-2.compute.internal    1/1     Running     0          25m
kube-system   kube-flannel-ds-b4bkx                                                1/1     Running     0          23m
kube-system   kube-flannel-ds-rhhcq                                                1/1     Running     0          25m
kube-system   kube-proxy-ip-172-31-0-93.us-east-2.compute.internal                 1/1     Running     0          24m
kube-system   kube-proxy-ip-172-31-10-197.us-east-2.compute.internal               1/1     Running     0          23m
kube-system   kube-scheduler-ip-172-31-0-93.us-east-2.compute.internal             1/1     Running     0          25m
kube-system   rke2-coredns-rke2-coredns-6dbd4f7dd4-klpwz                           1/1     Running     0          25m
kube-system   rke2-coredns-rke2-coredns-6dbd4f7dd4-xxwzh                           1/1     Running     0          22m
kube-system   rke2-coredns-rke2-coredns-autoscaler-84766cf644-cbr87                1/1     Running     0          25m
kube-system   rke2-ingress-nginx-controller-97gk9                                  1/1     Running     0          23m
kube-system   rke2-ingress-nginx-controller-xkrwq                                  1/1     Running     0          22m
kube-system   rke2-metrics-server-7c85d458bd-wd4gg                                 1/1     Running     0          24m
kube-system   rke2-snapshot-controller-65bc6fbd57-n8lhq                            1/1     Running     0          24m
kube-system   rke2-snapshot-validation-webhook-859c7896df-wfspt                    1/1     Running     0          24m
