Agent injector stopped working after few days #362

Open
xi2340-sdhage opened this issue Jun 16, 2022 · 5 comments
Labels
bug (Something isn't working), injector (Area: mutating webhook service)

Comments

@xi2340-sdhage

Describe the bug
We've set up our applications to have secrets injected from Vault, which worked fine at first.
Every now and then, though, the agent injector stops working when we redeploy an application (e.g. it works for 2 days, stops the next day, then works again the day after).
When this happens, the pod is created without the vault init container.


To Reproduce
Steps to reproduce the behavior:

  1. Deploy application annotated for vault-agent injection
  2. Redeploy service again after 2 days (without any changes)
  3. See error
    • vault agent injector logs
2022-06-16T06:48:11.274Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s
2022-06-16T06:50:42.190Z [ERROR] handler: error on request: Error="error reading request body: read tcp 10.38.128.5:8080->10.32.0.1:42940: read: connection reset by peer" Code=400
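
If the two injector log lines above belong to the same request, they are about two and a half minutes apart, well past the 30s in `timeout=30s`. A minimal Python sketch for measuring that gap, assuming the timestamp is the first whitespace-separated field and uses the format shown above (the helper name `gap_seconds` is made up for illustration):

```python
from datetime import datetime

# Timestamp layout assumed from the injector log lines in this issue.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def gap_seconds(first_line: str, second_line: str) -> float:
    """Return the seconds elapsed between the timestamps of two log lines."""
    t1 = datetime.strptime(first_line.split()[0], LOG_TIME_FORMAT)
    t2 = datetime.strptime(second_line.split()[0], LOG_TIME_FORMAT)
    return (t2 - t1).total_seconds()


received = (
    "2022-06-16T06:48:11.274Z [INFO]  handler: Request received: "
    "Method=POST URL=/mutate?timeout=30s"
)
failed = (
    '2022-06-16T06:50:42.190Z [ERROR] handler: error on request: '
    'Error="error reading request body: read tcp 10.38.128.5:8080->10.32.0.1:42940: '
    'read: connection reset by peer" Code=400'
)

# Prints the gap between the two lines in seconds.
print(gap_seconds(received, failed))
```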


Application deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "87"
  creationTimestamp: "2020-10-28T13:02:59Z"
  generation: 133
  name: distribution
  namespace: edge
  resourceVersion: "276077293"
  selfLink: /apis/apps/v1/namespaces/edge/deployments/distribution
  uid: 2a0448ce-337a-4b9b-bd1c-a53edb7238f9
spec:
  minReadySeconds: 5
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      name: distribution
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-default-template: json
        vault.hashicorp.com/agent-inject-secret-jwt: kv/jwt
        vault.hashicorp.com/role: service
      creationTimestamp: null
      labels:
        name: distribution
    spec:
      containers:
      - env:
        - name: FORCE_UPDATE
          value: acd4e1430977ebf8686fcb20322b764bd5ed38e55
        - name: SERVICE_NAMESPACE
          value: edge
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: git-registry.example.com/platform/be/distribution:edge
        imagePullPolicy: Always
        name: distribution
        ports:
        - containerPort: 8800
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/distribution/build
          name: distribution-data
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: gitlab
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: service
      serviceAccountName: service
      terminationGracePeriodSeconds: 30
      volumes:
      - name: distribution-data
        persistentVolumeClaim:
          claimName: distribution-data
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-06-13T14:20:34Z"
    lastUpdateTime: "2022-06-13T14:20:34Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-06-07T21:35:46Z"
    lastUpdateTime: "2022-06-16T08:41:25Z"
    message: ReplicaSet "distribution-7499c69b84" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 133
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1



Application deployment description:

Name:                   distribution
Namespace:              edge
CreationTimestamp:      Wed, 28 Oct 2020 17:02:59 +0400
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 87
Selector:               name=distribution
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        5
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           name=distribution
  Annotations:      vault.hashicorp.com/agent-inject: true
                    vault.hashicorp.com/agent-inject-default-template: json
                    vault.hashicorp.com/agent-inject-secret-jwt: kv/jwt
                    vault.hashicorp.com/role: service
  Service Account:  service
  Containers:
   distribution:
    Image:      git-registry.example.com/platform/be/distribution:edge
    Port:       8800/TCP
    Host Port:  0/TCP
    Environment:
      FORCE_UPDATE:       acd4e1430977ebf8686fcb20322b764bd5ed38e55
      SERVICE_NAMESPACE:  edge
      POD_IP:              (v1:status.podIP)
    Mounts:
      /var/distribution/build from distribution-data (rw)
  Volumes:
   distribution-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  distribution-data
    ReadOnly:   false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   distribution-7499c69b84 (1/1 replicas created)
Events:
  Type    Reason             Age                  From                   Message
  ----    ------             ----                 ----                   -------
  Normal  ScalingReplicaSet  164m                 deployment-controller  Scaled up replica set distribution-7bdd787f87 to 1
  Normal  ScalingReplicaSet  155m                 deployment-controller  Scaled down replica set distribution-7bdd787f87 to 0
  Normal  ScalingReplicaSet  149m                 deployment-controller  Scaled up replica set distribution-bc497c849 to 1
  Normal  ScalingReplicaSet  134m                 deployment-controller  Scaled up replica set distribution-7c765c5bb4 to 1
  Normal  ScalingReplicaSet  133m                 deployment-controller  Scaled down replica set distribution-bc497c849 to 0
  Normal  ScalingReplicaSet  129m (x3 over 167m)  deployment-controller  Scaled up replica set distribution-5fb7dbcf79 to 1
  Normal  ScalingReplicaSet  129m (x2 over 167m)  deployment-controller  Scaled down replica set distribution-7c765c5bb4 to 0
  Normal  ScalingReplicaSet  37m                  deployment-controller  Scaled up replica set distribution-7499c69b84 to 1
  Normal  ScalingReplicaSet  36m (x3 over 163m)   deployment-controller  Scaled down replica set distribution-5fb7dbcf79 to 0


Application replicaset description:

Name:           distribution-7499c69b84
Namespace:      edge
Selector:       name=distribution,pod-template-hash=7499c69b84
Labels:         name=distribution
                pod-template-hash=7499c69b84
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 87
Controlled By:  Deployment/distribution
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           name=distribution
                    pod-template-hash=7499c69b84
  Annotations:      vault.hashicorp.com/agent-inject: true
                    vault.hashicorp.com/agent-inject-default-template: json
                    vault.hashicorp.com/agent-inject-secret-jwt: kv/jwt
                    vault.hashicorp.com/role: service
  Service Account:  service
  Containers:
   distribution:
    Image:      git-registry.example.com/platform/be/distribution:edge
    Port:       8800/TCP
    Host Port:  0/TCP
    Environment:
      FORCE_UPDATE:       acd4e1430977ebf8686fcb20322b764bd5ed38e55
      SERVICE_ENV:        edge
      SERVICE_NAMESPACE:  edge
      POD_IP:              (v1:status.podIP)
    Mounts:
      /var/distribution/build from distribution-data (rw)
  Volumes:
   distribution-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  distribution-data
    ReadOnly:   false
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  41m   replicaset-controller  Created pod: distribution-7499c69b84-fm7cb



Expected behavior

Application deployed with init and vault agent containers and keys injected successfully.


Environment

  • Kubernetes version:
    • On-Premise Cloud
    • 1.17
  • vault-k8s version: 0.16.1


Additional context

This also happened with version 0.14.2, after which we upgraded to 0.16.1.
xi2340-sdhage added the bug label Jun 16, 2022
xi2340-sdhage changed the title from "Agent injector stopped working after few days or Agent injection fails reading request body" to "Agent injector stopped working after few days" Jun 16, 2022
tvoran added the injector label Jun 16, 2022
@tvoran
Member

tvoran commented Jun 24, 2022

Hi @sujeet111711, it looks like that error message comes from the part of the vault-k8s injector that reads the body of the incoming request from the Kubernetes API server. So the next place I'd suggest looking is the kube-apiserver's logs, to see if there are any corresponding clues.
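
For illustration only, here is a minimal Python sketch of the general pattern described in this comment: a handler that answers 400 when it cannot finish reading the request body. This is not the vault-k8s code; `handle_mutate` and `ResetReader` are hypothetical names, and the message text is only modeled on the log line in this issue.

```python
import io
from typing import Tuple


def handle_mutate(body_stream) -> Tuple[int, str]:
    """Read the whole request body; map a failed read to an HTTP 400."""
    try:
        body = body_stream.read()
    except OSError as err:  # ConnectionResetError is a subclass of OSError
        return 400, f"error reading request body: {err}"
    return 200, f"read {len(body)} bytes"


class ResetReader:
    """Hypothetical stand-in for a connection that the peer resets mid-read."""

    def read(self):
        raise ConnectionResetError("connection reset by peer")


# A body that arrives intact is read normally.
print(handle_mutate(io.BytesIO(b'{"kind": "AdmissionReview"}')))
# A reset during the read surfaces as a 400, as in the injector log.
print(handle_mutate(ResetReader()))
```

The point of the sketch is that the 400 is produced on the receiving side: the handler never saw a complete request, so the cause is more likely to be found on the caller's side or in the network path between the two.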

@xi2340-sdhage
Author

xi2340-sdhage commented Jun 27, 2022

@tvoran As requested, I've posted the kube-apiserver logs below.

@xi2340-sdhage
Author

I0617 13:19:16.747654       1 trace.go:116] Trace[594097975]: "Call mutating webhook" configuration:vault-agent-injector-cfg,webhook:vault.hashicorp.com,resource:/v1, Resource=pods,subresource:,operation:CREATE,UID:c4a12b2c-3424-47f5-a150-d6780d801772 (started: 2022-06-17 13:18:46.74642399 +0000 UTC m=+1971231.734368781) (total time: 30.000998095s):
Trace[594097975]: [30.000998095s] [30.000998095s] END
W0617 13:19:16.747825       1 dispatcher.go:168] Failed calling webhook, failing open vault.hashicorp.com: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
E0617 13:19:16.747870       1 dispatcher.go:169] failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
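
The `failing open` warning in these logs would explain why the pod comes up without the init container: when the webhook call times out and the API server fails open, the pod is admitted unmodified. A minimal Python sketch for spotting such lines in saved kube-apiserver logs, assuming the message wording shown in this comment (the function name and regex are illustrative and not part of any Kubernetes tooling; other versions may word the message differently):

```python
import re

# Matches the dispatcher warning wording seen in this issue (assumed format).
FAIL_OPEN = re.compile(r"Failed calling webhook, failing open (?P<webhook>[^\s:]+):")


def failed_open(log_lines):
    """Yield (webhook, reason) for each 'failing open' warning in the lines."""
    for line in log_lines:
        match = FAIL_OPEN.search(line)
        if match:
            # The reason is whatever follows the last ": " on the line.
            reason = line.rstrip().rsplit(": ", 1)[-1]
            yield match.group("webhook"), reason


sample = [
    'W0617 13:19:16.747825       1 dispatcher.go:168] Failed calling webhook, '
    'failing open vault.hashicorp.com: failed calling webhook "vault.hashicorp.com": '
    'Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: '
    'context deadline exceeded',
    'E0617 13:19:16.747870       1 dispatcher.go:169] failed calling webhook '
    '"vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/'
    'mutate?timeout=30s: context deadline exceeded',
]

for webhook, reason in failed_open(sample):
    print(f"{webhook} failed open: {reason}")
```

Only the warning line is reported; the error line that follows it repeats the same failure without the `failing open` wording, so the sketch skips it.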

@krishnakc1

krishnakc1 commented Dec 21, 2022

@xi2340-sdhage Were you able to resolve this problem? I have the same problem, but I do not see any error messages in the API server logs.

@xi2340-sdhage
Author

xi2340-sdhage commented Dec 22, 2022

Please uninstall and reinstall, overriding the values. Change them according to your environment's requirements.
helm install -n test agent_name -f agent_values.yml


3 participants