Node.js pod CrashLoopBackOff after auto-instrumenting #2655

Closed
Starefossen opened this issue Feb 21, 2024 · 0 comments · Fixed by #2695
Labels
bug (Something isn't working), needs triage

Component(s)

No response

What happened?

Description

A Node.js application pod enters CrashLoopBackOff when auto-instrumentation is enabled. The pod spec after injection:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    instrumentation.opentelemetry.io/container-names: unleash
    instrumentation.opentelemetry.io/inject-nodejs: my-system/management-features
  creationTimestamp: "2024-02-21T19:48:21Z"
  generateName: my-demo-b64c6d87b-
  labels:
    app.kubernetes.io/created-by: controller-manager
    app.kubernetes.io/instance: my-demo
    app.kubernetes.io/name: Unleash
    app.kubernetes.io/part-of: unleasherator
    pod-template-hash: b64c6d87b
  name: my-demo-b64c6d87b-w72hh
  namespace: bifrost-unleash
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: my-demo-b64c6d87b
    uid: bd441ad4-6ada-4e92-b2fd-ec17a9ae9a44
  resourceVersion: "520000502"
  uid: 4a93c92e-3df2-4fc3-a611-1bdd4abe1ee6
spec:
  containers:
  - env:
    - name: INIT_ADMIN_API_TOKENS
      valueFrom:
        secretKeyRef:
          key: token
          name: unleasherator-my-demo-admin-key
    - name: DATABASE_PASS
      valueFrom:
        secretKeyRef:
          key: POSTGRES_PASSWORD
          name: my-demo
    - name: DATABASE_USER
      valueFrom:
        secretKeyRef:
          key: POSTGRES_USER
          name: my-demo
    - name: DATABASE_NAME
      valueFrom:
        secretKeyRef:
          key: POSTGRES_DB
          name: my-demo
    - name: DATABASE_HOST
      value: localhost
    - name: DATABASE_PORT
      value: "5432"
    - name: DATABASE_SSL
      value: "false"
    - name: DATABASE_URL
      value: postgres://$(DATABASE_USER):$(DATABASE_PASS)@$(DATABASE_HOST):$(DATABASE_PORT)/$(DATABASE_NAME)
    - name: GOOGLE_IAP_AUDIENCE
      value: /projects/898056957967/global/backendServices/6771496285844745965
    - name: TEAMS_API_URL
      value: http://teams-backend.my-system.svc/query
    - name: TEAMS_API_TOKEN
      valueFrom:
        secretKeyRef:
          key: token
          name: teams-api-token
    - name: TEAMS_ALLOWED_TEAMS
      value: aura,frontendplattform
    - name: LOG_LEVEL
      value: warn
    - name: DATABASE_POOL_MAX
      value: "3"
    - name: DATABASE_POOL_IDLE_TIMEOUT_MS
      value: "1000"
    - name: NODE_OPTIONS
      value: ' --require /otel-auto-instrumentation-nodejs/autoinstrumentation.js'
    - name: OTEL_SERVICE_NAME
      value: my-demo
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://opentelemetry-management-collector.my-system:4317
    - name: OTEL_RESOURCE_ATTRIBUTES_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: OTEL_PROPAGATORS
      value: tracecontext,baggage,b3
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: k8s.container.name=unleash,k8s.deployment.name=my-demo,k8s.namespace.name=bifrost-unleash,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=my-demo-b64c6d87b,service.version=v5.8.2-20240130-115753-fd5cd41
    image: europe-north1-docker.pkg.dev/my-io/my/images/unleash-v4:v5.8.2-20240130-115753-fd5cd41
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 4242
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    name: unleash
    ports:
    - containerPort: 4242
      name: http
      protocol: TCP
    resources:
      limits:
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1001
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xfgkk
      readOnly: true
    - mountPath: /otel-auto-instrumentation-nodejs
      name: opentelemetry-auto-instrumentation-nodejs
  - args:
    - --structured-logs
    - --port=5432
    - my-management-233d:europe-north1:bifrost-3de70742
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.0
    imagePullPolicy: IfNotPresent
    name: sql-proxy
    resources:
      limits:
        memory: 100Mi
      requests:
        cpu: 10m
        memory: 100Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      privileged: false
      runAsNonRoot: true
      runAsUser: 65532
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xfgkk
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - cp
    - -a
    - /autoinstrumentation/.
    - /otel-auto-instrumentation-nodejs
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.46.0
    imagePullPolicy: IfNotPresent
    name: opentelemetry-auto-instrumentation-nodejs
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1001
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /otel-auto-instrumentation-nodejs
      name: opentelemetry-auto-instrumentation-nodejs
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xfgkk
      readOnly: true
  nodeName: gke-my-management--nap-e2-standard--7ff15a1a-9bqj
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: gke.io/optimize-utilization-scheduler
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: bifrost-unleash-sql-user
  serviceAccountName: bifrost-unleash-sql-user
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-xfgkk
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
  - emptyDir:
      sizeLimit: 200Mi
    name: opentelemetry-auto-instrumentation-nodejs

Steps to Reproduce

Enable auto-instrumentation for a Node.js application like the one above, where the containers run as a non-root user (`runAsUser: 1001`) with all capabilities dropped.
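
The `inject-nodejs` annotation in the pod spec points at an Instrumentation resource named `management-features` in the `my-system` namespace, which is not included in this report. A minimal sketch of what such a resource might look like follows. The endpoint, propagators, and image are inferred from the injected environment variables and init container above, so the actual resource may differ:

```yaml
# Hypothetical reconstruction; the reporter's real resource is not shown in the issue.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: management-features
  namespace: my-system
spec:
  exporter:
    # Inferred from OTEL_EXPORTER_OTLP_ENDPOINT in the injected pod
    endpoint: http://opentelemetry-management-collector.my-system:4317
  propagators:
  # Inferred from OTEL_PROPAGATORS in the injected pod
  - tracecontext
  - baggage
  - b3
  nodejs:
    # Inferred from the injected init container image
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.46.0
```

The workload then opts in through the two annotations already visible on the pod: `instrumentation.opentelemetry.io/inject-nodejs: my-system/management-features` and `instrumentation.opentelemetry.io/container-names: unleash`.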

Expected Result

The pod should start normally with auto-instrumentation injected.

Actual Result

The pod fails to start. The `opentelemetry-auto-instrumentation-nodejs` init container, which runs `cp -a`, reports the following error:

cp: can't preserve ownership of '...': Operation not permitted

Kubernetes Version

v1.28.3

Operator version

0.93.0

Collector version

latest

Environment information

Environment

Cloud: GKE

Log output

cp: can't preserve ownership of '/otel-auto-instrumentation-nodejs/./autoinstrumentation.js': Operation not permitted
cp: can't preserve ownership of '/otel-auto-instrumentation-nodejs/./autoinstrumentation.d.ts.map': Operation not permitted
...
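
These log lines are consistent with the init container's `cp -a` trying to preserve file ownership. Archive mode preserves ownership, and changing a file's owner to another user needs the CHOWN capability, but the init container runs as UID 1001 with all capabilities dropped. The fix referenced in this issue (#2695) is described as not preserving the ownership of the copied files. A minimal sketch of an injected init container that copies without preserving ownership, assuming `cp -r` is an acceptable replacement (the exact flag used by the fix is not shown in this issue):

```yaml
initContainers:
- name: opentelemetry-auto-instrumentation-nodejs
  image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.46.0
  # Assumption: -r copies recursively but does not try to preserve ownership,
  # so no chown is attempted; copied files end up owned by the running user.
  command:
  - cp
  - -r
  - /autoinstrumentation/.
  - /otel-auto-instrumentation-nodejs
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    runAsNonRoot: true
    runAsUser: 1001
  volumeMounts:
  - mountPath: /otel-auto-instrumentation-nodejs
    name: opentelemetry-auto-instrumentation-nodejs
```

The init container is injected by the operator rather than written by the user, so this fragment illustrates what the operator would need to generate, not something to apply by hand.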

Additional context

No response

Starefossen added the bug and needs triage labels on Feb 21, 2024
iblancasa added a commit to iblancasa/opentelemetry-operator that referenced this issue Feb 29, 2024
…tion. Closes open-telemetry#2655

Signed-off-by: Israel Blancas <iblancasa@gmail.com>
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this issue May 1, 2024
…tion (open-telemetry#2695)

* Not preserve the ownership of the files copied in the autoinstrumentation. Closes open-telemetry#2655

Signed-off-by: Israel Blancas <iblancasa@gmail.com>

* Update .chloggen/fix-2655.yaml

Co-authored-by: Mikołaj Świątek <mail+sumo@mikolajswiatek.com>

---------

Signed-off-by: Israel Blancas <iblancasa@gmail.com>
Co-authored-by: Mikołaj Świątek <mail+sumo@mikolajswiatek.com>