frauddetectionservice pod keeps crashing with chart 0.18 #634

Open
marcomusso opened this issue Feb 8, 2023 · 14 comments
Labels: bug (Something isn't working), chart:demo (Issues related to opentelemetry-demo helm chart)

Comments

@marcomusso

marcomusso commented Feb 8, 2023

I upgraded my demo setup on EKS today and found that the frauddetectionservice pod keeps crashing with this error:

Picked up JAVA_TOOL_OPTIONS: -javaagent:/app/opentelemetry-javaagent.jar
Error opening zip file or JAR manifest missing : /app/opentelemetry-javaagent.jar
Error occurred during initialization of VM
agent library failed to init: instrument

The values section for that service is up to date. Is it possible that something is wrong with the image (ghcr.io/open-telemetry/demo:1.3.0-frauddetectionservice)?

PS: one change I made, which I also had with the previous chart version/values, is to run the services as a normal user, since I cannot run pods as root. So, besides limits and env vars, I added a security context like this:

    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      runAsNonRoot: true

@marcomusso changed the title from "frauddetectionservice pod keep crashing with chart 0.18" to "frauddetectionservice pod keeps crashing with chart 0.18" on Feb 8, 2023
@TylerHelmuth added the bug and chart:demo labels on Feb 8, 2023

@realtimetodie

I can verify that the /app/opentelemetry-javaagent.jar actually exists inside of the container image. This might be a permission issue.
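
To illustrate the permission hypothesis: the error text suggests the JVM could not open the jar, which is what would happen if uid 1000 has no read access to it. The following is a minimal Python sketch of the classic POSIX owner/group/other read check. The modes and owners used here are hypothetical (the actual mode of the jar in the image is not known from this thread), and the sketch ignores supplementary groups, ACLs, and directory permissions.

```python
import stat


def can_read(mode: int, owner_uid: int, owner_gid: int, uid: int, gid: int) -> bool:
    """Return True if a process running as uid/gid may read a file with these mode bits."""
    if uid == 0:
        return True  # root normally bypasses the read permission check
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)  # owner class
    if gid == owner_gid:
        return bool(mode & stat.S_IRGRP)  # group class
    return bool(mode & stat.S_IROTH)      # everyone else


# Hypothetical case: jar owned by root:root, process runs as 1000:1000
# (the securityContext from the report).
print(can_read(0o600, 0, 0, uid=0, gid=0))        # True: running as root works
print(can_read(0o600, 0, 0, uid=1000, gid=1000))  # False: no read bit for "other"
print(can_read(0o644, 0, 0, uid=1000, gid=1000))  # True: world-readable
```

If the jar really had a mode without the "other" read bit, this would match the symptom: fine as root, broken under `runAsUser: 1000`.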

@puckpuck
Contributor

I think this PR may be the culprit. It commented out a chmod 644 command for the agent jar.

If I create and publish a test frauddetectionservice image, can you test with it?

@puckpuck
Contributor

I created this image if you want to test it. I added a --chmod=644 flag to the ADD command for the otel agent.

puckpuck/otel-demo:issue634-frauddetectionservice
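
For reference, here is a small Python sketch of what mode 644 grants, and of one thing it does not cover: a user who is neither owner nor in the group also needs the execute (search) bit on every parent directory to reach the file. The directory modes below are hypothetical; nothing in this thread says what the modes of `/app` actually are in the image.

```python
import stat


def other_can_open(dir_modes, file_mode):
    """True if an 'other' user can traverse every parent directory and read the file."""
    searchable = all(mode & stat.S_IXOTH for mode in dir_modes)
    return searchable and bool(file_mode & stat.S_IROTH)


# 644 on a regular file: owner read/write, group and others read-only.
print(stat.filemode(stat.S_IFREG | 0o644))    # -rw-r--r--

# Hypothetical parent directory modes for "/" and "/app".
print(other_can_open([0o755, 0o755], 0o644))  # True
print(other_can_open([0o755, 0o700], 0o644))  # False: /app is not searchable
```

So a 644 jar is only enough for a non-root user if the directories above it are searchable as well.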

@marcomusso
Author

No change with the test image, I get the same error:

Picked up JAVA_TOOL_OPTIONS: -javaagent:/app/opentelemetry-javaagent.jar
Error opening zip file or JAR manifest missing : /app/opentelemetry-javaagent.jar
Error occurred during initialization of VM
agent library failed to init: instrument

  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m7s                default-scheduler  Successfully assigned observability-poc/opentelemetry-demo-frauddetectionservice-749c75657b-5jmx5 to ip-X-X-X-X.yyyyyyy.compute.internal
  Normal   Pulling    2m6s                kubelet            Pulling image "puckpuck/otel-demo:issue634-frauddetectionservice"
  Normal   Pulled     2m1s                kubelet            Successfully pulled image "puckpuck/otel-demo:issue634-frauddetectionservice" in 5.122829888s
  Normal   Created    28s (x5 over 2m1s)  kubelet            Created container frauddetectionservice
  Normal   Pulled     28s (x4 over 118s)  kubelet            Container image "puckpuck/otel-demo:issue634-frauddetectionservice" already present on machine
  Normal   Started    27s (x5 over 2m)    kubelet            Started container frauddetectionservice
  Warning  BackOff    12s (x9 over 117s)  kubelet            Back-off restarting failed container

With just this change to the values:

  frauddetectionService:
    imageOverride:
      repository: puckpuck/otel-demo
      tag: issue634-frauddetectionservice

@marcomusso
Author

marcomusso commented Mar 10, 2023

Is there a known shell in that container that I can exec into? I tried the usual shells but couldn't get in.
Locally (with docker run) the image runs, so I wanted to check whether I need to change the uid/gid when running in k8s...

@marcomusso
Author

PS: running it locally in a k3d cluster (using the chart, that is) also fails with the same error.

@realtimetodie

I tried the same thing, but the images are based on a distroless base image, so there is no shell to exec into. We are talking about a demo here. Why the hell you would use distroless for a demo is beyond me; this is totally nuts.
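
Even without a shell in the image, the owner and mode of the agent jar can be checked from outside, for example by exporting a container's filesystem to a tar archive (`docker export`) and reading the entry metadata offline. Below is a minimal Python sketch using the standard `tarfile` module. The archive here is built in memory purely for demonstration, with a hypothetical mode and owner; the entry name without a leading slash is an assumption about how the exported archive names its members.

```python
import io
import tarfile


def member_metadata(archive, name):
    """Return (octal mode, uid, gid) for one entry of a tar archive file object."""
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        info = tar.getmember(name)
        return oct(info.mode), info.uid, info.gid


def demo_archive():
    """Build a tiny in-memory tar standing in for an exported container filesystem."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"placeholder"
        info = tarfile.TarInfo("app/opentelemetry-javaagent.jar")
        info.mode = 0o600  # hypothetical mode, not taken from the real image
        info.uid = 0       # hypothetical owner: root
        info.gid = 0
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


print(member_metadata(demo_archive(), "app/opentelemetry-javaagent.jar"))
# ('0o600', 0, 0)
```

Against a real export, the same function could be called with `open("fs.tar", "rb")` (file name hypothetical) to see whether uid 1000 would be able to read the jar.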

@puckpuck
Contributor

puckpuck commented Mar 10, 2023

In this case the distroless base image is used to reduce build time and image footprint.

The demo is quite large and both of these have been a concern for some time.

@realtimetodie

There should always be an alternative debug image available, for example with a -debug suffix. I will make a PR in the demo repository to build multiple images.

@puckpuck
Contributor

I wonder whether it would work if you reverted the Dockerfile changes from this PR and used the resulting image.

@marcomusso
Author

I spent some time trying to do that (including waiting for a full docker compose build) with no luck. Just running docker build in frauddetectionservice doesn't work either. I admit I was strictly time-boxed on this...

@marcomusso
Author

I tried again with 1.4.0 and the latest chart, but so far I still get:

frauddetectionservice Picked up JAVA_TOOL_OPTIONS: -javaagent:/app/opentelemetry-javaagent.jar
frauddetectionservice Error opening zip file or JAR manifest missing : /app/opentelemetry-javaagent.jar
frauddetectionservice Error occurred during initialization of VM
frauddetectionservice agent library failed to init: instrument
wait-for-kafka opentelemetry-demo-kafka (172.20.122.10:9092) open
(frauddetectionservice)

@maczg

maczg commented Aug 31, 2023

Same issue deploying the Helm chart on OpenShift OKD 4.12.

Solved by adding the anyuid SCC to the service account, as in open-telemetry/opentelemetry-demo#777 (comment), and then rolling out the deployment again.

@marcomusso
Author

marcomusso commented Jan 24, 2024

Small update: with 1.7.2, the latest chart version, and my modified values file specifying the same securityContext, I still get the same error.
PS: adservice seems to be affected by the same problem (or at least shows the same symptom):

Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/src/app/opentelemetry-javaagent.jar
Error opening zip file or JAR manifest missing : /usr/src/app/opentelemetry-javaagent.jar
Error occurred during initialization of VM
agent library failed Agent_OnLoad: instrument

This is with the same securityContext.
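
Since more than one service shows this symptom, it may help to scan pod logs for it. Below is a minimal Python sketch that extracts the agent jar path from the JVM error line. The function name is made up for illustration, and the sample text is taken from the log lines quoted in this issue.

```python
import re

# Matches the JVM error line and captures the jar path that could not be opened.
AGENT_ERROR = re.compile(
    r"Error opening zip file or JAR manifest missing : (?P<path>\S+)"
)


def failing_agent_paths(log_text):
    """Return every agent jar path reported as unopenable in the given log text."""
    return [match.group("path") for match in AGENT_ERROR.finditer(log_text)]


sample = """\
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/src/app/opentelemetry-javaagent.jar
Error opening zip file or JAR manifest missing : /usr/src/app/opentelemetry-javaagent.jar
frauddetectionservice Error opening zip file or JAR manifest missing : /app/opentelemetry-javaagent.jar
"""

print(failing_agent_paths(sample))
# ['/usr/src/app/opentelemetry-javaagent.jar', '/app/opentelemetry-javaagent.jar']
```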
