
Significant Memory Increase with OpenTelemetry Leading to OoMKilled Issues on Kubernetes #5461

Open
phthaocse opened this issue Jun 1, 2024 · 6 comments
Labels
bug (Something isn't working) · invalid (This doesn't seem right) · question (Further information is requested) · response needed (Waiting on user input before progress can be made)

Comments

@phthaocse

Hello,

Our company is currently using the latest version of OpenTelemetry Go 1.27.0. After implementing OpenTelemetry to record metrics, we noticed a significant increase in memory usage in our pods deployed on Kubernetes, leading to OoMKilled issues. Could you please provide us with any documentation or knowledge regarding how OpenTelemetry manages memory?

Thank you.
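For reference, a minimal sketch of the kind of metrics setup described above, assuming the OTLP/gRPC exporter and a periodic reader; the meter and instrument names are made up for illustration, not the reporter's actual code:

package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// OTLP/gRPC exporter; the endpoint is taken from OTEL_EXPORTER_OTLP_ENDPOINT by default.
	exp, err := otlpmetricgrpc.New(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// A periodic reader exports accumulated metrics on an interval.
	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exp, sdkmetric.WithInterval(30*time.Second))),
	)
	defer func() { _ = provider.Shutdown(ctx) }()
	otel.SetMeterProvider(provider)

	// Hypothetical instrument: a counter incremented per request.
	meter := otel.Meter("example/service")
	requests, err := meter.Int64Counter("requests.total")
	if err != nil {
		log.Fatal(err)
	}
	requests.Add(ctx, 1)
}

With a fixed set of instruments and attribute combinations, SDK memory should level off after a while; recording with unbounded attribute values (for example, per-request IDs) keeps one aggregation alive per combination and is a common source of growth.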

phthaocse added the bug label Jun 1, 2024
@dmathieu
Member

This ask is rather vague.
OpenTelemetry does not "manage memory" per se. Go manages memory.

We do have benchmarks that track allocations though.
They run on new releases, and manually on an as-needed basis in PRs.

Investigating this would require looking into what exactly is using memory within your application.
That may be due to otel (like anything, it has a memory and CPU footprint). It could also be that you were stretched too thin in terms of resources.
Without more information, I'm afraid there isn't much more we can do here.
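One low-effort way to see what exactly is using memory is Go's built-in pprof HTTP endpoint; a minimal sketch follows (the listen address is an assumption, and the endpoint should only be reachable inside the pod or cluster):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Debug-only listener; keep it internal to the pod/cluster.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}

The heap profile at /debug/pprof/heap attributes live memory to call sites, which is the information needed to tell whether growth comes from the otel SDK, the exporter, or elsewhere in the application.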

@pellared
Member

Could you please provide us with any documentation or knowledge regarding how OpenTelemetry manages memory?

I think it would be overkill. You can always read the codebase.

After implementing OpenTelemetry to record metrics, we noticed a significant increase in memory usage in our pods deployed on Kubernetes, leading to OoMKilled issues.

We cannot do anything without repro steps or profiling data.
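For anyone wanting to attach such data, here is a short sketch of writing a heap profile to a file with runtime/pprof (the helper name and output path are arbitrary); the resulting file can be opened with go tool pprof and shared on the issue:

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile is a hypothetical helper: it forces a GC so the profile
// reflects live objects rather than not-yet-collected garbage, then dumps
// the heap profile to the given file.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC()
	return pprof.WriteHeapProfile(f)
}

func main() {
	if err := writeHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
}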

pellared added the invalid and question labels Jun 18, 2024
MrAlias added the response needed label Jun 18, 2024
@yaniv-s

yaniv-s commented Jun 25, 2024

There's definitely a problem with memory allocations/usage in 1.27. Since I upgraded from 1.24 to 1.27 my service uses more memory. This is from pprof, I hope it can help:
[pprof heap graph screenshot]

@MrAlias
Contributor

MrAlias commented Jun 25, 2024

Please provide the example code that you used to generate that graphic. I am not aware of a function in this project called AddTagToContext. It looks like an inlined grow is happening there. Understanding that call site is needed to begin addressing this.

@kellis5137

Has anyone found a solution for this issue?

@kellis5137

kellis5137 commented Aug 2, 2024

Just in case someone runs into this problem: I'm not 100% sure of the exact cause, but the sidecar's memory resource limit is 32Mi, and I think it needs to be bumped. I upped it and it worked. It took me a while to figure out HOW to bump the autoinstrumentation go sidecar. In your Instrumentation manifest, add a go section under the spec object:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  go:
    resourceRequirements:
      limits:
        cpu: <up the value if necessary>
        memory: <up the value if necessary> # I upped it to 512Mi (normally 32Mi). Going to monitor and see if I can go down
      requests:
        cpu: 5m # this is the original value as of this writing
        memory: 62Mi # I doubled the amount for the default (normally 32Mi)
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector:4318
