[Exporter/LoadBalancer] Increased Memory Utilization after bumping from 0.94.0 to 0.99.0 #33435

NickAnge · 2024-06-07T17:14:37Z

Component(s)

exporter/loadbalancing

What happened?

Description

Hello team.

We recently upgraded our internal collectors from version 0.94.0 to 0.99.0 and observed a rise in memory usage in the load balancer deployment's collectors, as depicted in the image below. This persisted even after updating to the latest version, 0.101.0.

We enabled profiling on our collectors (via the pprof extension) and examined inuse_memory and inuse_objects. I split my investigation across three pods with low, medium, and high memory usage.

Inuse Memory - Top (pprof screenshots for the low, medium, and high memory usage pods)

Inuse_objects - top (pprof screenshots for the low, medium, and high memory usage pods)

Steps to Reproduce

Deploy the collector as a load balancer with version 0.94.0
Bump the version to 0.101.0

Expected Result

We expected memory usage to remain the same over time after the version bump.

Actual Result

High memory usage after bumping the version

Collector version

0.101.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        max_recv_msg_size_mib: 20

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 95
    spike_limit_percentage: 15
  k8sattributes:
    passthrough: true

exporters:
  loadbalancing/spans:
    protocol:
      otlp:
        sending_queue:
          enabled: true
          num_consumers: 100
          queue_size: 500
        retry_on_failure:
          enabled: true
          initial_interval: 2s
          max_interval: 2s
          max_elapsed_time: 10s
        tls:
          insecure: true
        timeout: 1
    resolver:
      k8s:
        service: service
  loadbalancing/metrics:
    routing_key: metric
    protocol:
      otlp:
        sending_queue:
          enabled: true
          num_consumers: 50
          queue_size: 500
        retry_on_failure:
          enabled: true
          initial_interval: 2s
          max_interval: 2s
          max_elapsed_time: 10s
        tls:
          insecure: true
        timeout: 1
    resolver:
      k8s:
        service: service

extensions:
  health_check:
  pprof:
    endpoint: :1777

service:
  extensions: [ health_check , pprof]
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ memory_limiter ]
      exporters: [ loadbalancing/spans ]
    logs:
      receivers: [ otlp ]
      processors: [ memory_limiter ]
      exporters: [ loadbalancing/spans ]
    metrics:
      receivers: [ otlp ]
      processors: [ memory_limiter, k8sattributes ]
      exporters: [ loadbalancing/metrics ]

Log output

No response

Additional context

No response

github-actions · 2024-06-07T17:14:54Z

Pinging code owners:

exporter/loadbalancing: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jpkrohling · 2024-06-10T14:23:23Z

Thank you for the detailed report, I'll take a look and try to reproduce it. In the meantime, can you try switching to the DNS resolver instead of the k8s resolver? I'm not 100% sure yet it would show a difference, but the DNS resolver is known to consume fewer resources in other situations.

    resolver:
      k8s:
        service: service
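
For comparison, here is a minimal sketch of what the DNS-resolver variant of the block above might look like. The `hostname` value is a hypothetical headless Service name and is not taken from this report; `interval` and `timeout` are set to what the exporter's README lists as defaults, so they could probably be omitted.

```yaml
exporters:
  loadbalancing/spans:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # Hypothetical name: a headless Service, so DNS returns one record per backend pod.
        hostname: otelcol-backend-headless.observability.svc.cluster.local
        # How often the hostname is re-resolved (assumed default).
        interval: 5s
        # Timeout for each DNS lookup (assumed default).
        timeout: 1s
```

With the DNS resolver, changes to the backend set are picked up only at the next `interval`, which is the trade-off against the k8s resolver's watch-based updates.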

NickAnge · 2024-06-11T07:58:22Z

Thanks @jpkrohling.
We discussed internally replacing the k8s resolver with the DNS resolver. We concluded that we should stay with the k8s resolver, as it is faster at resolving the endpoints of the backing collectors during a rollout or outage.

Let me know if you need more information about the issue, and thanks a lot for taking a look.

jpkrohling · 2024-06-11T12:18:33Z

Can you temporarily replace it, and see if the memory profile is different? If we can isolate this behavior to this resolver specifically, it's easier to find a solution.

NickAnge · 2024-06-11T16:18:42Z

This memory issue occurred only in our production environments (probably because of higher traffic), so I am not sure we can change the resolver there, even temporarily :/. Did you manage to reproduce it in your setup?

jpkrohling · 2024-06-19T11:57:10Z

I wasn't able to try it out. I might be able to find some time later this week, but next week I'm AFK again. If anyone is interested in this issue, it would help me a lot if I can have a confirmation that this is isolated to the k8s resolver.

github-actions · 2024-08-19T03:32:32Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

exporter/loadbalancing: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmedinag · 2024-09-12T05:26:54Z

Just pinging the owner of exporter/loadbalancing, @jpkrohling, to keep this issue from going stale.

github-actions · 2024-11-12T03:32:28Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

exporter/loadbalancing: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jpkrohling · 2024-12-04T11:09:44Z

I believe this has been addressed by #36505. Feel free to reopen if you are still experiencing this when using the k8s resolver.

NickAnge added the bug and needs triage labels Jun 7, 2024

github-actions bot added the exporter/loadbalancing label Jun 7, 2024

github-actions bot added the Stale label Aug 19, 2024

jpkrohling removed the Stale label Aug 19, 2024

jpkrohling self-assigned this Aug 19, 2024

jpkrohling removed the needs triage label Aug 19, 2024

github-actions bot added the Stale label Nov 12, 2024

jpkrohling removed the Stale label Dec 4, 2024

jpkrohling closed this as completed Dec 4, 2024
