This repository has been archived by the owner on Dec 23, 2023. It is now read-only.

CPU Hovers at 20-30% after OpenCensus has been added to a GCP AppEngine Flex service #1099

Closed
gkohen opened this issue Mar 27, 2018 · 23 comments
@gkohen

gkohen commented Mar 27, 2018

Please answer these questions before submitting a bug report.

What version of OpenCensus are you using?

0.12.2

What JVM are you using (java -version)?

8

What did you do?

If possible, provide a recipe for reproducing the error.
Include OpenCensus library in our build.gradle:

    compile "io.opencensus:opencensus-api:0.12.2"
    compile "io.opencensus:opencensus-exporter-trace-stackdriver:0.12.2"
    runtime "io.opencensus:opencensus-impl:0.12.2"

In Spring Boot using GCP AppEngine Flexible, we've defined a class to initialize the stackdriver exporter.

package com.jda.starter.utils.tracing;

import com.jda.starter.Application;

import io.opencensus.exporter.trace.stackdriver.StackdriverTraceConfiguration;
import io.opencensus.exporter.trace.stackdriver.StackdriverTraceExporter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Profile;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

import java.io.IOException;

@Component
@Profile(Application.Profiles.PRODUCTION)
public class OpenCensusExporterInitializer {

  private static final Logger LOGGER = LoggerFactory.getLogger(OpenCensusExporterInitializer.class);

  @EventListener(ContextRefreshedEvent.class)
  public void contextRefreshedEvent(ContextRefreshedEvent event) {
    initialize();
  }

  private void initialize() {
    try {
      StackdriverTraceExporter.createAndRegister(StackdriverTraceConfiguration.builder().build());
    } catch (IOException exception) {
      // Pass the exception so the failure cause appears in the log.
      LOGGER.warn("Could not initialize OpenCensus export to Stackdriver", exception);
    }
  }
}

What did you expect to see?

CPU to be close to 0 utilization when no traffic takes place.

What did you see instead?

CPU hovers between 20% and 30% once OpenCensus has been added to a service.

@DazWilkin

cc @mtwo @shahprit

@dinooliva
Contributor

Yang, please take a look into this.

@dinooliva
Contributor

We believe the issue may be with the disruptor framework - can you retry using opencensus-impl-lite and let us know whether that solves the problem?

@songy23
Contributor

songy23 commented Mar 27, 2018

As @dinooliva said, could you please change
runtime "io.opencensus:opencensus-impl:0.12.2"
to
runtime "io.opencensus:opencensus-impl-lite:0.12.2"
and see how the CPU usage goes?

@gkohen gkohen closed this as completed Mar 27, 2018
@gkohen
Author

gkohen commented Mar 27, 2018

Great news. After updating to the *-lite version the issue has been resolved.

@songy23
Contributor

songy23 commented Mar 28, 2018

Great!

@sebright
Contributor

@dinooliva @songy23 Is this a known issue with Disruptor?

@songy23
Contributor

songy23 commented Mar 28, 2018

I think so; @bogdandrutu said he saw a similar issue before on GAE. Disruptor wakes up every 100 ns or so, which could introduce some overhead.
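
To make that overhead concrete, here is a minimal JDK-only sketch of the pattern described above: an idle consumer that polls and then parks for a very short interval, compared with one that blocks until work arrives. It is not Disruptor's code. The class and method names (`IdleWakeups`, `pollWithPark`, `blockUntilWork`) are hypothetical, and the 100 ns interval is simply the figure quoted in this comment. The actual number of wake-ups depends on the OS timer granularity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration only; not taken from Disruptor or OpenCensus.
final class IdleWakeups {

  /**
   * Polls an always-empty queue for roughly runMillis, parking for parkNanos
   * between polls. Returns how many times the thread woke up with nothing to do.
   */
  static long pollWithPark(long runMillis, long parkNanos) {
    BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(16);
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(runMillis);
    long wakeups = 0;
    while (System.nanoTime() - deadline < 0) {
      if (queue.poll() == null) {
        LockSupport.parkNanos(parkNanos); // very short timed sleep, then poll again
        wakeups++;
      }
    }
    return wakeups;
  }

  /**
   * Blocks on an always-empty queue until the timeout expires. The caller
   * regains control once (at the timeout), or zero times if interrupted.
   */
  static long blockUntilWork(long runMillis) {
    BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(16);
    long wakeups = 0;
    try {
      queue.poll(runMillis, TimeUnit.MILLISECONDS); // parked until work or timeout
      wakeups++;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return wakeups;
  }

  public static void main(String[] args) {
    System.out.println("poll + park wake-ups: " + pollWithPark(200, 100L));
    System.out.println("blocking wake-ups:    " + blockUntilWork(200));
  }
}
```

On an idle service the first loop keeps waking up even though there is never any work, which is the kind of background cost reported in this issue; the blocking variant trades that cost for slightly higher hand-off latency.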

@sebright
Contributor

Do you know if this has been reported in the Disruptor issue tracker?

@songy23
Contributor

songy23 commented Mar 28, 2018

I'm not sure but I haven't seen any related issues yet (https://github.com/LMAX-Exchange/disruptor/issues).

@sebright
Contributor

Reopening, since there is still an issue with opencensus-impl.

@sebright sebright reopened this Mar 28, 2018
@bogdandrutu
Contributor

It is a combination of Disruptor + AppEngine that causes this problem.

@bogdandrutu
Contributor

@sebright sebright added the bug label Mar 30, 2018
@askingcat

I saw the same increase in a project I'm currently working on. This project runs on GKE and uses Java 8, so it seems Java 7 is not the only version affected.

After switching to opencensus-impl-lite as suggested above, the CPU load dropped to a negligible amount on an idle system.

@gkohen
Author

gkohen commented Apr 5, 2018

OpenCensus folks, do you think the suggested configuration to move to 'lite' is a workaround or the solution to the problem?
Thanks.

@bogdandrutu
Contributor

@gkohen I've just looked more into the issue and found that the problem is with the sleeping strategy that we use for the Disruptor thread:

I think what we need is to allow users to change the sleeping strategy, and also to configure the arguments for the SleepingWaitStrategy.
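
One possible shape for such a knob, sketched with the JDK only: read the park interval from a property and fall back to a default when it is absent or invalid. Everything named here is hypothetical, including the property key `example.tracing.disruptor.sleepNanos` and the `WaitConfig` class. Nothing like it is confirmed to exist in OpenCensus, and the 100 ns default only mirrors the figure mentioned earlier in this thread.

```java
import java.util.Properties;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of a configurable idle interval; not an OpenCensus API.
final class WaitConfig {

  // Assumed property key, invented for this example.
  static final String SLEEP_NANOS_KEY = "example.tracing.disruptor.sleepNanos";

  // Assumed default, mirroring the "100ns or so" figure from this thread.
  static final long DEFAULT_SLEEP_NANOS = 100L;

  /** Returns the configured park interval, or the default if unset or invalid. */
  static long sleepNanos(Properties props) {
    String raw = props.getProperty(SLEEP_NANOS_KEY);
    if (raw == null) {
      return DEFAULT_SLEEP_NANOS;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : DEFAULT_SLEEP_NANOS; // reject zero and negatives
    } catch (NumberFormatException e) {
      return DEFAULT_SLEEP_NANOS; // reject non-numeric input
    }
  }

  /** One idle step of a consumer loop: park for the configured interval. */
  static void idleOnce(long sleepNanos) {
    LockSupport.parkNanos(sleepNanos);
  }

  public static void main(String[] args) {
    long nanos = sleepNanos(System.getProperties());
    System.out.println("Idle park interval (ns): " + nanos);
    idleOnce(nanos);
  }
}
```

With this shape, an idle-sensitive deployment could start the JVM with, for example, `-Dexample.tracing.disruptor.sleepNanos=1000000` to park for about 1 ms between polls, accepting higher hand-off latency in exchange for fewer wake-ups.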

@solatis

solatis commented Jun 9, 2018

I just ran into this issue as well, but without using AppEngine. Using VisualVM I saw that the culprit was the OpenCensus.Disruptor-0 thread, which confirms the investigation into Disruptor in this issue report.

Switching to -lite addresses the issue. I'm using 0.14 with the Stackdriver exporter.
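
For anyone without VisualVM at hand, a rough equivalent can be done in-process with the standard `java.lang.management` API. The sketch below lists live threads by accumulated CPU time, so a spinning thread such as OpenCensus.Disruptor-0 would be expected near the top. `ThreadCpuReport` and `Entry` are hypothetical names. Per-thread CPU timing is an optional JVM feature, so the sketch returns an empty list where it is unsupported or disabled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for spotting CPU-hungry threads from inside the JVM.
final class ThreadCpuReport {

  /** A thread name paired with its accumulated CPU time in nanoseconds. */
  static final class Entry {
    final String name;
    final long cpuNanos;

    Entry(String name, long cpuNanos) {
      this.name = name;
      this.cpuNanos = cpuNanos;
    }
  }

  /** Returns live threads sorted by CPU time, highest first. */
  static List<Entry> snapshot() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    List<Entry> entries = new ArrayList<>();
    if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
      return entries; // measurement unavailable on this JVM
    }
    for (long id : mx.getAllThreadIds()) {
      long cpu = mx.getThreadCpuTime(id);     // -1 if the thread has died
      ThreadInfo info = mx.getThreadInfo(id); // null if the thread has died
      if (cpu >= 0 && info != null) {
        entries.add(new Entry(info.getThreadName(), cpu));
      }
    }
    entries.sort(Comparator.comparingLong((Entry e) -> e.cpuNanos).reversed());
    return entries;
  }

  public static void main(String[] args) {
    List<Entry> entries = snapshot();
    for (int i = 0; i < Math.min(5, entries.size()); i++) {
      Entry e = entries.get(i);
      System.out.println(e.name + ": " + (e.cpuNanos / 1_000_000) + " ms CPU");
    }
  }
}
```

Taking two snapshots a few seconds apart on an idle service and comparing the totals shows which threads are still consuming CPU.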

@karlthepagan

karlthepagan commented Jul 18, 2018

I have implemented opencensus-impl-lite (compared to opencensus-impl 0.13.2) in order to work around this on a CPU-sensitive application.

Because of the slight response time penalty, I am in favor of adding the ability for an application to set a configurer. I appreciate the complexity of adding runtime configuration. I would be happy with an SPI implementation and will pursue a pull request in that style.

EDIT: To clarify, I found CPU use impacted my cloud-hosted application despite the upgraded LMAX package (release 0.12.2).

karlthepagan pushed a commit to karlthepagan/opencensus-java that referenced this issue Jul 18, 2018
karlthepagan added a commit to karlthepagan/opencensus-java that referenced this issue Jul 18, 2018
@langecode
Contributor

I have observed the same behaviour, although I have not profiled it yet. After instrumenting a simple service with opencensus-impl 0.15.0, the idle CPU usage rises from around 3 millicores to 300 millicores. The underlying OS is RHEL 7, running Kubernetes and Docker.

@steveniemitz
Contributor

To chime in: I've noticed this as well running in GKE. About 25% of our CPU time is spent in SleepingWaitStrategy.

@danielnorberg

When running our application using opencensus-impl 0.14.0 on Java 11, we've noticed that the OpenCensus.Disruptor-0 thread is spinning and takes a whole CPU.

@bogdandrutu
Contributor

@danielnorberg see #1599 for more discussions. @songy23 is making a snapshot release for you to try it out and let us know if the fix helped.

@bogdandrutu
Contributor

@steveniemitz see discussion in #1599 as well.
