-
Notifications
You must be signed in to change notification settings - Fork 202
100% cpu usage by OpenCensus.Disruptor-0 #1599
Comments
@danielnorberg @steveniemitz I've made a new SNAPSHOT release which included #1605. Can you try it and see if it resolved this issue? The To access it, please add the following lines to ~/.m2/settings.xml:
Then you should be able to use |
With |
@danielnorberg good to hear that. I think we should increase from 1ms to 4 or 8ms the sleeping time. Need to run some benchmarks to prove does not affect the high qps jobs. If you can try to increase that to 4 and 8 and see the impact would be great. |
@bogdandrutu I tried increasing the sleep from 1ms to 4ms and 8ms and got ~4% and ~2% cpu usage, respectively, when idle. |
Do you think for the moment until we do a better fix 4% is acceptable? If yes I can do a PR or you can send me a PR. Thanks |
I've been running a custom build that replaced |
@danielnorberg @steveniemitz that would be another option. we can do that by having a manifest file to configure the "low-overhead" version which uses the BlockingWaitStrategy (or an environment variable). Do you prefer this? If so what do you think is better manifest or env variable? Also I will work to build a better implementation that will not require this. |
I do like the idea of being able to configure this more easily. It seems like the SleepingWaitStrategy does have some kinks in it that could be worked out though with some more tuning? It seems like even 4% is a very high overhead for this that most service owners wouldn't want. It also seems like things running in GKE (or AppEngine) are significantly more effected than things that aren't, are there specific kernel modifications there that are causing these issues? EDIT: To directly address your question, I think I did see a PR around here that allowed configuring the strategy via a ServiceLoader (and manifest). That'd probably be the best solution. |
Filed an issue on disruptor to allow us to support a better waiting strategy (this is for my long term solution) LMAX-Exchange/disruptor#246. This is mostly an FYI and to keep track of the progress here. @steveniemitz that PR has not made too much progress. Happy to start a new PR for this if you think that can be a good short term solution (maybe can still be useful after I can implement my long term solution). |
Allowing users to choose the appropriate wait strategy seems like it would be appreciated by some, although I suspect many application developers might not be familiar enough with disruptor and busy-wait/park/yield strategies, contention costs etc to be comfortable with making this choice. I guess most users will want a non-surprising default that performs well enough in most cases. Defaulting to |
to avoid excessive CPU usage reported in census-instrumentation/opencensus-java#1599
to avoid excessive CPU usage reported in census-instrumentation/opencensus-java#1599
We've recently managed to hit a very similar issue to this one at my company. We're going to try to fix by updating the opencensus library. |
Please answer these questions before submitting a bug report.
What version of OpenCensus are you using?
0.17.0
What JVM are you using (
java -version
)?java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
What did you do?
Normal opencensus use - low volume workload.
What did you expect to see?
Low cpu usage.
What did you see instead?
The
OpenCensus.Disruptor-0
thread spinning and using a whole core even when the application is otherwise idle.This manifested after we upgraded from Java 8 to Java 11.
The text was updated successfully, but these errors were encountered: