On several occasions I've seen the KPL not shut down cleanly in response to the destroy() method call. This has led to orphan threads that run in a tight loop and consume 100% of available CPU.
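For context, this is roughly how we create and tear down the producer. This is a minimal sketch; the region, stream name, partition key, and payload below are placeholders, not our real configuration:

import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ProducerLifecycle {
    public static void main(String[] args) {
        // Placeholder region; not our real deployment settings.
        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setRegion("us-east-1");
        KinesisProducer producer = new KinesisProducer(config);

        // Publish a record, then shut down.
        producer.addUserRecord("example-stream", "example-partition-key",
                ByteBuffer.wrap("payload".getBytes(StandardCharsets.UTF_8)));

        // Drain outstanding records, then tear down the native process and its
        // helper threads. destroy() is the call that sometimes leaves the
        // kpl-daemon-* log reader threads behind.
        producer.flushSync();
        producer.destroy();
    }
}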
The following chart shows the abrupt spike in CPU usage at 8:43 when the destroy() method was called:
I captured the following output from top and noted the PIDs of the offending threads: 9264, 9265
I then ran jstack on the parent Java process and found these two threads, whose hex-encoded nid values correspond to the decimal PIDs noted above:
"kpl-daemon-0003" #15011 daemon prio=5 os_prio=0 tid=0x00007f292f29f000 nid=0x2431 runnable [0x00007f28c0741000]
java.lang.Thread.State: RUNNABLE
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
- locked <0x00000000d68e5df0> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
- locked <0x00000000d68e5df0> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.amazonaws.services.kinesis.producer.LogInputStreamReader.run(LogInputStreamReader.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00000000d68e5ec0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"kpl-daemon-0002" #15010 daemon prio=5 os_prio=0 tid=0x00007f292f29e800 nid=0x2430 runnable [0x00007f28c0943000]
java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0x00000000f5388c50> (a java.io.IOException)
at java.lang.Throwable.<init>(Throwable.java:265)
at java.lang.Exception.<init>(Exception.java:66)
at java.io.IOException.<init>(IOException.java:58)
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
at java.io.BufferedInputStream.read(BufferedInputStream.java:336)
- locked <0x00000000d68f73b0> (a java.lang.UNIXProcess$ProcessPipeInputStream)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
- locked <0x00000000d68f73d8> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
- locked <0x00000000d68f73d8> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.amazonaws.services.kinesis.producer.LogInputStreamReader.run(LogInputStreamReader.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x00000000d68f74f0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
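As a quick sanity check that these are the same threads top flagged, the decimal PIDs convert to exactly the nid values in the dump (a throwaway snippet, not part of the service):

public class PidToNid {
    public static void main(String[] args) {
        // Convert the Linux thread IDs reported by top into the hex "nid"
        // form used in the thread dump above.
        for (int pid : new int[] {9264, 9265}) {
            System.out.println(pid + " -> nid=0x" + Integer.toHexString(pid));
        }
        // Output:
        // 9264 -> nid=0x2430   (kpl-daemon-0002)
        // 9265 -> nid=0x2431   (kpl-daemon-0003)
    }
}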
You can see that the second thread is presently in the exception-handling block responding to an IOException that was thrown. It's also worth noting that there were no other Kinesis-related threads running.
The CPU is pegged at 100% because the threads continually spin in this loop, which exits only when the running boolean flag is set to false (LogInputStreamReader.java, lines 94 to 133 at commit cbbab06):
while (running) {
    String logLine;
    try {
        logLine = reader.readLine();
        // ... handling of the log line elided ...
    } catch (IOException ioex) {
        if (shuttingDown) {
            //
            // Since the Daemon calls destroy instead of letting the process exit normally
            // the input streams coming from the process will end up truncated.
            // When we know the process is shutting down we can report the exception as info
            //
            if (ioex.getMessage() == null || !ioex.getMessage().contains("Stream closed")) {
                //
                // If the message is "Stream closed" we can safely ignore it. This is probably a bug
                // with the UNIXProcess#ProcessPipeInputStream that it throws the exception. There
                // is no other way to detect the other side of the request being closed.
                //
                log.info("Received IO Exception during shutdown. This can happen, but should indicate "
                        + "that the stream has been closed: {}", ioex.getMessage());
            }
        } else {
            log.error("Caught IO Exception while reading log line", ioex);
        }
    }
}
The running flag is set to false only when the shutdown() method is invoked. It's possible for that call to be skipped if the thread running the native executable is interrupted while blocked in process.waitFor(); see Daemon.java, lines 469 to 473 at commit cbbab06.
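To illustrate the suspected sequence, here is a simplified sketch of the pattern; the class and method names are illustrative, not the actual KPL classes. If the thread waiting on the child process is interrupted before it can flip the flag, the reader loop keeps spinning on a closed stream:

import java.io.BufferedReader;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of the suspected race; names are illustrative only.
class LogReaderShutdownSketch {
    private final AtomicBoolean running = new AtomicBoolean(true);

    // Stands in for LogInputStreamReader.run(): once the pipe is closed,
    // readLine() fails immediately on every pass, so if running is never
    // cleared this becomes the tight loop seen at 100% CPU.
    void readerLoop(BufferedReader reader) {
        while (running.get()) {
            try {
                String line = reader.readLine();
                if (line == null) {
                    continue;
                }
                // ... handle the log line ...
            } catch (IOException e) {
                // Swallowed during shutdown, as in the excerpt above.
            }
        }
    }

    // Stands in for the thread that waits on the native child process.
    void runChildProcess(Process process) {
        try {
            process.waitFor();
            shutdownReaders();          // normal path: stops the reader loop
        } catch (InterruptedException e) {
            // If the interrupt lands here and shutdownReaders() is never
            // reached, running stays true and readerLoop() spins forever.
            Thread.currentThread().interrupt();
        }
    }

    void shutdownReaders() {
        running.set(false);
    }
}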
=== 0.12.4
==== Java
* Upgraded dependency on aws-java-sdk-core to 1.11.128, and removed version range.
* [PR #84](#84)
* [PR #106](#106)
* Use an explicit lock file to manage access to the native KPL binaries.
* [Issue #91](#91)
* [PR #92](#92)
* Log reader threads should be shut down when the native process exits.
* [Issue #93](#93)
* [PR #94](#94)
==== C++ Core
* Add support for using a thread pool, instead of a thread per request.
The thread pool model guarantees a fixed number of threads, but may have trouble catching up if the KPL is overloaded (a configuration sketch follows this list).
* [PR #100](#100)
* Add log messages, and statistics about sending data to Kinesis.
* Added flush statistics that record the count of events that trigger flushes of data destined for Kinesis
* Added a log message that indicates the average time it takes for a PutRecords request to be completed.
This time is recorded from when the request is enqueued to when it completes.
* Log a warning if the average request time rises above five times the configured flush interval.
If you see this warning, it normally indicates that the KPL is having trouble keeping up. The most likely
cause is too many requests being generated; you should investigate the flush triggers to determine why flushes
are being triggered.
* [PR #102](#102)
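For anyone who wants to opt into the pooled model mentioned above from the Java side, the selection looks roughly like this, assuming the 0.12.4 KinesisProducerConfiguration exposes ThreadingModel and ThreadPoolSize settings; the pool size is an arbitrary example:

import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration.ThreadingModel;

public class PooledThreadingExample {
    public static void main(String[] args) {
        // Use a fixed-size pool instead of one thread per request.
        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setThreadingModel(ThreadingModel.POOLED)
                .setThreadPoolSize(64); // arbitrary example size
        KinesisProducer producer = new KinesisProducer(config);
        // ... use the producer, then flushSync() and destroy() on shutdown ...
    }
}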