Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce coroutine dump format for better visualisation of structured concurrency #3587

Open
mvicsokolova opened this issue Jan 16, 2023 · 5 comments
Assignees

Comments

@mvicsokolova
Copy link
Contributor

Introduction

Currently kotlinx-coroutines-debug provides the following API for dumping coroutines:

  • DebugProbes.dumpCoroutines prints all active coroutines, including their state, creation and suspension stacktraces. Though the output of this method is similar to jstack , it only presents a flat list of coroutines, that is rather difficult to analyse in case of a large number of coroutines.
  • DebugProbes.dumpCoroutinesInfo provides a list of objects with information about active coroutines.
  • DebugProbes.printJob / DebugProbes.printScope prints isolated parts of coroutines hierarchy referenced by a Job or CoroutineScope instances. Though these methods can not provide suspension stacktrace.

For now, there is no coroutine dump that would group coroutines in a meaningful way visualising relationships of structured concurrency and providing a stack trace. This proposal aims to come up with a more suitable dump format.

Proposed solution:

DebugProbes.dumpCoroutines can provide coroutine dump in JSON format in addition to plain text:

DebugProbes.dumpCoroutines(System.out, CoroutineDumpFormat.JSON)

Pros:

Cons:

  • Currently there are no tools in IDE to visualise this dump format and it is unreadable when it’s just printed to the terminal. Though you can paste it into the scratch file in IDE and collapse/expand objects corresponding to coroutines.

Possible alternatives:

Plain text + indents

We can still print coroutine dump as a plain text, but coroutines may be sorted by their parent and indents will show their relationships.

Example output:

- BlockingCoroutine{Completing}@2a5c959b [BlockingEventLoop@1d80b9da]
    - StandaloneCoroutine{Active}@154d1536, state: SUSPENDED, name: 'root rb 0' [BlockingEventLoop@1d80b9da]
        at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
        at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$2.invokeSuspend(TestCoroutineProgressAction.kt:91)
    - StandaloneCoroutine{Completing}@1b7e4b1f, name: 'root rb 1' [BlockingEventLoop@1d80b9da]
        - StandaloneCoroutine{Active}@56b4c42, state: SUSPENDED, name: 'root rb 1:0' [BlockingEventLoop@1d80b9da]
            at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
            at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$3$1.invokeSuspend(TestCoroutineProgressAction.kt:95)
    - StandaloneCoroutine{Active}@491608fb, state: SUSPENDED, name: 'root rb 2' [BlockingEventLoop@1d80b9da]
        at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
        at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$4.invokeSuspend(TestCoroutineProgressAction.kt:102)
        - StandaloneCoroutine{Active}@2d37afba, state: SUSPENDED, name: 'root rb 2:0' [BlockingEventLoop@1d80b9da]
            at kotlinx.coroutines.DelayKt.awaitCancellation(Delay.kt:148)
            at com.intellij.internal.TestCoroutineProgressAction$cancellableBGProgress$1$1$2$4$1.invokeSuspend(TestCoroutineProgressAction.kt:100)

Pros:

  • A little bit better than just a plain thread dump, though still unreadable in case of a large number of coroutines.
@fvasco
Copy link
Contributor

fvasco commented Jan 17, 2023

Your question looks like a false dichotomy.
May we improve the current text format now and introduce new ones in the future?

In your example, CoroutineDumpFormat.JSON should not be an enumeration, but an implementation of an interface (similar to Dispather.IO), in such case we don't have to define and provide now a dump "JEP 425 compatible", and we leave open the door for third-party implementations (text+ASCII color, YAML format, binary format...).

@mvicsokolova
Copy link
Contributor Author

Thank you for your comment! That’s true, just improving text format of the dump is a possible option.

Do you have any suggestions on what may make the text format more readable and convenient for troubleshooting?

@fvasco
Copy link
Contributor

fvasco commented Jan 18, 2023

The main (ignored) issue for me is the duplication on a large dump.
The coroutine's main goal is the ability to run thousands (or millions) of coroutines, unfortunately reading thousands (or millions) of stack traces is really demanding.
Moreover, many of these are identical and are duplicated many times.

See this trivial example:

Coroutines dump 2023/01/18 09:52:01

Coroutine BlockingCoroutine{Active}@68ff4cf8, state: RUNNING
	at java.base/java.lang.Thread.getStackTrace(Thread.java:1602)
	at kotlinx.coroutines.debug.internal.DebugProbesImpl.enhanceStackTraceWithThreadDumpImpl(DebugProbesImpl.kt:339)
	at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutinesSynchronized(DebugProbesImpl.kt:294)
	at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutines(DebugProbesImpl.kt:266)
	at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines(DebugProbes.kt:144)
	at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines$default(DebugProbes.kt:144)
	at MainKt$main$1.invokeSuspend(Main.kt:21)
	(Coroutine creation stacktrace)
	at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
	at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:58)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at MainKt.main(Main.kt:14)
	at MainKt.main(Main.kt)

Coroutine StandaloneCoroutine{Active}@2fd9881b, state: SUSPENDED
	at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
	(Coroutine creation stacktrace)
	at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
	at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
	at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
	at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
	at MainKt$main$1.invokeSuspend(Main.kt:17)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Coroutine StandaloneCoroutine{Active}@70944db3, state: SUSPENDED
	at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
	(Coroutine creation stacktrace)
	at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
	at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
	at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
	at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
	at MainKt$main$1.invokeSuspend(Main.kt:17)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Coroutine StandaloneCoroutine{Active}@276103c5, state: SUSPENDED
	at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
	(Coroutine creation stacktrace)
	at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
	at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
	at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
	at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
	at MainKt$main$1.invokeSuspend(Main.kt:17)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Collapsing coroutines with the same stack is really useful, the dump is shorter and more readable.

Coroutines dump 2023/01/18 09:52:01

Coroutine BlockingCoroutine{Active}@68ff4cf8, state: RUNNING
	at java.base/java.lang.Thread.getStackTrace(Thread.java:1602)
	at kotlinx.coroutines.debug.internal.DebugProbesImpl.enhanceStackTraceWithThreadDumpImpl(DebugProbesImpl.kt:339)
	at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutinesSynchronized(DebugProbesImpl.kt:294)
	at kotlinx.coroutines.debug.internal.DebugProbesImpl.dumpCoroutines(DebugProbesImpl.kt:266)
	at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines(DebugProbes.kt:144)
	at kotlinx.coroutines.debug.DebugProbes.dumpCoroutines$default(DebugProbes.kt:144)
	at MainKt$main$1.invokeSuspend(Main.kt:21)
	(Coroutine creation stacktrace)
	at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
	at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:58)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at MainKt.main(Main.kt:14)
	at MainKt.main(Main.kt)

Coroutine StandaloneCoroutine{Active}@2fd9881b, state: SUSPENDED, and
Coroutine StandaloneCoroutine{Active}@70944db3, state: SUSPENDED, and
Coroutine StandaloneCoroutine{Active}@276103c5, state: SUSPENDED
	at MainKt$main$1$1$1.invokeSuspend(Main.kt:18)
	(Coroutine creation stacktrace)
	at kotlin.coroutines.intrinsics.IntrinsicsKt__IntrinsicsJvmKt.createCoroutineUnintercepted(IntrinsicsJvm.kt:122)
	at kotlinx.coroutines.intrinsics.CancellableKt.startCoroutineCancellable(Cancellable.kt:30)
	at kotlinx.coroutines.BuildersKt__Builders_commonKt.launch$default(Builders.common.kt:47)
	at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
	at MainKt$main$1.invokeSuspend(Main.kt:17)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

Other issues regarding the size of the dump: is it a good idea to dump some megabytes of data into the log file?
Should a developer copy-paste the relevant data from the log? Is it a reasonable practice?
Looking at JEP 425 is a good idea, but I cannot exclude that developer generate a dump in a separate file instead to write it into the log (like a core dump), so a JSON, YAML, or a binary exporter (JSON+GZIP, ProtoBuf+GZIP) can be useful. So my previous comment: making a dump pluggable (like serialization) should avoid putting too much logic in this library.
For this use case, also integration with JFR can be helpful, but maybe this is another task.

@dkhalanskyjb
Copy link
Collaborator

From https://openjdk.org/jeps/425:

The thread dump is another popular tool for troubleshooting applications written in the thread-per-request style. Unfortunately, the JDK's traditional thread dump, obtained with jstack or jcmd, presents a flat list of threads. This is suitable for dozens or hundreds of platform threads, but is unsuitable for thousands or millions of virtual threads. Accordingly, we will not extend traditional thread dumps to include virtual threads, but will rather introduce a new kind of thread dump in jcmd to present virtual threads alongside platform threads, all grouped in a meaningful way. Richer relationships among threads can be shown when programs use structured concurrency.

Maybe it makes sense to look at what jcmd does or even repeat their format for better tooling compatibility. For example, they include the structured concurrency relationships in the output.

@mvicsokolova
Copy link
Contributor Author

mvicsokolova commented Feb 1, 2023

For now, jcmd supports only JSON as a better visualisation format for lots of virtual threads.
(Check out the jcmd thread dump demo).

For example, here is the structured thread dump: root StructuredTaskScope and 2 child scopes that run 1000 virtual threads each (stacktraces and the list of threads are collapsed).

jcmd pid Thread.dump_to_file -format=json dump.json

Screenshot 2023-02-01 at 18 42 25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants