Evaluate adding metrics for virtual threads #9533

Open
edbratt opened this issue Nov 26, 2024 · 7 comments · May be fixed by #9619

@edbratt
Member

edbratt commented Nov 26, 2024

Environment Details

  • Helidon Version: 4.1.4
  • Helidon SE and MP
  • JDK version: 21+
  • OS: all
  • Docker version (if applicable): n/a

This enhancement request tracks adding metrics that give Helidon users information about virtual thread usage. For example, the metrics currently provided for the JVM include a count of active platform threads. Helidon users would like a similar count of active or in-flight virtual threads. Users might also benefit from a running tally of currently pinned threads.

@edbratt
Member Author

edbratt commented Nov 26, 2024

It looks like Loom EA builds include the command jcmd <pid> Thread.vthread_summary, which prints a summary of the scheduler, timers, and I/O pollers. While this is still evolving, we might want to look at it. More at https://bugs.openjdk.org/browse/JDK-8339420

@tjquinno
Member

Another possibility would be for Helidon to add a JFR RecordingStream consumer which works with RecordedEvent instances describing events related to virtual threads (start, stop, pin).

We'd need to do some investigation to understand whether these are instant or duration events, what users would need to do to get this information (would there be a separate component users would add, or would this be part of the current system-provided metrics component), etc.

It might also be feasible to allow users to configure which JFR events Helidon would publish as metrics, as RecordingStreams can register the events they are interested in by name. Although this would obviously be additional work for us initially, it might be nice for users and could be a way for us to avoid having to respond to a series of subsequent requests to add support for particular other events.
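As a rough illustration of the configurable idea above, here is a minimal sketch that counts JFR events whose names come from a list. The class name ConfiguredJfrEventCounters and its methods are hypothetical and are not part of Helidon; only RecordingStream and its enable, onEvent, and startAsync calls come from the JDK. How such counters would be mapped onto Helidon meters is left open.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import jdk.jfr.consumer.RecordingStream;

/** Hypothetical helper (not a Helidon API): counts JFR events named in configuration. */
public class ConfiguredJfrEventCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Registers one counter per configured event name. */
    public ConfiguredJfrEventCounters(List<String> eventNames) {
        eventNames.forEach(name -> counters.put(name, new LongAdder()));
    }

    /** Called once per matching event; separate from the stream so it can be exercised without JFR. */
    public void record(String eventName) {
        LongAdder counter = counters.get(eventName);
        if (counter != null) {
            counter.increment();
        }
    }

    /** Returns the number of events seen so far, or 0 for an unconfigured name. */
    public long count(String eventName) {
        LongAdder counter = counters.get(eventName);
        return counter == null ? 0L : counter.sum();
    }

    /** Starts a stream that feeds the counters; the caller closes it on shutdown. */
    public RecordingStream start() {
        RecordingStream stream = new RecordingStream();
        for (String name : counters.keySet()) {
            // Some events are disabled by default or have a duration threshold.
            stream.enable(name).withoutThreshold();
            stream.onEvent(name, event -> record(name));
        }
        stream.startAsync(); // events are delivered on a background thread
        return stream;
    }
}
```

A caller might construct it with List.of("jdk.VirtualThreadStart", "jdk.VirtualThreadEnd"), call start() once at startup, and read count(...) whenever metrics are scraped.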

@vasanth-bhat

Based on what we have seen, the JFR events below are supported for virtual threads. Three of them are instant events and one is a duration event.

Duration Event

jdk.VirtualThreadPinned

Instant Events

jdk.VirtualThreadStart
jdk.VirtualThreadEnd
jdk.VirtualThreadSubmitFailed

But none of them gives information on current carrier thread statistics, which are available from "Thread.vthread_summary" or "Thread.vthread_scheduler".

Sample output :

Default virtual thread scheduler:
java.util.concurrent.ForkJoinPool@6ef19360[Running, parallelism = 14, size = 14, active = 4, running = 0, steals = 14531357, tasks = 0, submissions = 0]

This is critical information and needs to be available as a metric.
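To make the instant versus duration distinction concrete, here is a small sketch that folds the four events listed above into metric-like values. The class VirtualThreadTally is hypothetical. The "active" value is only an estimate: it counts threads started after the stream was attached, and it says nothing about carrier threads.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import jdk.jfr.consumer.RecordingStream;

/** Hypothetical tally of virtual thread JFR events (illustration only). */
public class VirtualThreadTally {
    private final AtomicLong started = new AtomicLong();
    private final AtomicLong ended = new AtomicLong();
    private final AtomicLong submitFailed = new AtomicLong();
    private final AtomicLong pinnedNanos = new AtomicLong();

    public void onStart() { started.incrementAndGet(); }
    public void onEnd() { ended.incrementAndGet(); }
    public void onSubmitFailed() { submitFailed.incrementAndGet(); }

    /** Duration events carry a length; instant events do not. */
    public void onPinned(Duration duration) { pinnedNanos.addAndGet(duration.toNanos()); }

    /** Starts minus ends: an estimate of virtual threads currently alive. */
    public long active() { return started.get() - ended.get(); }
    public long submitFailures() { return submitFailed.get(); }
    public Duration totalPinned() { return Duration.ofNanos(pinnedNanos.get()); }

    /** Wires the tally to a stream; the caller starts and closes the stream. */
    public void attach(RecordingStream stream) {
        stream.enable("jdk.VirtualThreadStart");
        stream.enable("jdk.VirtualThreadEnd");
        stream.enable("jdk.VirtualThreadSubmitFailed");
        stream.enable("jdk.VirtualThreadPinned");
        stream.onEvent("jdk.VirtualThreadStart", event -> onStart());
        stream.onEvent("jdk.VirtualThreadEnd", event -> onEnd());
        stream.onEvent("jdk.VirtualThreadSubmitFailed", event -> onSubmitFailed());
        stream.onEvent("jdk.VirtualThreadPinned", event -> onPinned(event.getDuration()));
    }
}
```

Because virtual threads can be created in very large numbers, enabling the start and end events may add noticeable overhead; that cost would need to be measured before relying on this approach.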

@tjquinno
Member

tjquinno commented Dec 2, 2024

Let me highlight some important points.

  1. Helidon metrics could access the JFR events relatively efficiently by using a RecordingStream.

    In contrast, the sample output you posted is just that--output from a jcmd command execution.

    It is not clear that Helidon metrics should be running a command (by necessity in a separate process) and then parsing the output (plain or JSON) to compute metric values.

  2. The use of a ForkJoinPool for virtual threads is ultimately an internal JDK implementation choice which could change between Java releases. Tying a Helidon-provided metric to a JDK implementation choice might break with future Java releases.

  3. Even if we wanted to expose data about the ForkJoinPool Java currently uses for virtual threads, AFAIK the JDK provides no programmatic way to get a reference to the specific ForkJoinPool it is using for virtual threads.

It would be great if the JDK provided an efficient way to get this information so Helidon could capture it in metrics. I am not aware of an efficient way to do that, but if there is one I would like to learn of it.
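For context on point 3, the fields in the jcmd sample output roughly correspond to public getters on ForkJoinPool, so reading them is easy for a pool the application creates itself. The sketch below does exactly that; it does not and cannot reach the pool the JDK uses for virtual threads. The class name ForkJoinPoolSnapshot and the map keys are illustrative choices that mirror the labels in the sample output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;

/** Illustration only: snapshots a ForkJoinPool that the caller owns. */
public class ForkJoinPoolSnapshot {

    /** Keys mirror the labels in ForkJoinPool's toString output. */
    public static Map<String, Long> snapshot(ForkJoinPool pool) {
        Map<String, Long> values = new LinkedHashMap<>();
        values.put("parallelism", (long) pool.getParallelism());
        values.put("size", (long) pool.getPoolSize());
        values.put("active", (long) pool.getActiveThreadCount());
        values.put("running", (long) pool.getRunningThreadCount());
        values.put("steals", pool.getStealCount());
        values.put("tasks", pool.getQueuedTaskCount());
        values.put("submissions", (long) pool.getQueuedSubmissionCount());
        return values;
    }
}
```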

@ljnelson
Member

ljnelson commented Dec 2, 2024

Virtual threads model tasks and are designed to scale into the millions. What would it mean to have metrics for such things, I wonder?

@vasanth-bhat

vasanth-bhat commented Dec 2, 2024

  1. The use of a ForkJoinPool for virtual threads is ultimately an internal JDK implementation

Minor correction: the ForkJoinPool is used to implement the carrier thread pool, which executes the virtual threads. Yes, it is the default implementation, and there may be ways to use an alternate implementation.

  1. It would be great if the JDK provided an efficient/programmatic way to get this information

There is indeed an API available in Java 24 to get this information about carrier threads from the virtual thread scheduler.
It is available via "jdk.management.VirtualThreadSchedulerMXBean".

  1. https://bugs.openjdk.org/browse/JDK-8339827
  2. https://download.java.net/java/early_access/jdk24/docs/api/jdk.management/jdk/management/VirtualThreadSchedulerMXBean.html

@tjquinno
Member

tjquinno commented Dec 2, 2024

Even better, much more straightforward, and the JDK owns any compatibility issues if the underlying implementation changes at some point. My very quick web searching didn't turn up the VirtualThreadSchedulerMXBean, and I guess I narrowed my thinking when I saw you had pasted output from jcmd.

As you might know, accessing MXBeans is how Helidon already exposes various JVM values as metrics. The values exposed by VirtualThreadSchedulerMXBean would be good candidates to add to that list.

Obviously this would require JDK 24 to work. Helidon 4.x has to work with JDK 21 (or later). For a number of reasons we try to avoid dynamic class searching (e.g. Class.forName) and similar techniques to decide what's on the class path at runtime, so I'm not sure this feature would be likely to appear in the main Helidon 4.x codebase.

There might be some other options, though. I need to experiment a bit with those but if I find anything useful I'll share it here.
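One option that avoids both compiling against JDK 24 types and Class.forName is to read the MXBean's attributes by name through the platform MBeanServer. The sketch below is a generic reader under that assumption; it is not what Helidon does. The class name MBeanAttributeReader is hypothetical, and the object name "jdk.management:type=VirtualThreadScheduler" is taken from the linked JDK 24 Javadoc as an assumption that has not been verified here. On JDK 21 the bean is simply not registered and the reader returns an empty value.

```java
import java.lang.management.ManagementFactory;
import java.util.OptionalLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper: reads a numeric MBean attribute if the bean is present. */
public class MBeanAttributeReader {

    /** Assumed object name for the JDK 24 virtual thread scheduler MXBean. */
    public static final String VTHREAD_SCHEDULER = "jdk.management:type=VirtualThreadScheduler";

    /** Returns the attribute as a long, or empty if the bean or attribute is unavailable. */
    public static OptionalLong readNumber(String objectName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            if (!server.isRegistered(name)) {
                return OptionalLong.empty(); // e.g. running on a JDK without this bean
            }
            Object value = server.getAttribute(name, attribute);
            return value instanceof Number
                    ? OptionalLong.of(((Number) value).longValue())
                    : OptionalLong.empty();
        } catch (JMException e) {
            // Malformed name, missing attribute, or a failure inside the bean.
            return OptionalLong.empty();
        }
    }
}
```

On JDK 24 or later, a call such as readNumber(VTHREAD_SCHEDULER, "MountedVirtualThreadCount") would be expected to return a value, assuming the attribute names follow the getters documented for VirtualThreadSchedulerMXBean.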

@m0mus m0mus added metrics P3 4.x Version 4.x labels Dec 2, 2024
@m0mus m0mus moved this from Triage to Sprint Scope in Backlog Dec 2, 2024
@tjquinno tjquinno linked a pull request Dec 23, 2024 that will close this issue
Status: Sprint Scope
5 participants