Add process.cpu.count metric #2392
@tigrannajaryan @bogdandrutu related to #2384, is it ok for Java process itself to report |
@trask would be good to see if we have more cases like this. For example in a k8s environment, do we consider the cpu_limit as being equivalent with this? Are there any other languages with similar capability? |
would it be better for us to define |
If we cannot find any other use-cases the |
@open-telemetry/dotnet-approvers @open-telemetry/go-approvers do the .NET or Go runtimes have the equivalent of e.g. similar to the JVM's |
.NET:
Go:
|
From what @pellared posted, DotNet says that
And for Go:
Which means these specific values, at least, cannot be used for metrics purposes, as they don't change. |
Go's |
Co-authored-by: Robert Pająk <pellared@hotmail.com>
@bogdandrutu do you need more maintainer feedback? It looks like Go is already capturing I'd like to propose that we move forward with this PR, and with allowing language-based instrumentation to emit |
Ping @bogdandrutu |
Hey @trask - Bogdan is on holidays this week. Shall we come back to the discussion next week? |
no problem |
Ping @bogdandrutu |
With mobile devices moving towards asymmetric multi-core processors, should we consider a dimension that tells whether a core is a Performance or an Efficiency core? Probably not an interesting topic for service developers, but definitely interesting for device/mobile.
Co-authored-by: Reiley Yang <reyang@microsoft.com>
I think it is fine to report a metric like this, but was curious whether we need count or "utilization" to be consistent with system.
@@ -30,6 +30,7 @@ Below is a table of Process metric instruments.
| Name | Instrument | Units | Description | Labels |
|------|------------|-------|-------------|--------|
| `process.cpu.time` | Asynchronous Counter | s | Total CPU seconds broken down by different states. | `state`, if specified, SHOULD be one of: `system`, `user`, `wait`. A process SHOULD be characterized _either_ by data points with no `state` labels, _or only_ data points with `state` labels. |
| `process.cpu.count` | Asynchronous UpDownCounter | 1 | The number of logical CPUs available to the process. | | |
For consistency with `system` metrics, should we report `process.cpu.utilization` instead? See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/system-metrics.md#metric-instruments
I agree that utilization is what users generally want to see. But utilization collected client-side is just a gauge and can't be aggregated over time, so I think it's nice to capture `process.cpu.time` and `process.cpu.count` where possible and display utilization based on those two metrics.
We had the same debate for `system.cpu.utilization`, and the result of the discussion is that `system.cpu.utilization` was simpler for some backends than calculating from two metrics.
> But utilization collected client-side is just a gauge and can't be aggregated over time

Not sure this is true: if you calculate "delta utilization" and `process.cpu.count` does not change, you can correctly merge them by doing sum:
Timestamp 0 -> cpu.time0/count
Timestamp 1 -> cpu.time1/count -> report (cpu.time1 - cpu.time0) / count / (Timestamp 1 - Timestamp 0)
Timestamp 2 -> cpu.time2/count -> report (cpu.time2 - cpu.time1) / count / (Timestamp 2 - Timestamp 1) -> percent of total cpu / second.
The idea was that if we report at roughly the same interval (we also have the start and current time), you can average the two reported values to get the utilization over (Timestamp 2 - Timestamp 0).
If the argument is wrong for `system.cpu.utilization`, we should change that as well. I want consistency :)
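To make the arithmetic in the comment above concrete, here is a minimal sketch in Python. The function name `delta_utilization` and the sample numbers are hypothetical, chosen only for illustration; the sketch assumes a constant CPU count and equal reporting intervals, which is the case the comment describes.

```python
def delta_utilization(cpu_time_prev, cpu_time_now, t_prev, t_now, cpu_count):
    """Fraction of the available CPU capacity used between two samples.

    cpu_time_* are cumulative CPU seconds (as in process.cpu.time),
    t_* are wall-clock timestamps in seconds, and cpu_count is the
    number of CPUs available to the process (assumed constant).
    """
    elapsed = t_now - t_prev
    if elapsed <= 0 or cpu_count <= 0:
        raise ValueError("elapsed time and cpu_count must be positive")
    return (cpu_time_now - cpu_time_prev) / cpu_count / elapsed


# Hypothetical samples: timestamps 0 s, 10 s, 20 s; cumulative CPU time
# 0 s, 12 s, 20 s; 4 CPUs available to the process.
u1 = delta_utilization(0.0, 12.0, 0.0, 10.0, 4)    # first interval
u2 = delta_utilization(12.0, 20.0, 10.0, 20.0, 4)  # second interval
overall = delta_utilization(0.0, 20.0, 0.0, 20.0, 4)

if __name__ == "__main__":
    # With equal-length intervals, the plain average of the two reported
    # values matches the utilization over the whole window.
    print(u1, u2, (u1 + u2) / 2, overall)
```

If the intervals differ in length, a plain average no longer matches; each reported value would need to be weighted by its interval duration.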
Interesting. I don't have any objection to that approach. From a JVM metrics perspective, there was some interest in capturing available cpu count separate from utilization because it gives clues about GC and common thread pool sizing, but we haven't mapped out GC or thread pool metrics yet, so will revisit that then if/when we have a more specific need. I'll send a new PR to propose adding `process.cpu.utilization`, with this definition of `process.cpu.time` divided by elapsed time divided by "available processor count".
Based on discussion with @bogdandrutu above, I have created #2436 instead. Closing this, will revisit if/when we have a specific need for it. |
The motivation for this PR is to find the right place to capture a metric for Java's `Runtime.availableProcessors()`, in a way that is language-neutral, rather than throwing it under `process.runtime.jvm.*`.

See initial attempt at #2384, but as @bogdandrutu pointed out and I confirmed via testing, `Runtime.availableProcessors()` can be less than the system CPU count, e.g. by launching the process via `taskset`.

Changes

Adds a new metric `process.cpu.count` to capture the number of CPUs available to the process.

Two open questions:

1. Gauge vs AsyncUpDownCounter
2. `process.cpu.count` vs `process.cpu.available`

My initial thoughts on these questions:

1. UpDownCounter, since it's counting # of CPUs.
2. `process.cpu.count` seems simpler, and `available` sounds like availability, but I don't have strong feelings about this.
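As an illustration of the point that the count available to a process can be lower than the system CPU count, here is a small Python sketch. It is not part of this PR and is only an assumed analogue of the Java call: it reads the scheduler affinity mask where the platform exposes it (`os.sched_getaffinity` exists on Linux but not on every OS) and falls back to `os.cpu_count()` otherwise. The function names are hypothetical, and the sketch does not look at container CPU quotas.

```python
import os


def system_cpu_count():
    """Logical CPUs reported for the whole system (at least 1)."""
    return os.cpu_count() or 1


def available_cpu_count():
    """Logical CPUs the current process is allowed to run on.

    Uses the affinity mask when the platform provides it; otherwise
    falls back to the system-wide count.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))  # 0 means the calling process
    return system_cpu_count()


if __name__ == "__main__":
    print("system CPUs:   ", system_cpu_count())
    print("available CPUs:", available_cpu_count())
```

On Linux, running the script under `taskset` with a restricted CPU list should make the second number smaller than the first.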