Add process.cpu.count metric #2392

trask · 2022-03-01T23:35:38Z

The motivation for this PR is to find the right place to capture a metric for Java's Runtime.availableProcessors(), in a language-neutral way, rather than putting it under process.runtime.jvm.*.

See the initial attempt in #2384. However, as @bogdandrutu pointed out and I confirmed via testing, Runtime.availableProcessors() can be less than the system CPU count, e.g. when the process is launched via taskset.
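A minimal Java sketch of reading the value in question. The class name `AvailableCpus` is hypothetical; `Runtime.getRuntime().availableProcessors()` is the standard JDK call, and its Javadoc guarantees the result is never smaller than one.

```java
public class AvailableCpus {
    // Returns the number of processors the JVM reports as available to this process.
    // On Linux this can be lower than the machine's CPU count, e.g. when the CPU
    // affinity mask was restricted with taskset before the JVM started.
    static int availableCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + availableCpus());
    }
}
```

On Linux with a reasonably recent JDK, `javac AvailableCpus.java && taskset -c 0 java AvailableCpus` would be expected to report 1, matching the taskset behavior described above; the exact result depends on the JVM version and platform.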

Changes

Adds a new metric process.cpu.count to capture the number of CPUs available to the process.

Two open questions:

type: Gauge vs Async UpDownCounter
name: process.cpu.count vs process.cpu.available

My initial thoughts on these questions:

type: Async UpDownCounter since it's counting # of CPUs.
name: process.cpu.count seems simpler, and available sounds like availability, but I don't have strong feelings about this.

trask · 2022-03-02T18:05:32Z

@tigrannajaryan @bogdandrutu related to #2384, is it OK for the Java process itself to report process.cpu.time and process.cpu.count? Or should we define new metrics under process.runtime.jvm.* if we want to report those from inside the Java process?

bogdandrutu · 2022-03-02T19:12:56Z

@trask would be good to see if we have more cases like this. For example, in a k8s environment, do we consider the cpu_limit to be equivalent to this? Are there any other languages with similar capability?

trask · 2022-03-02T20:09:26Z

> @trask would be good to see if we have more cases like this

would it be better for us to define process.runtime.jvm.cpu.time and process.runtime.jvm.cpu.count for now then? we can always migrate to process.cpu.time and process.cpu.count in the future via schema mapping

bogdandrutu · 2022-03-03T05:18:04Z

If we cannot find any other use cases, the JVM versions are the answer, but I was looking to see if other maintainers have some input here.

trask · 2022-03-03T19:53:47Z

> looking to see if other maintainers have some input here

@open-telemetry/dotnet-approvers @open-telemetry/go-approvers do the .NET or Go runtimes have the equivalent of process.cpu.time (and maybe process.cpu.count)?

e.g. similar to the JVM's OperatingSystemMXBean#getProcessCpuTime() and Runtime#availableProcessors()

specification/metrics/semantic_conventions/process-metrics.md

pellared · 2022-03-03T22:11:49Z

> looking to see if other maintainers have some input here
>
> @open-telemetry/dotnet-approvers @open-telemetry/go-approvers do the .NET or Go runtimes have the equivalent of process.cpu.time (and maybe process.cpu.count)?
>
> e.g. similar to the JVM's OperatingSystemMXBean#getProcessCpuTime() and Runtime#availableProcessors()

.NET:

process.cpu.time - could be done e.g. via DateTime.UtcNow - Process.GetCurrentProcess().StartTime.ToUniversalTime()
process.cpu.count - Environment.ProcessorCount

Go:

process.cpu.time - AFAIK nothing OOTB, but something like cpu.Times from https://github.com/shirou/gopsutil could be used. Alternatively, if precision is not very important, store time.Now in a global variable at initialization and calculate the interval using time.Since
process.cpu.count - runtime.NumCPU
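The JVM equivalents referenced above can be read with standard JDK management APIs. Note that the elapsed-time approaches mentioned for .NET and Go measure wall-clock time since process start, which differs from CPU time: an idle process accumulates uptime but almost no CPU time, while a process with several busy threads can accumulate CPU time faster than wall-clock time. The sketch below (class name hypothetical) contrasts the two. `getProcessCpuTime()` is defined on the `com.sun.management.OperatingSystemMXBean` extension interface provided by HotSpot/OpenJDK-based JVMs, so the code falls back to -1 when that interface is unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class JvmCpuReadings {
    // Process CPU time in nanoseconds, or -1 if the platform MXBean does not expose it.
    static long processCpuTimeNanos() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1;
    }

    // Wall-clock time since JVM start in milliseconds: what an
    // "elapsed since start" approach measures, not CPU time.
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(500); // stay idle: uptime grows while CPU time barely moves
        long cpuNanos = processCpuTimeNanos();
        if (cpuNanos >= 0) {
            System.out.printf("process CPU time: %.3f s%n", cpuNanos / 1e9);
        } else {
            System.out.println("process CPU time: not supported on this JVM");
        }
        System.out.printf("uptime:           %.3f s%n", uptimeMillis() / 1e3);
        System.out.println("available CPUs:   " + Runtime.getRuntime().availableProcessors());
    }
}
```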

carlosalberto · 2022-03-03T23:43:20Z

From what @pellared posted, the .NET documentation says:

> The value returned by this API is fixed at .NET runtime startup for the process lifetime. It does not reflect changes in the environment settings while the process is running.

And for Go:

> The set of available CPUs is checked by querying the operating system at process startup. Changes to operating system CPU allocation after process startup are not reflected.

This means that these specific values, at least, cannot be used for metrics purposes, since they don't change.
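For comparison, the Javadoc for Java's Runtime.availableProcessors() says the value may change during a single run of the virtual machine and suggests that sensitive applications poll it occasionally. The hypothetical helper below illustrates what an asynchronous instrument callback effectively does on each collection: re-read the value rather than cache it. Whether any change is actually observed depends on the JVM and platform.

```java
import java.util.ArrayList;
import java.util.List;

public class CpuCountPoller {
    // Reads Runtime.availableProcessors() up to `samples` times, `intervalMillis` apart,
    // and returns every value observed. Stops early if the thread is interrupted.
    static List<Integer> poll(int samples, long intervalMillis) {
        List<Integer> observed = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            observed.add(Runtime.getRuntime().availableProcessors());
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
        }
        return observed;
    }

    public static void main(String[] args) {
        // Hypothetical usage: five samples one second apart.
        System.out.println(poll(5, 1000));
    }
}
```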

Aneurysm9 · 2022-03-04T00:17:12Z

Go's host instrumentation does have process.cpu.time, which is obtained from the gopsutil/process package @pellared mentioned.


trask · 2022-03-04T01:02:39Z

@bogdandrutu do you need more maintainer feedback?

It looks like Go is already capturing process.cpu.time, and .NET is discussing that they would like to as well: open-telemetry/opentelemetry-dotnet-contrib#207 (comment)

I'd like to propose that we move forward with this PR, and with allowing language-based instrumentation to emit process.* metrics for their own process.

carlosalberto · 2022-03-08T00:08:47Z

Ping @bogdandrutu

carlosalberto · 2022-03-10T00:01:32Z

Hey @trask -

Bogdan is on holidays this week. Shall we come back to the discussion next week?

trask · 2022-03-10T00:06:45Z

no problem

carlosalberto · 2022-03-17T01:53:13Z

Ping @bogdandrutu

specification/metrics/semantic_conventions/process-metrics.md

reyang · 2022-03-18T06:34:27Z

With mobile devices moving toward asymmetric multi-core processors, should we consider a dimension that tells whether a core is a Performance or an Efficiency core? Probably not an interesting topic for service developers, but definitely interesting for devices/mobile.


bogdandrutu

I think it is fine to report a metric like this, but I was curious whether we need count, or "utilization" to be consistent with system metrics.

bogdandrutu · 2022-03-22T10:01:29Z

specification/metrics/semantic_conventions/process-metrics.md

@@ -30,6 +30,7 @@ Below is a table of Process metric instruments.
 | Name | Instrument | Units | Description | Labels |
 |------|------------|-------|-------------|--------|
 | `process.cpu.time` | Asynchronous Counter | s | Total CPU seconds broken down by different states. | `state`, if specified, SHOULD be one of: `system`, `user`, `wait`. A process SHOULD be characterized _either_ by data points with no `state` labels, _or only_ data points with `state` labels. |
+| `process.cpu.count` | Asynchronous UpDownCounter | 1 | The number of logical CPUs available to the process. |  |


For consistency with system metrics, should we report process.cpu.utilization instead? See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/system-metrics.md#metric-instruments

trask · 2022-03-22

I agree that utilization is what users generally want to see. But utilization collected client-side is just a gauge and can't be aggregated over time, so I think it's nice to capture process.cpu.time and process.cpu.count where possible and display utilization based on those two metrics.

bogdandrutu · 2022-03-23

We had the same debate for system.cpu.utilization, and the conclusion of that discussion was that system.cpu.utilization was simpler for some backends than calculating it from two metrics.

> But utilization collected client-side is just a gauge and can't be aggregated over time

Not sure this is true: if you calculate a "delta utilization", then as long as process.cpu.count does not change, you can correctly merge the values by summing:

Timestamp 0 -> cpu.time0 / count
Timestamp 1 -> cpu.time1 / count -> report (cpu.time1 - cpu.time0) / count / (Timestamp 1 - Timestamp 0)
Timestamp 2 -> cpu.time2 / count -> report (cpu.time2 - cpu.time1) / count / (Timestamp 2 - Timestamp 1) -> fraction of the total available CPU used during the interval.

The idea was that if we report at roughly the same interval each time (we also have the start and current time), you can average the two reported values to get the utilization over (Timestamp 2 - Timestamp 0).

If the argument is wrong for the system.cpu.utilization we should change that as well, I want consistency :)
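The delta-utilization scheme above can be checked with a small Java sketch. The class name, timestamps, and CPU-time values are all hypothetical, and process.cpu.count is assumed constant over the window. Weighting each interval's utilization by its length makes the merge exact; with equal-length intervals, as assumed above, this reduces to the plain average of the reported values.

```java
public class DeltaUtilization {
    // Utilization over one reporting interval: CPU seconds consumed during the
    // interval, divided by the interval's wall-clock length, divided by the CPU count.
    static double deltaUtilization(double cpuTimeStart, double cpuTimeEnd,
                                   double tStart, double tEnd, int cpuCount) {
        return (cpuTimeEnd - cpuTimeStart) / (tEnd - tStart) / cpuCount;
    }

    // Merges per-interval utilizations into one value for the combined window.
    // Weighting by interval length is exact whenever cpuCount is constant;
    // with equal-length intervals it reduces to a plain average.
    static double merge(double[] utilizations, double[] durations) {
        double weighted = 0, total = 0;
        for (int i = 0; i < utilizations.length; i++) {
            weighted += utilizations[i] * durations[i];
            total += durations[i];
        }
        return weighted / total;
    }

    public static void main(String[] args) {
        int count = 4;                  // hypothetical: 4 CPUs, constant over the window
        double[] t = {0, 10, 20};       // hypothetical timestamps (s)
        double[] cpuTime = {0, 8, 28};  // hypothetical cumulative process.cpu.time (s)

        double u1 = deltaUtilization(cpuTime[0], cpuTime[1], t[0], t[1], count);
        double u2 = deltaUtilization(cpuTime[1], cpuTime[2], t[1], t[2], count);
        double merged = merge(new double[] {u1, u2}, new double[] {t[1] - t[0], t[2] - t[1]});
        double direct = deltaUtilization(cpuTime[0], cpuTime[2], t[0], t[2], count);

        System.out.printf("interval 1: %.2f, interval 2: %.2f%n", u1, u2);
        System.out.printf("merged: %.4f, computed directly: %.4f%n", merged, direct);
    }
}
```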

trask · 2022-03-24

Interesting. I don't have any objection to that approach. From a JVM metrics perspective, there was some interest in capturing the available CPU count separately from utilization, because it gives clues about GC and common thread pool sizing, but we haven't mapped out GC or thread pool metrics yet, so we will revisit that if/when we have a more specific need. I'll send a new PR to propose adding process.cpu.utilization, defined as process.cpu.time divided by elapsed time divided by the "available processor count".

trask · 2022-03-24T01:28:19Z

Based on discussion with @bogdandrutu above, I have created #2436 instead.

Closing this, will revisit if/when we have a specific need for it.

Add process cpu count metric

4b4d148

trask force-pushed the add-process-cpu-count branch from 9060c4b to 4b4d148 March 1, 2022 23:36

trask marked this pull request as ready for review March 1, 2022 23:36

trask requested review from a team March 1, 2022 23:36

github-actions bot assigned carlosalberto Mar 1, 2022

arminru added area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory labels Mar 2, 2022

This was referenced Mar 2, 2022

Add semantic conventions for jvm cpu metrics #2292

Merged

Clarify relationship between process and runtime semantic conventions #2291

Closed

pellared reviewed Mar 3, 2022


specification/metrics/semantic_conventions/process-metrics.md (outdated)

trask and others added 2 commits March 3, 2022 16:50

Merge remote-tracking branch 'upstream/main' into add-process-cpu-count

1ef1024

Update specification/metrics/semantic_conventions/process-metrics.md

57768f6

Co-authored-by: Robert Pająk <pellared@hotmail.com>

twenzel mentioned this pull request Mar 9, 2022

Add process.cpu.count open-telemetry/opentelemetry-dotnet-contrib#218

Merged

jack-berg approved these changes Mar 17, 2022


reyang reviewed Mar 18, 2022


specification/metrics/semantic_conventions/process-metrics.md (outdated)

trask and others added 2 commits March 18, 2022 13:15

Update specification/metrics/semantic_conventions/process-metrics.md

a945424

Co-authored-by: Reiley Yang <reyang@microsoft.com>

Merge remote-tracking branch 'upstream/main' into add-process-cpu-count

09074b2

bogdandrutu reviewed Mar 22, 2022


trask closed this Mar 24, 2022

trask deleted the add-process-cpu-count branch March 24, 2022 01:28

This was referenced Jun 9, 2022

[Do Not Merge] Add description doc for DotNet Runtime metrics open-telemetry/opentelemetry-dotnet-contrib#404

Closed

Add GC heap size and count in Runtime metrics open-telemetry/opentelemetry-dotnet-contrib#412

Merged

jack-berg mentioned this pull request Sep 8, 2022

Add JVM implementation information and memory semantic conventions #2777

Closed

trask mentioned this pull request Oct 25, 2022

Record memory usage after garbage collection open-telemetry/opentelemetry-java-instrumentation#6963

Merged

jack-berg mentioned this pull request Apr 5, 2023

Add new JVM runtime environment metrics #3352

Closed

trask mentioned this pull request Jun 7, 2023

Add jvm.cpu.count metric open-telemetry/semantic-conventions#52

Merged

trask mentioned this pull request Jul 26, 2023

Document common metric names and when they should be used open-telemetry/semantic-conventions#211

Open

trask mentioned this pull request Jun 7, 2024

Should process.cpu.utilization and system.cpu.utilization be opt-in? open-telemetry/semantic-conventions#1130

Closed
