-
Notifications
You must be signed in to change notification settings - Fork 272
Description
What are you trying to achieve?
Follow recommendations on: https://opentelemetry.io/docs/specs/semconv/system/hardware-metrics/
What did you expect to see?
Consistent and implementable specification.
Additional context.
Ran across following issues when trying to map GPU metrics to the semantics...
Non-implementable / unspecified items:
- Single
firmware_version
won't work:- Devices have multiple types of firmware (display, media, scheduling/power management etc)
hw.errors
have justhw.error.type
attribute, although:- Errors can be correctable, uncorrectable (data lost), or fatal (functionality lost, at least reset needed to recover)
- Errors can originate from different parts of the SW stack (FW, kernel, userspace driver)
- Errors can originate from different parts of the device HW (display, media, compute, 3d etc)
- => I suggest adding
.category
attribute, similarly to Level-Zero spec:
Inconsistencies:
hw.gpu.power
vs.hw.power{hw.type="gpu"}
confusion- If both are valid, why there's no
hw.gpu.energy
to matchhw.energy{hw.type="gpu"}
?
- If both are valid, why there's no
- Common HW
name
&id
attributes vs. GPUmodel
&serial
attributes- Should all of these be provided despite overlap?
- Why
vendor
attribute is used for GPU devices, butmanufacturer
for (host) device: https://opentelemetry.io/docs/specs/semconv/resource/device/ - Inconsistent attribute examples for GPU metrics missing from spec:
system.cpu.frequency
vs.hw.cpu.speed
.utilization
vs..*_ratio
[1] suffix for things like (frequency) throttling- whether in addition to base metric, one should provide
.utilization
/.*_ratio
, or.limit
value?
[1] Used e.g. in: https://opentelemetry.io/docs/specs/semconv/system/hardware-metrics/#hwfan---fan-metrics
PS. Summing errors is not very meaningful (rate is more interesting), but maybe additional all
category could be provided just for indicating whether there are any errors (within query period) from given HW? It could be useful both when more fine-grained categories are missing, and/or in addition to them.