[receiver/kubeletstats] Add `k8s.{container,pod}.memory.node.utilization` metrics (#33591)

**Description:** Similar to
#32295
and
#33390,
this PR adds the `k8s.{container,pod}.memory.node.utilization` metrics.


**Link to tracking Issue:**
#27885

**Testing:** Added unit tests.

**Documentation:** Added documentation for the new metrics.

### Manual testing

1. Using the following target Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "8070591Ki"
      limits:
        memory: "9070591Ki"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "800M", "--vm-hang", "4"]
```
2. Observe the reported values of the new metrics:

![memGood](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/fae04b30-59ca-4d70-8446-f54b5a085cf7)

On a node with 32.5GB of memory, the 800MB container/Pod consumes
`0.8/32.5 = 0.0246... ≈ 0.025` of the node's memory capacity.

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
ChrsMark committed Jul 10, 2024
1 parent 948fa91 commit e42da8b
Showing 24 changed files with 926 additions and 47 deletions.
27 changes: 27 additions & 0 deletions .chloggen/add_utilization_k8s_node_metrics.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: kubeletstatsreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add `k8s.pod.memory.node.utilization` and `k8s.container.memory.node.utilization` metrics"

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [33591]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
9 changes: 7 additions & 2 deletions receiver/kubeletstatsreceiver/README.md
@@ -218,9 +218,10 @@ receivers:
- pod
```

### Collect k8s.container.cpu.node.utilization, `k8s.pod.cpu.node.utilization` as ratio of total node's capacity
### Collect `k8s.{container,pod}.{cpu,memory}.node.utilization` as ratio of total node's capacity

In order to calculate the `k8s.container.cpu.node.utilization` or `k8s.pod.cpu.node.utilization` metrics, the
In order to calculate the `k8s.container.cpu.node.utilization`, `k8s.pod.cpu.node.utilization`,
`k8s.container.memory.node.utilization` and `k8s.pod.memory.node.utilization` metrics, the
information of the node's capacity must be retrieved from the k8s API. For this, the `k8s_api_config` needs to be set.
In addition, the node name must be identified properly. The `K8S_NODE_NAME` env var can be set using the
downward API inside the collector pod spec as follows:
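The README's example snippet is collapsed in this diff view; a typical downward API stanza for exposing the node name looks like the following (a sketch, not necessarily the exact README snippet):

```yaml
env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```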
@@ -248,6 +249,10 @@ receivers:
enabled: true
k8s.pod.cpu.node.utilization:
enabled: true
k8s.container.memory.node.utilization:
enabled: true
k8s.pod.memory.node.utilization:
enabled: true
```

### Optional parameters
15 changes: 11 additions & 4 deletions receiver/kubeletstatsreceiver/config.go
@@ -120,10 +120,17 @@ func (cfg *Config) Unmarshal(componentParser *confmap.Conf) error {
}

func (cfg *Config) Validate() error {
if cfg.Metrics.K8sContainerCPUNodeUtilization.Enabled && cfg.NodeName == "" {
return errors.New("for k8s.container.cpu.node.utilization node setting is required. Check the readme on how to set the required setting")
} else if cfg.Metrics.K8sPodCPUNodeUtilization.Enabled && cfg.NodeName == "" {
return errors.New("for k8s.pod.cpu.node.utilization node setting is required. Check the readme on how to set the required setting")
if cfg.NodeName == "" {
switch {
case cfg.Metrics.K8sContainerCPUNodeUtilization.Enabled:
return errors.New("for k8s.container.cpu.node.utilization node setting is required. Check the readme on how to set the required setting")
case cfg.Metrics.K8sPodCPUNodeUtilization.Enabled:
return errors.New("for k8s.pod.cpu.node.utilization node setting is required. Check the readme on how to set the required setting")
case cfg.Metrics.K8sContainerMemoryNodeUtilization.Enabled:
return errors.New("for k8s.container.memory.node.utilization node setting is required. Check the readme on how to set the required setting")
case cfg.Metrics.K8sPodMemoryNodeUtilization.Enabled:
return errors.New("for k8s.pod.memory.node.utilization node setting is required. Check the readme on how to set the required setting")
}
}
return nil
}
56 changes: 56 additions & 0 deletions receiver/kubeletstatsreceiver/config_test.go
@@ -229,6 +229,62 @@ func TestLoadConfig(t *testing.T) {
},
expectedValidationErr: "for k8s.pod.cpu.node.utilization node setting is required. Check the readme on how to set the required setting",
},
{
id: component.NewIDWithName(metadata.Type, "container_memory_node_utilization"),
expected: &Config{
ControllerConfig: scraperhelper.ControllerConfig{
CollectionInterval: duration,
InitialDelay: time.Second,
},
ClientConfig: kube.ClientConfig{
APIConfig: k8sconfig.APIConfig{
AuthType: "tls",
},
},
MetricGroupsToCollect: []kubelet.MetricGroup{
kubelet.ContainerMetricGroup,
kubelet.PodMetricGroup,
kubelet.NodeMetricGroup,
},
MetricsBuilderConfig: metadata.MetricsBuilderConfig{
Metrics: metadata.MetricsConfig{
K8sContainerMemoryNodeUtilization: metadata.MetricConfig{
Enabled: true,
},
},
ResourceAttributes: metadata.DefaultResourceAttributesConfig(),
},
},
expectedValidationErr: "for k8s.container.memory.node.utilization node setting is required. Check the readme on how to set the required setting",
},
{
id: component.NewIDWithName(metadata.Type, "pod_memory_node_utilization"),
expected: &Config{
ControllerConfig: scraperhelper.ControllerConfig{
CollectionInterval: duration,
InitialDelay: time.Second,
},
ClientConfig: kube.ClientConfig{
APIConfig: k8sconfig.APIConfig{
AuthType: "tls",
},
},
MetricGroupsToCollect: []kubelet.MetricGroup{
kubelet.ContainerMetricGroup,
kubelet.PodMetricGroup,
kubelet.NodeMetricGroup,
},
MetricsBuilderConfig: metadata.MetricsBuilderConfig{
Metrics: metadata.MetricsConfig{
K8sPodMemoryNodeUtilization: metadata.MetricConfig{
Enabled: true,
},
},
ResourceAttributes: metadata.DefaultResourceAttributesConfig(),
},
},
expectedValidationErr: "for k8s.pod.memory.node.utilization node setting is required. Check the readme on how to set the required setting",
},
}

for _, tt := range tests {
16 changes: 16 additions & 0 deletions receiver/kubeletstatsreceiver/documentation.md
@@ -426,6 +426,14 @@ Container cpu utilization as a ratio of the container's requests
| ---- | ----------- | ---------- |
| 1 | Gauge | Double |
### k8s.container.memory.node.utilization
Container memory utilization as a ratio of the node's capacity
| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| 1 | Gauge | Double |
### k8s.container.memory_limit_utilization
Container memory utilization as a ratio of the container's limits
@@ -490,6 +498,14 @@ Pod cpu utilization as a ratio of the pod's total container requests. If any con
| ---- | ----------- | ---------- |
| 1 | Gauge | Double |
### k8s.pod.memory.node.utilization
Pod memory utilization as a ratio of the node's capacity
| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| 1 | Gauge | Double |
### k8s.pod.memory_limit_utilization
Pod memory utilization as a ratio of the pod's total container limits. If any container is missing a limit the metric is not emitted.
10 changes: 5 additions & 5 deletions receiver/kubeletstatsreceiver/internal/kubelet/accumulator.go
@@ -57,7 +57,7 @@ func (a *metricDataAccumulator) nodeStats(s stats.NodeStats) {
currentTime := pcommon.NewTimestampFromTime(a.time)
addUptimeMetric(a.mbs.NodeMetricsBuilder, metadata.NodeUptimeMetrics.Uptime, s.StartTime, currentTime)
addCPUMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeCPUMetrics, s.CPU, currentTime, resources{}, 0)
addMemoryMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeMemoryMetrics, s.Memory, currentTime, resources{})
addMemoryMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeMemoryMetrics, s.Memory, currentTime, resources{}, 0)
addFilesystemMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeFilesystemMetrics, s.Fs, currentTime)
addNetworkMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeNetworkMetrics, s.Network, currentTime)
// todo s.Runtime.ImageFs
@@ -76,8 +76,8 @@ func (a *metricDataAccumulator) podStats(s stats.PodStats) {

currentTime := pcommon.NewTimestampFromTime(a.time)
addUptimeMetric(a.mbs.PodMetricsBuilder, metadata.PodUptimeMetrics.Uptime, s.StartTime, currentTime)
addCPUMetrics(a.mbs.PodMetricsBuilder, metadata.PodCPUMetrics, s.CPU, currentTime, a.metadata.podResources[s.PodRef.UID], a.metadata.cpuNodeLimit)
addMemoryMetrics(a.mbs.PodMetricsBuilder, metadata.PodMemoryMetrics, s.Memory, currentTime, a.metadata.podResources[s.PodRef.UID])
addCPUMetrics(a.mbs.PodMetricsBuilder, metadata.PodCPUMetrics, s.CPU, currentTime, a.metadata.podResources[s.PodRef.UID], a.metadata.nodeCapacity.CPUCapacity)
addMemoryMetrics(a.mbs.PodMetricsBuilder, metadata.PodMemoryMetrics, s.Memory, currentTime, a.metadata.podResources[s.PodRef.UID], a.metadata.nodeCapacity.MemoryCapacity)
addFilesystemMetrics(a.mbs.PodMetricsBuilder, metadata.PodFilesystemMetrics, s.EphemeralStorage, currentTime)
addNetworkMetrics(a.mbs.PodMetricsBuilder, metadata.PodNetworkMetrics, s.Network, currentTime)

@@ -110,8 +110,8 @@ func (a *metricDataAccumulator) containerStats(sPod stats.PodStats, s stats.Cont
currentTime := pcommon.NewTimestampFromTime(a.time)
resourceKey := sPod.PodRef.UID + s.Name
addUptimeMetric(a.mbs.ContainerMetricsBuilder, metadata.ContainerUptimeMetrics.Uptime, s.StartTime, currentTime)
addCPUMetrics(a.mbs.ContainerMetricsBuilder, metadata.ContainerCPUMetrics, s.CPU, currentTime, a.metadata.containerResources[resourceKey], a.metadata.cpuNodeLimit)
addMemoryMetrics(a.mbs.ContainerMetricsBuilder, metadata.ContainerMemoryMetrics, s.Memory, currentTime, a.metadata.containerResources[resourceKey])
addCPUMetrics(a.mbs.ContainerMetricsBuilder, metadata.ContainerCPUMetrics, s.CPU, currentTime, a.metadata.containerResources[resourceKey], a.metadata.nodeCapacity.CPUCapacity)
addMemoryMetrics(a.mbs.ContainerMetricsBuilder, metadata.ContainerMemoryMetrics, s.Memory, currentTime, a.metadata.containerResources[resourceKey], a.metadata.nodeCapacity.MemoryCapacity)
addFilesystemMetrics(a.mbs.ContainerMetricsBuilder, metadata.ContainerFilesystemMetrics, s.Rootfs, currentTime)

a.m = append(a.m, a.mbs.ContainerMetricsBuilder.Emit(
@@ -53,7 +53,7 @@ func TestMetadataErrorCases(t *testing.T) {
},
},
},
}, NodeLimits{}, nil),
}, NodeCapacity{}, nil),
testScenario: func(acc metricDataAccumulator) {
now := metav1.Now()
podStats := stats.PodStats{
@@ -79,7 +79,7 @@ func TestMetadataErrorCases(t *testing.T) {
metricGroupsToCollect: map[MetricGroup]bool{
VolumeMetricGroup: true,
},
metadata: NewMetadata([]MetadataLabel{MetadataLabelVolumeType}, nil, NodeLimits{}, nil),
metadata: NewMetadata([]MetadataLabel{MetadataLabelVolumeType}, nil, NodeCapacity{}, nil),
testScenario: func(acc metricDataAccumulator) {
podStats := stats.PodStats{
PodRef: stats.PodReference{
@@ -121,7 +121,7 @@ func TestMetadataErrorCases(t *testing.T) {
},
},
},
}, NodeLimits{}, nil),
}, NodeCapacity{}, nil),
testScenario: func(acc metricDataAccumulator) {
podStats := stats.PodStats{
PodRef: stats.PodReference{
@@ -165,7 +165,7 @@ func TestMetadataErrorCases(t *testing.T) {
},
},
},
}, NodeLimits{}, nil),
}, NodeCapacity{}, nil),
detailedPVCLabelsSetterOverride: func(*metadata.ResourceBuilder, string, string, string) error {
// Mock failure cases.
return errors.New("")
11 changes: 10 additions & 1 deletion receiver/kubeletstatsreceiver/internal/kubelet/mem.go
@@ -10,7 +10,13 @@ import (
"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver/internal/metadata"
)

func addMemoryMetrics(mb *metadata.MetricsBuilder, memoryMetrics metadata.MemoryMetrics, s *stats.MemoryStats, currentTime pcommon.Timestamp, r resources) {
func addMemoryMetrics(
mb *metadata.MetricsBuilder,
memoryMetrics metadata.MemoryMetrics,
s *stats.MemoryStats,
currentTime pcommon.Timestamp,
r resources,
nodeMemoryLimit float64) {
if s == nil {
return
}
@@ -29,5 +35,8 @@ func addMemoryMetrics(mb *metadata.MetricsBuilder, memoryMetrics metadata.Memory
if r.memoryRequest > 0 {
memoryMetrics.RequestUtilization(mb, currentTime, float64(*s.UsageBytes)/float64(r.memoryRequest))
}
if nodeMemoryLimit > 0 {
memoryMetrics.NodeUtilization(mb, currentTime, float64(*s.UsageBytes)/nodeMemoryLimit)
}
}
}
15 changes: 9 additions & 6 deletions receiver/kubeletstatsreceiver/internal/kubelet/metadata.go
@@ -52,7 +52,7 @@ type Metadata struct {
DetailedPVCResourceSetter func(rb *metadata.ResourceBuilder, volCacheID, volumeClaim, namespace string) error
podResources map[string]resources
containerResources map[string]resources
cpuNodeLimit float64
nodeCapacity NodeCapacity
}

type resources struct {
@@ -62,9 +62,12 @@ type resources struct {
memoryLimit int64
}

type NodeLimits struct {
Name string
CPUNanoCoresLimit float64
type NodeCapacity struct {
Name string
// node's CPU capacity in cores
CPUCapacity float64
// node's Memory capacity in bytes
MemoryCapacity float64
}

func getContainerResources(r *v1.ResourceRequirements) resources {
@@ -80,15 +83,15 @@ func getContainerResources(r *v1.ResourceRequirements) resources {
}
}

func NewMetadata(labels []MetadataLabel, podsMetadata *v1.PodList, nodeResourceLimits NodeLimits,
func NewMetadata(labels []MetadataLabel, podsMetadata *v1.PodList, nodeCap NodeCapacity,
detailedPVCResourceSetter func(rb *metadata.ResourceBuilder, volCacheID, volumeClaim, namespace string) error) Metadata {
m := Metadata{
Labels: getLabelsMap(labels),
PodsMetadata: podsMetadata,
DetailedPVCResourceSetter: detailedPVCResourceSetter,
podResources: make(map[string]resources),
containerResources: make(map[string]resources),
cpuNodeLimit: nodeResourceLimits.CPUNanoCoresLimit,
nodeCapacity: nodeCap,
}

if podsMetadata != nil {
