Commit f4de431

Merge pull request #2812 from haircommander/stats-2
KEP-2371: update kep to reflect current state of enhancement
2 parents a7c988c + 024717e commit f4de431

2 files changed: +83 -135 lines changed

keps/sig-node/2371-cri-pod-container-stats/README.md

Lines changed: 78 additions & 129 deletions
@@ -183,9 +183,9 @@ We want to avoid using cAdvisor for container & pod level stats and move metric
 * cAdvisor and metric dependency: CRI mission is not fully fulfilled - container runtime is not fully plugable.
 * Break the monolithic design of cAdvisor, which needs to be aware of the underlying container runtime.
 * Duplicate stats are collected by both cAdvisor and the CRI runtime, which can lead to:
-* Different information from different sources
-* Confusion from unclear origin of a given metric
-* Performance degradations (increased CPU / Memory / etc) [xref][perf-issue]
+* Different information from different sources
+* Confusion from unclear origin of a given metric
+* Performance degradations (increased CPU / Memory / etc) [xref][perf-issue]
 * Stats should be reported by the container runtime which knows behavior of the container/pod the best.
 * cAdvisor only supports runtimes that run processes on the host, not e.g. VM based runtime like Kata Containers.
 * cAdvisor only supports linux containers, not Windows ones.
@@ -304,7 +304,8 @@ These correspond to some fields of the [ContainerStats](#summary-container-stats
 + // Corresponds to Stats Summary API CPUStats UsageCoreNanoSeconds
 = UInt64Value usage_core_nano_seconds = 2;
 + // Total CPU usage (sum of all cores) averaged over the sample window.
-+ UInt64Value usage_nano_seconds = 3;
++ // The "core" unit can be interpreted as CPU core-nanoseconds per second.
++ UInt64Value usage_nano_cores = 3;
 =}

 =// MemoryUsage provides the memory usage information.
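
The renamed `usage_nano_cores` field in the hunk above is a rate, while `usage_core_nano_seconds` is a cumulative counter. As a rough illustration of how the two relate, the following Go sketch derives the rate from two samples of the counter. It is not taken from the KEP or from any CRI implementation: the function name `usageNanoCores`, the two-sample approach, and the use of nanosecond timestamps are all assumptions made for this example.

```go
package main

import "fmt"

// usageNanoCores is a hypothetical helper (not part of the CRI API).
// It converts two samples of a cumulative CPU counter, measured in
// core-nanoseconds, into an average rate in nanocores over the window
// between them. Timestamps are assumed to be in nanoseconds.
// The boolean result is false when no rate can be computed.
func usageNanoCores(prevTimestamp, curTimestamp int64, prevUsage, curUsage uint64) (uint64, bool) {
	window := curTimestamp - prevTimestamp
	if window <= 0 {
		// No elapsed time (or out-of-order samples): the rate is undefined.
		return 0, false
	}
	if curUsage < prevUsage {
		// The cumulative counter went backwards, e.g. after a restart.
		return 0, false
	}
	delta := curUsage - prevUsage
	// core-nanoseconds consumed per second of wall-clock time.
	const nanosPerSecond = 1e9
	return uint64(float64(delta) * nanosPerSecond / float64(window)), true
}

func main() {
	// 5 core-seconds of CPU time consumed over a 10 second window.
	rate, ok := usageNanoCores(0, 10_000_000_000, 1_000_000_000, 6_000_000_000)
	fmt.Println(rate, ok)
}
```

With these inputs the container used half a core on average, which the sketch reports as 500000000 nanocores.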
@@ -315,74 +316,19 @@ These correspond to some fields of the [ContainerStats](#summary-container-stats
 = // The amount of working set memory in bytes.
 + // Corresponds to Stats Summary API MemoryStats WorkingSetBytes field
 = UInt64Value working_set_bytes = 2;
-+ // Available memory for use. This is defined as the memory limit = workingSetBytes.
-+ // If memory limit is undefined, the available bytes is omitted.
++ // Available memory for use. This is defined as the memory limit - workingSetBytes.
 + UInt64Value available_bytes = 3;
 + // Total memory in use. This includes all memory regardless of when it was accessed.
-+ UInt64Value usage_bytes
-+ // The amount of working set memory. This includes recently accessed memory,
-+ // dirty memory, and kernel memory. WorkingSetBytes is <= UsageBytes
-+ UInt64Value working_set_bytes = 4;
-+ // The amount of anonymous and swap cache memory (includes transparent
-+ // hugepages).
++ UInt64Value usage_bytes = 4;
++ // The amount of anonymous and swap cache memory (includes transparent hugepages).
 + UInt64Value rss_bytes = 5;
 + // Cumulative number of minor page faults.
-+ Uint64Value page_faults = 6;
++ UInt64Value page_faults = 6;
 + // Cumulative number of major page faults.
-+ Uint64Value major_page_faults = 6;
-=}
-
-=// FilesystemUsage provides the filesystem usage information.
-=message FilesystemUsage {
-= // Timestamp in nanoseconds at which the information were collected. Must be > 0.
-+ // Corresponds to Stats Summary API FsStats Time field
-= int64 timestamp = 1;
-= // The unique identifier of the filesystem.
-+ // Does not correspond to any field in Stats Summary API FsStats
-= FilesystemIdentifier fs_id = 2;
-= // UsedBytes represents the bytes used for images on the filesystem.
-= // This may differ from the total bytes used on the filesystem and may not
-= // equal CapacityBytes - AvailableBytes.
-= // Corresponds to Stats Summary API FsStats UsedBytes field
-= UInt64Value used_bytes = 3;
-= // InodesUsed represents the inodes used by the images.
-= // This may not equal InodesCapacity - InodesAvailable because the underlying
-= // filesystem may also be used for purposes other than storing images.
-= // Corresponds to Stats Summary API FsStats InodesUsed field
-= UInt64Value inodes_used = 4;
-+ // TODO: Unclear how the remaining fields relate to container stats. Is it filled in cAdvisor?
-+ // AvailableBytes represents the storage space available (bytes) for the filesystem.
-+ UInt64Value available_bytes = 5;
-+ // CapacityBytes represents the total capacity (bytes) of the filesystems underlying storage.
-+ UInt64Value capacity_bytes = 6;
-+ // InodesFree represents the free inodes in the filesystem.
-+ UInt64Value inodes_free = 7;
-+ // Inodes represents the total inodes in the filesystem.
-+ UInt64Value inodes = 8;
++ UInt64Value major_page_faults = 7;
 =}
 ```

-All together, below is the proposition for the new `ContainerStats` object:
-```
-=// ContainerStats provides the resource usage statistics for a container.
-=message ContainerStats {
-= // Information of the container.
-+ // Corresponds to the Stats Summary API ContainerStats Name field
-= ContainerAttributes attributes = 1;
-= // CPU usage gathered from the container.
-+ // Corresponds to Stats Summary API ContainerStats CPUStats field
-= CpuUsage cpu = 2;
-= // Memory usage gathered from the container.
-+ // Corresponds to Stats Summary API ContainerStats MemoryStats field
-= MemoryUsage memory = 3;
-= // Usage of the writable layer.
-+ // Corresponds to Stats Summary API ContainerStats Rootfs field
-= FilesystemUsage writable_layer = 4;
-+ // Stats pertaining to container logs usage of filesystem resources
-+ // Logs.UsedBytes is the number of bytes used for the container logs.
-+ FilesystemUsage logs = 5;
-=}
-```
 Notes:
 - In Stats Summary API ContainerStats object, there's a timestamp field. We do not need such a field, as each struct in the ContainerStats object
 has its own timestamp, allowing CRI implementations flexibility when they collect which metrics.
@@ -402,49 +348,45 @@ They will be defined as follows:
 // Runtime service defines the public APIs for remote pod runtimes
 service RuntimeService {
 ...
-// PodSandboxStats returns stats of the pod. If the pod does not
+// PodSandboxStats returns stats of the pod. If the pod sandbox does not
 // exist, the call returns an error.
 rpc PodSandboxStats(PodSandboxStatsRequest) returns (PodSandboxStatsResponse) {}
-// ListPodSandboxStats returns stats of all running pods.
+// ListPodSandboxStats returns stats of the pods matching a filter.
 rpc ListPodSandboxStats(ListPodSandboxStatsRequest) returns (ListPodSandboxStatsResponse) {}
 ...
 }
 ...
-
-message PodSandboxStatsRequest{
-// ID of the pod for which to retrieve stats.
-string pod_id = 1;
+message PodSandboxStatsRequest {
+// ID of the pod sandbox for which to retrieve stats.
+string pod_sandbox_id = 1;
 }

 message PodSandboxStatsResponse {
-// Stats of the pod.
 PodSandboxStats stats = 1;
 }

-
-// PodSandboxStatsFilter is used to filter containers.
-// All those fields are combined with 'AND'
+// PodSandboxStatsFilter is used to filter the list of pod sandboxes to retrieve stats for.
+// All those fields are combined with 'AND'.
 message PodSandboxStatsFilter {
-// ID of the container.
+// ID of the pod sandbox.
 string id = 1;
-// LabelSelector to select matches.
+// LabelSelector to select matches.
 // Only api.MatchLabels is supported for now and the requirements
 // are ANDed. MatchExpressions is not supported yet.
 map<string, string> label_selector = 2;
 }

-message ListPodSandboxStatsRequest{
+message ListPodSandboxStatsRequest {
 // Filter for the list request.
 PodSandboxStatsFilter filter = 1;
 }

 message ListPodSandboxStatsResponse {
-// Stats of the pod.
+// Stats of the pod sandbox.
 repeated PodSandboxStats stats = 1;
 }

-
-// PodSandboxAttributes provides basic information of the container.
+// PodSandboxAttributes provides basic information of the pod sandbox.
 message PodSandboxAttributes {
 // ID of the pod.
 string id = 1;
@@ -460,58 +402,65 @@ message PodSandboxAttributes {
 }

 // PodSandboxStats provides the resource usage statistics for a pod.
+// The linux or windows field will be populated depending on the platform.
 message PodSandboxStats {
 // Information of the pod.
-// Corresponds to PodRef in SummaryAPI
 PodSandboxAttributes attributes = 1;
-// CPU usage gathered from the pod.
-// Corresponds to Stats SummaryAPI CPUStats field
-CpuUsage cpu = 2;
-// Memory usage gathered from the pod.
-// Corresponds to Stats SummaryAPI MemoryStats field
-MemoryUsage memory = 3;
-// TODO: do we want a start time field?
-// The time at which data collection for the pod-scoped (e.g. network) stats was (re)started.
-// int64 timestamp = 1;
-// Stats of containers in the measured pod.
-repeated ContainerStats containers
-// Stats pertaining to CPU resources consumed by pod cgroup (which includes all containers' resource usage and pod overhead).
-NetworkStats network = 4;
-// Note the specific omission of VolumeStats and EphemeralStorage
-// Each of these fields will be calculated Kubelet-level
-// ProcessStats pertaining to processes.
-ProcessStats process = 5;
+// Stats from linux.
+LinuxPodSandboxStats linux = 2;
+// Stats from windows.
+WindowsPodSandboxStats windows = 3;
 }

-// NetworkStats contains data about network resources.
-message NetworkStats {
-// The time at which these stats were updated.
-int64 timestamp = 1;
+// LinuxPodSandboxStats provides the resource usage statistics for a pod sandbox on linux.
+message LinuxPodSandboxStats {
+// CPU usage gathered for the pod sandbox.
+CpuUsage cpu = 1;
+// Memory usage gathered for the pod sandbox.
+MemoryUsage memory = 2;
+// Network usage gathered for the pod sandbox
+NetworkUsage network = 3;
+// Stats pertaining to processes in the pod sandbox.
+ProcessUsage process = 4;
+// Stats of containers in the measured pod sandbox.
+repeated ContainerStats containers = 5;
+}

-// Stats for the default interface, if found
-InterfaceStats default_interface = 2;
+// WindowsPodSandboxStats provides the resource usage statistics for a pod sandbox on windows
+message WindowsPodSandboxStats {
+// TODO: Add stats relevant to windows.
+}

-repeated InterfaceStats interfaces = 3;
+// NetworkUsage contains data about network resources.
+message NetworkUsage {
+// The time at which these stats were updated.
+int64 timestamp = 1;
+// Stats for the default network interface.
+NetworkInterfaceUsage default_interface = 2;
+// Stats for all found network interfaces, excluding the default.
+repeated NetworkInterfaceUsage interfaces = 3;
 }

-// InterfaceStats contains resource value data about interface.
-type InterfaceStats struct {
-// The name of the interface
+// NetworkInterfaceUsage contains resource value data about a network interface.
+message NetworkInterfaceUsage {
+// The name of the network interface.
 string name = 1;
 // Cumulative count of bytes received.
-Uint64Value rx_bytes = 2;
+UInt64Value rx_bytes = 2;
 // Cumulative count of receive errors encountered.
-Uint64Value rx_errors = 2;
+UInt64Value rx_errors = 3;
 // Cumulative count of bytes transmitted.
-Uint64Value tx_bytes = 2;
+UInt64Value tx_bytes = 4;
 // Cumulative count of transmit errors encountered.
-Uint64Value tx_errors = 2;
+UInt64Value tx_errors = 5;
 }

-// ProcessStats are stats pertaining to processes.
-message ProcessStats {
-// Number of processes in the pod.
-Uint64Value process_count = 1;
+// ProcessUsage are stats pertaining to processes.
+message ProcessUsage {
+// The time at which these stats were updated.
+int64 timestamp = 1;
+// Number of processes.
+UInt64Value process_count = 2;
 }
 ```

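
The `PodSandboxStatsFilter` comments in the hunks above say that all fields are combined with AND and that only exact label matches (`MatchLabels`) are supported. The Go sketch below shows one way a runtime might apply such a filter. It is an illustration only: the types `podSandbox` and `podSandboxStatsFilter` and the function `matchesFilter` are hypothetical stand-ins rather than the generated CRI types, and treating an empty `id` as "no ID constraint" with exact (not prefix) matching is an assumption.

```go
package main

import "fmt"

// podSandbox is a hypothetical, simplified view of a pod sandbox.
type podSandbox struct {
	ID     string
	Labels map[string]string
}

// podSandboxStatsFilter mirrors the two fields of the proto message
// PodSandboxStatsFilter; it is not the generated CRI type.
type podSandboxStatsFilter struct {
	ID            string
	LabelSelector map[string]string
}

// matchesFilter reports whether a sandbox satisfies every constraint in
// the filter. Assumption: an empty ID means "any sandbox", and a
// non-empty ID must match exactly.
func matchesFilter(f podSandboxStatsFilter, s podSandbox) bool {
	if f.ID != "" && f.ID != s.ID {
		return false
	}
	// Every selector entry must be present with the same value (AND).
	for key, want := range f.LabelSelector {
		if got, ok := s.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	sandboxes := []podSandbox{
		{ID: "a1", Labels: map[string]string{"app": "web", "tier": "frontend"}},
		{ID: "b2", Labels: map[string]string{"app": "web", "tier": "backend"}},
		{ID: "c3", Labels: map[string]string{"app": "db"}},
	}
	filter := podSandboxStatsFilter{
		LabelSelector: map[string]string{"app": "web", "tier": "backend"},
	}
	for _, s := range sandboxes {
		if matchesFilter(filter, s) {
			fmt.Println("would report stats for sandbox", s.ID)
		}
	}
}
```

An empty filter matches every sandbox in this sketch, which is how a `ListPodSandboxStatsRequest` without constraints would return stats for all of them.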
@@ -561,18 +510,18 @@ The table above describes the various metrics that are in this endpoint.
 Each compliant CRI implementation must:
 - Have a location broadcasted about where these metrics can be gathered from. The endpoint name must not necessarily be `/metrics/cadvisor`, nor be gathererd from the same port as it was from cAdvisor
 - Implement *all* metrics within the set of metrics that are decided on.
-- **TODO** How will we decide this set? We could support all, or take polls from the community and come up with a set of sufficiently useful metrics.
+- **TODO** How will we decide this set? We could support all, or take polls from the community and come up with a set of sufficiently useful metrics.
 - Pass a set of tests in the critest suite that verify they report the correct values for *all* supported metrics labels (to ensure continued conformance and standardization).

 Below is the proposed strategy for doing so:

 1. The Alpha release will strictly cover research, performance testing and the creation of conformance tests.
-- Initial research on the set of metrics required should be done. This will, possibly, allow the community to declare metrics that are not required to be moved to the CRI implementations.
-- Testing on how performant cAdvisor+Kubelet are today should be done, to find a target, acceptable threshold of performance for the CRI implementations
-- Creation of tests verifying the metrics are reported correctly should be created and verified with the existing cAdvisor implementation.
+- Initial research on the set of metrics required should be done. This will, possibly, allow the community to declare metrics that are not required to be moved to the CRI implementations.
+- Testing on how performant cAdvisor+Kubelet are today should be done, to find a target, acceptable threshold of performance for the CRI implementations
+- Creation of tests verifying the metrics are reported correctly should be created and verified with the existing cAdvisor implementation.
 2. For the Beta release, add initial support for CRI implementations to report these metrics
-- This set of metrics will be based on the research done in alpha
-- Each will be validated against the conformance and performance tests created in alpha.
+- This set of metrics will be based on the research done in alpha
+- Each will be validated against the conformance and performance tests created in alpha.
 3. For the GA release, the CRI implementation should be the source of truth for all pod and container level metrics that external parties rely on (no matter how many endpoints the Kubelet advertises).

 #### cAdvisor
@@ -618,7 +567,7 @@ As a requirement for the Beta stage, cAdvisor must support optionally collecting
 ### Version Skew Strategy

 - Breaking changes between versions will be mitigated by the FeatureGate.
-- By the time the FeatureGate is deprecated, it is expected the transition between CRI and cAdvisor is complete, and CRI has had at least one release to expose the required metrics (to allow for `n-1` CRI skew).
+- By the time the FeatureGate is deprecated, it is expected the transition between CRI and cAdvisor is complete, and CRI has had at least one release to expose the required metrics (to allow for `n-1` CRI skew).
 - In general, CRI should be updated in tandem with or before the Kubelet.

 ## Production Readiness Review Questionnaire
@@ -775,13 +724,13 @@ operations covered by [existing SLIs/SLOs]?**
 Think about adding additional work or introducing new steps in between
 (e.g. need to do X to start a container), etc. Please describe the details.
 - The process of collecting and reporting the metrics should not differ too much between cAdvisor and the CRI implementation:
-- At a high level, both need to watch the changes to the stats (from cgroups, disk and network stats)
-- Once collected, the CRI implementation will need to report them (both through the CRI and eventually through the prometheus endpoint).
-- Both of these steps are already done by cAdvisor, so the work is changing hands, but not fundamentally changing.
+- At a high level, both need to watch the changes to the stats (from cgroups, disk and network stats)
+- Once collected, the CRI implementation will need to report them (both through the CRI and eventually through the prometheus endpoint).
+- Both of these steps are already done by cAdvisor, so the work is changing hands, but not fundamentally changing.
 - It is possible the Alpha iteration of this KEP may affect CPU/memory usage on the node:
 - This may come because cAdvisor's performance has been fine-tuned, and changing the location of work may loose some optimizations.
-- However, it is explicitly stated that a requirement for transition from Alpha->Beta is little to no performance degradation.
-- The existence of the feature gate will allow users to mitigate this potential blip in performance (by not opting-in).
+- However, it is explicitly stated that a requirement for transition from Alpha->Beta is little to no performance degradation.
+- The existence of the feature gate will allow users to mitigate this potential blip in performance (by not opting-in).
 * **Will enabling / using this feature result in non-negligible increase of
 resource usage (CPU, RAM, disk, IO, ...) in any components?**
 - It most likely will reduce resource utilization. Right now, there is duplicate work being done between CRI and cAdvisor.
@@ -818,6 +767,6 @@ Note: This is by design as this will enable to decouple runtime implementation d
 ## Alternatives

 - Instead of teaching CRI how to do *everything* cAdvisor does, we could instead have cAdvisor not do the work the CRI stats end up doing (specifically when reporting disk stats, which are the most expensive operation to report).
-- However, this doesn't address the anti-pattern of having multiple parties confusingly responsible for a wide array of metrics and other issues described.
+- However, this doesn't address the anti-pattern of having multiple parties confusingly responsible for a wide array of metrics and other issues described.
 - Have cAdvisor implement the summary API. A cAdvisor daemonset could be a drop-in replacement for the summary API.
 - Don't keep supporting the summary API. Replace it with a "better" format, like prometheus. Or help users migrate to equivalent APIs that container runtimes already expose for monitoring.

keps/sig-node/2371-cri-pod-container-stats/kep.yaml

Lines changed: 5 additions & 6 deletions
@@ -10,20 +10,19 @@ reviewers:
 - "@mrunalp"
 - "@ehashman"
 - "@marosset"
-- "TODO containerd reviewer"
+- "@mikebrow"
 approvers:
 - "@dchen1107"
 prr-approvers:
 - "@ehashman"
-editor: TBD
 creation-date: 2021-01-27
-last-updated: 2021-05-12
+last-updated: 2021-09-08
 status: implementable
 stage: alpha
-latest-milestone: "v1.22"
+latest-milestone: "v1.23"
 see-also:
-- TODO
+- N/A
 replaces:
-- TODO/N/A?
+- N/A
 superseded-by:
 - N/A