Visually Document Container Memory Metrics and their Relationships #25388

Open
stevekuznetsov opened this issue Dec 3, 2020 · 22 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • language/en: Issues or PRs related to English language.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • sig/instrumentation: Categorizes an issue or PR as relevant to SIG Instrumentation.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@stevekuznetsov
Contributor

stevekuznetsov commented Dec 3, 2020

This is a Feature Request

What would you like to be added
A document that explains what all the different container memory metrics mean and how they are interrelated.

Why is this needed
Today, the following metrics exist for container memory:

  • container_memory_cache
  • container_memory_mapped_file
  • container_memory_max_usage_bytes
  • container_memory_rss
  • container_memory_swap
  • container_memory_usage_bytes
  • container_memory_working_set_bytes

I would like to see a document that explains what they are, how they are different or similar to each other, how they nest, what container="" and container="POD" mean, which metric(s) are used by the kubelet to evict, why usage_bytes and max_usage_bytes might differ, the effects of quantized sampling, etc.

Comments
A visual description would be amazing here, as there are hierarchical relationships that would benefit from such a view.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 3, 2020
@stevekuznetsov
Contributor Author

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Dec 3, 2020
@stevekuznetsov
Contributor Author

/cc @ehashman

@sftim
Contributor

sftim commented Dec 4, 2020

This would be a great addition to the reference docs - especially with a visual.

/triage accepted
/priority backlog
/kind feature
/language en

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. kind/feature Categorizes issue or PR as related to a new feature. language/en Issues or PRs related to English language and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 4, 2020
@ehashman
Member

ehashman commented Dec 4, 2020

/help

@k8s-ci-robot
Contributor

@ehashman:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Dec 4, 2020
@sftim
Contributor

sftim commented Dec 15, 2020

@brennerm as you've been doing a great job drafting diagrams, I thought you might like to know about this feature request too

@ehashman
Member

/cc @bobbypage

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 16, 2021
@ehashman
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 16, 2021
@stevekuznetsov
Contributor Author

xref google/cadvisor#2138

The above issue from cAdvisor seems to document how all of this goes on ...

@stevekuznetsov
Contributor Author

OK, so from what I can tell, the following things are true:

container_memory_working_set_bytes = container_memory_usage_bytes - <inactive_memory>

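We can see this calculation in the cAdvisor source, and <inactive_memory> is a kernel concept: as far as I can tell, it is the file-backed (page cache) memory that the kernel has moved to its inactive list, reported in the cgroup's memory.stat.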
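
To make the subtraction concrete, here is a minimal sketch of the calculation as I understand it. It assumes <inactive_memory> is the inactive file value from the cgroup's memory.stat and that the result is clamped at zero; the function name and sample values are hypothetical, not taken from cAdvisor.

```python
def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Return usage minus inactive file memory, never going below zero.

    Assumption: inactive_file_bytes is the inactive file counter read from
    the container cgroup's memory.stat.
    """
    if inactive_file_bytes >= usage_bytes:
        # Guard against a negative result if the two counters were
        # sampled at slightly different moments.
        return 0
    return usage_bytes - inactive_file_bytes


# Hypothetical sample values, in bytes.
usage = 512 * 1024 * 1024          # container_memory_usage_bytes
inactive_file = 128 * 1024 * 1024  # inactive file pages for the same cgroup
print(working_set_bytes(usage, inactive_file))
```

The clamp matters because the two inputs are read separately, so a stale usage value could otherwise produce a negative working set.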

Furthermore:

container_memory_usage_bytes == container_memory_rss + container_memory_cache + container_memory_swap + <kernel memory>

Where <kernel memory> is memory allocated within the kernel, not yet exposed from cAdvisor (and therefore not exposed to Prometheus) as of google/cadvisor#2138
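
Since <kernel memory> is not exported, one way to estimate it is as the residual of the relation above. This is only a sketch under that assumption: the function name and numbers are hypothetical, and because the four metrics may be sampled at different instants, the residual can also include sampling skew.

```python
def implied_kernel_memory(usage: int, rss: int, cache: int, swap: int) -> int:
    """Residual of usage after subtracting rss, cache and swap.

    Assumes usage == rss + cache + swap + <kernel memory>, as stated in
    this thread. A negative result means the inputs are inconsistent.
    """
    return usage - (rss + cache + swap)


# Hypothetical sample values, in bytes.
sample = {
    "container_memory_usage_bytes": 1000,
    "container_memory_rss": 600,
    "container_memory_cache": 300,
    "container_memory_swap": 0,
}
print(implied_kernel_memory(
    sample["container_memory_usage_bytes"],
    sample["container_memory_rss"],
    sample["container_memory_cache"],
    sample["container_memory_swap"],
))
```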

@stevekuznetsov
Contributor Author

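As far as I can tell, the kubelet uses container_memory_working_set_bytes when deciding on evictions. The kernel does not see this metric: it OOM-kills when a cgroup reaches its memory limit and cannot reclaim enough pages to get back under it.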
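
A rough sketch of the two checks, as I understand them. It assumes the kubelet derives its memory.available signal as node capacity minus the node's working set and compares it against an eviction threshold (100Mi is used here as an assumed default), while the kernel compares a cgroup's charged usage against its limit. All function names are hypothetical.

```python
MIB = 1024 * 1024


def memory_available(capacity_bytes: int, working_set_bytes: int) -> int:
    """Assumed kubelet signal: node capacity minus node working set."""
    return capacity_bytes - working_set_bytes


def below_eviction_threshold(capacity_bytes: int, working_set_bytes: int,
                             threshold_bytes: int = 100 * MIB) -> bool:
    """True when the assumed memory.available signal drops under the threshold."""
    return memory_available(capacity_bytes, working_set_bytes) < threshold_bytes


def cgroup_at_limit(usage_bytes: int, limit_bytes: int) -> bool:
    """True when a cgroup's usage has reached its limit.

    The kernel first tries to reclaim pages; an OOM kill only follows if
    reclaim cannot bring usage back under the limit.
    """
    return usage_bytes >= limit_bytes


# Hypothetical node with 8 GiB of memory and a 7.95 GiB working set.
print(below_eviction_threshold(8192 * MIB, 8141 * MIB))
# Hypothetical container with a 256 MiB limit, currently at 200 MiB.
print(cgroup_at_limit(200 * MIB, 256 * MIB))
```

The point of separating the two functions is that they read different numbers: the first is driven by working set, the second by raw usage against the limit.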

@stevekuznetsov
Contributor Author

@ehashman @bobbypage @sftim @derekwaynecarr it doesn't look like the subject matter experts on this are too keen on documenting this, so I'll try to do it, I guess. Who will review my work, and if they understand this well could they perhaps jot down more thoughts in response to what I've written?

@stevekuznetsov
Contributor Author

Perhaps @mrunalp or @rphillips know?

@ehashman
Member

@stevekuznetsov I'll make sure we get a reviewer, I might pull in @dashpole and I'll take a look as well.

@ehashman
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jun 24, 2021
@derekwaynecarr
Member

@stevekuznetsov I am happy to help review.

@sftim
Contributor

sftim commented Apr 11, 2022

I wonder if we could sketch out (and for now only sketch out) what we want, saving the detailed work for a docs sprint at the next KubeCon.

@stevekuznetsov
Contributor Author

Totally! I think the questions I always ended up coming to were:

  • what is the relationship between all of the metrics I am able to see?
  • which metric, specifically, is being used by e.g. the scheduler/descheduler, the kubelet (for evictions) and the kernel (for hard-OOM)?

@jai
Contributor

jai commented Dec 17, 2022

/assign

@vaibhav2107
Member

@jai Are you still working on this issue?

@vaibhav2107
Member

Unassigning @jai as I didn't receive any updates. Please reassign yourself if you get back to this; anyone else is also welcome to take this issue.
/unassign @jai


8 participants