Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/memory manager containerMap memory leak #109103

Merged
merged 1 commit into from
May 4, 2022

Conversation

Dingshujie
Copy link
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

if cpu manager/ memory manager policy set to none, no one remove container info from containerMap, finaly lead memory leak.

this pr call cpu manager/memory manager RemoveContainer in PostStopContainer, to remove container info from containerMap

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 29, 2022
@Dingshujie
Copy link
Member Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. area/kubelet and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 29, 2022
@Dingshujie
Copy link
Member Author

@derekwaynecarr
Copy link
Member

/assign

@klueska as well.

@klueska
Copy link
Contributor

klueska commented Mar 29, 2022

We cannot do what you propose here. We used to do what you are doing, but it causes issues when containers are restarted (within a still running pod). We need to maintain the containerMap across these restarts.

We may indeed have a memory leak for the case of the none policy, but we will need to solve it another way.

@Dingshujie
Copy link
Member Author

We cannot do what you propose here. We used to do what you are doing, but it causes issues when containers are restarted (within a still running pod). We need to maintain the containerMap across these restarts.

We may indeed have a memory leak for the case of the none policy, but we will need to solve it another way.

got restart scene, i will try another method to solve it

@Dingshujie
Copy link
Member Author

@klueska find another method to slove this leak, Can you take a look at this PR again? thanks a lot.

@ehashman
Copy link
Member

/release-note-none
/priority important-soon
/triage accepted

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 29, 2022
@Dingshujie
Copy link
Member Author

Can we add this to v1.24 milestone? @rphillips @liggitt

@liggitt
Copy link
Member

liggitt commented Mar 31, 2022

It appears we have to wait for the tree to thaw for 1.25 for this fix

Bugfixes can be merged during code freeze if impact/risk is justified.

Is this a regression? (looks like not, since there are cherry-picks open)

What is the risk of the fix? Is it changing feature-gated or optionally executed code? If so, what feature gate or opt-in configuration exposes the bug?

@liggitt
Copy link
Member

liggitt commented Mar 31, 2022

I also wouldn't expect us to take a change without accompanying tests

@Dingshujie Dingshujie closed this Mar 31, 2022
@Dingshujie Dingshujie reopened this Mar 31, 2022
@Dingshujie
Copy link
Member Author

Dingshujie commented Mar 31, 2022

sorry, careless close pr, reopen now

@Dingshujie
Copy link
Member Author

It appears we have to wait for the tree to thaw for 1.25 for this fix

Bugfixes can be merged during code freeze if impact/risk is justified.

Is this a regression? (looks like not, since there are cherry-picks open)

What is the risk of the fix? Is it changing feature-gated or optionally executed code? If so, what feature gate or opt-in configuration exposes the bug?

cpu manager default policy is none, if pod repeated create and delete, kubelet memory will keep growing.
also cpu manager policy set to static, none exclusive pod container id also will leak in containermaps, kubelet memory will keep growing.

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@rphillips
Copy link
Member

@Dingshujie Since the project is in a frozen state, the criteria is does the PR fix a regression within the 1.24 release. Waiting for the thaw minimizes churn/bugs/regressions added during the release process.

@Dingshujie
Copy link
Member Author

@Dingshujie Since the project is in a frozen state, the criteria is does the PR fix a regression within the 1.24 release. Waiting for the thaw minimizes churn/bugs/regressions added during the release process.

ok, let's wait for 1.24 released.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Dingshujie, klueska, yanghesong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu
Copy link
Member

pacoxu commented Apr 7, 2022

This will be included in 1.25. So a cherry-pick PR for 1.24 will be needed as well. (It is ok to open one after v1.24 is released. https://github.com/kubernetes/kubernetes/tree/release-1.24 is not ready yet.)

@Dingshujie
Copy link
Member Author

This will be included in 1.25. So a cherry-pick PR for 1.24 will be needed as well. (It is ok to open one after v1.24 is released. https://github.com/kubernetes/kubernetes/tree/release-1.24 is not ready yet.)

i will propose a pr for release-1.24

k8s-ci-robot added a commit that referenced this pull request Apr 28, 2022
…09103-upstream-release-1.21

Automated cherry pick of #109103: cpu manager policy set to none, no one remove container id
@k8s-ci-robot k8s-ci-robot merged commit dbf2f1d into kubernetes:master May 4, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 4, 2022
k8s-ci-robot added a commit that referenced this pull request May 5, 2022
…09103-upstream-release-1.22

Automated cherry pick of #109103: cpu manager policy set to none, no one remove container id
k8s-ci-robot added a commit that referenced this pull request May 31, 2022
…09103-upstream-release-1.24

Automated cherry pick of #109103: cpu manager policy set to none, no one remove container id
k8s-ci-robot added a commit that referenced this pull request May 31, 2022
…09103-upstream-release-1.23

Automated cherry pick of #109103: cpu manager policy set to none, no one remove container id
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

10 participants