-
Notifications
You must be signed in to change notification settings - Fork 39.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add network stats for Windows nodes and containers #74788
Add network stats for Windows nodes and containers #74788
Conversation
/milestone v1.14 |
looks good overall, even though there is a lot of duplication of code that would have been better in an abstraction. i noted a small bug in there. have you tested this and how does performance relate to the previous code? |
575609f
to
fd5a7a5
Compare
Fixed the bug now. PTAL.
Yep, kubelet consumes much less CPU now (kubelet 1-4% from my cluster). |
/retest |
// listContainerNetworkStats returns the network stats of all the running containers. | ||
func (p *criStatsProvider) listContainerNetworkStats() (map[string]*statsapi.NetworkStats, error) { | ||
// Always return nil for unsupported platforms. | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to yield a nil pointer exception on unsupported platforms on cri_stats_provider.go:192. Not sure if it matters, but maybe this could return an error like https://github.com/kubernetes/kubernetes/blob/48def74e641f8dff50fcf7c33c2a412593e2fede/pkg/util/mount/exec_mount_unsupported.go ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In such case, we still want to continue other stats logic in callers. So nil is returned here instead of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the code is written right now, it's hard to tell whether nil
is expected or not. @feiskyer is there any existing e2e test that can be modified to validate whether networkstats is present? Maybe the summary API test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Linx, there always have network stats, but for Windows, there doesn't before this PR. After this, both Windows and Linux containers would have network stats.
/lgtm |
/approve |
/assign @yujuhong |
/hold |
/test pull-kubernetes-e2e-containerd-gce |
I'm only seeing one network now.
That matches the adapter name from the Azure-CNI logs:
Looks good! |
@feiskyer: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CRI stats path is not tested by dockershim+linux AFAIK. I can't get the presubmit containerd job to run right now, so we'll need to check if post-submit CI will be affected by this change.
Other than that, this mostly looks good with questions/comments.
The summary api test is in the node e2e suite (cannot be run by windows yet). I think we should port the test over to a regular e2e, and do some sanity check to see what metrics are present. I don't think it should block this PR, but that should be addressed afterwards.
// listContainerNetworkStats returns the network stats of all the running containers. | ||
func (p *criStatsProvider) listContainerNetworkStats() (map[string]*statsapi.NetworkStats, error) { | ||
// Always return nil for unsupported platforms. | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the code is written right now, it's hard to tell whether nil
is expected or not. @feiskyer is there any existing e2e test that can be modified to validate whether networkstats is present? Maybe the summary API test?
go wait.Forever(func() { | ||
p.collectMetricsData(cpuCounter, memWorkingSetCounter, memCommittedBytesCounter) | ||
p.collectMetricsData(cpuCounter, memWorkingSetCounter, memCommittedBytesCounter, networkAdapterCounter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the perfCounterUpdatePeriod
(1s) still reasonable after adding more metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1s
should still work. And for network stats, there are some metrics per second. To ensure the cumulative data correctly, 1s should be used.
/retest pull-kubernetes-e2e-containerd-gce |
We haven't triggered this job for a while. Seems like it doesn't work anymore. We'll watch the post-submit CI instead. |
@PatrickLang Could you open an issue for tracking this? We should enable node e2e tests for Windows. |
ee5fd0a
to
d690037
Compare
/LGTM |
/hold cancel |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: feiskyer, michmike, yujuhong The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
What this PR does / why we need it:
Add network stats for Windows nodes and containers.
Which issue(s) this PR fixes:
Fixes #74101
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
/sig windows
/sig node
/assign @yujuhong @PatrickLang @michmike