-
Notifications
You must be signed in to change notification settings - Fork 39.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding per container stats for CRI runtimes #59906
Conversation
@abhi: Adding do-not-merge/release-note-label-needed because the release note process has not been followed. One of the following labels is required "release-note", "release-note-action-required", or "release-note-none". Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
4674aa7
to
344721a
Compare
pkg/kubelet/kuberuntime/helpers.go
Outdated
@@ -207,14 +207,19 @@ func getStableKey(pod *v1.Pod, container *v1.Container) string { | |||
|
|||
// buildContainerLogsPath builds log path for container relative to pod logs directory. | |||
func buildContainerLogsPath(containerName string, restartCount int) string { | |||
return fmt.Sprintf("%s_%d.log", containerName, restartCount) | |||
return fmt.Sprintf("%s/%s_%d.log", containerName, containerName, restartCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/kubelet/kuberuntime/helpers.go
Outdated
@@ -207,14 +207,19 @@ func getStableKey(pod *v1.Pod, container *v1.Container) string { | |||
|
|||
// buildContainerLogsPath builds log path for container relative to pod logs directory. | |||
func buildContainerLogsPath(containerName string, restartCount int) string { | |||
return fmt.Sprintf("%s_%d.log", containerName, restartCount) | |||
return fmt.Sprintf("%s/%s_%d.log", containerName, containerName, restartCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep consistent filepath.Join
@@ -190,6 +190,11 @@ func (m *kubeGenericRuntimeManager) generateContainerConfig(container *v1.Contai | |||
} | |||
|
|||
command, args := kubecontainer.ExpandContainerCommandAndArgs(container, opts.Envs) | |||
logDir := fmt.Sprintf("%s/%s", buildPodLogsDirectory(pod.UID), container.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use your helper function. BuildContainerLogsDirectory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this directory when remove container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. Will do
@@ -61,7 +61,7 @@ func TestRemoveContainer(t *testing.T) { | |||
err = m.removeContainer(containerId) | |||
assert.NoError(t, err) | |||
// Verify container log is removed | |||
expectedContainerLogPath := filepath.Join(podLogsRootDirectory, "12345678", "foo_0.log") | |||
expectedContainerLogPath := filepath.Join(podLogsRootDirectory, "12345678", "/foo/foo_0.log") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Join
instead of /
.
"foo", "0.log"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
m := volume.NewMetricsDu(path) | ||
logMetrics, err := m.GetMetrics() | ||
if err != nil { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -423,3 +415,26 @@ func getCRICadvisorStats(ca cadvisor.Interface) (map[string]cadvisorapiv2.Contai | |||
} | |||
return stats, nil | |||
} | |||
|
|||
func getContainerLogStats(path string) *statsapi.FsStats { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add TODO
, cache this in container log manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
func makeFakeLogs(sandbox *critest.FakePodSandbox, containers ...string) { | ||
for _, name := range containers { | ||
os.MkdirAll(fmt.Sprintf("/var/log/pods/%s/%s/", types.UID(sandbox.GetMetadata().GetUid()), name), 755) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use your helper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
func deleteFakeLogs(sandbox *critest.FakePodSandbox) { | ||
os.RemoveAll(fmt.Sprintf("/var/log/pods/%s", types.UID(sandbox.GetMetadata().GetUid()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
|
||
func makeFakeLogs(sandbox *critest.FakePodSandbox, containers ...string) { | ||
for _, name := range containers { | ||
os.MkdirAll(fmt.Sprintf("/var/log/pods/%s/%s/", types.UID(sandbox.GetMetadata().GetUid()), name), 755) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/var/log/pods
is actual pod/container log directory. We should not create files there in unit test... We may ruin people's local environment...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep went mock approach
@@ -423,3 +415,26 @@ func getCRICadvisorStats(ca cadvisor.Interface) (map[string]cadvisorapiv2.Contai | |||
} | |||
return stats, nil | |||
} | |||
|
|||
func getContainerLogStats(path string) *statsapi.FsStats { | |||
m := volume.NewMetricsDu(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NewMetricsDu
returns an interface MetricsProvider
. It should be easy to mock.
Let's use a mock instead of using actual log stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@Random-Liu will update the PR and address the review comments. Thanks for reviewing |
@abhi: Adding do-not-merge/release-note-label-needed because the release note process has not been followed. One of the following labels is required "release-note", "release-note-action-required", or "release-note-none". Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
820c3ed
to
24a2d33
Compare
assert.Equal(*rootFsInfo.Inodes, *actual.Logs.Inodes) | ||
assert.Nil(actual.Logs.UsedBytes) | ||
assert.Nil(actual.Logs.InodesUsed) | ||
func checkCRILogsStats(assert *assert.Assertions, actual statsapi.ContainerStats) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare the value for rootFsInfo
and log stats. That's the point of the fake, right? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
func makeFakeLogStats(seed int) *volume.Metrics { | ||
m := &volume.Metrics{} | ||
m.Used = resource.NewQuantity(int64(0), resource.BinarySI) //seed + offsetUsage),resource.BinarySI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? comment not removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -190,6 +190,11 @@ func (m *kubeGenericRuntimeManager) generateContainerConfig(container *v1.Contai | |||
} | |||
|
|||
command, args := kubecontainer.ExpandContainerCommandAndArgs(container, opts.Envs) | |||
logDir := BuildContainerLogsDirectory(kubetypes.UID(pod.UID), container.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the directory in removeContainerLog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This commit aims to collect per container log stats. The change was proposed as a part of kubernetes#55905. The change includes change of the log path from /var/pod/<pod uid>/containername_attempt.log to /var/pod/<pod uid>/containername/containername_attempt.log. The logs are collected by reusing volume package to collect metrics from the log path. Signed-off-by: abhi <abhi@docker.com>
7f861d4
to
42b9205
Compare
@@ -840,8 +845,7 @@ func (m *kubeGenericRuntimeManager) removeContainerLog(containerID string) error | |||
return fmt.Errorf("failed to get container status %q: %v", containerID, err) | |||
} | |||
labeledInfo := getContainerInfoFromLabels(status.Labels) | |||
annotatedInfo := getContainerInfoFromAnnotations(status.Annotations) | |||
path := buildFullContainerLogsPath(labeledInfo.PodUID, labeledInfo.ContainerName, annotatedInfo.RestartCount) | |||
path := BuildContainerLogsDirectory(labeledInfo.PodUID, labeledInfo.ContainerName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry... This is my fault... We can't remove the whole directory... There might be newer instance of container log with different restart count in this directory. Let's leave the container log directory to be deleted when pod deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep :) Cool will update
The commit contains test case modifications to test and verify changes for container log stats feature. Signed-off-by: abhi <abhi@docker.com>
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abhi, Random-Liu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the change is not tied to any platform, volume.MetricsProvider doesn't support windows yet. Will open a new issue for windows.
LGTM
Automatic merge from submit-queue (batch tested with PRs 54191, 59374, 59824, 55032, 59906). If you want to cherry-pick this change to another branch, please follow the instructions here. |
@abhi: Adding do-not-merge/release-note-label-needed because the release note process has not been followed. One of the following labels is required "release-note", "release-note-action-required", or "release-note-none". Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Automatic merge from submit-queue (batch tested with PRs 63603, 63557, 62015). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. CRI: update documents for container logpath **What this PR does / why we need it**: The container log path has been changed from `containername_attempt#.log` to `containername/attempt#.log` in #59906. This PR updates CRI documents for it. **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes # **Special notes for your reviewer**: **Release note**: ```release-note CRI: update documents for container logpath. The container log path has been changed from containername_attempt#.log to containername/attempt#.log ```
This change broke retrieval of logs for pods that were started on a <1.10 cluster that got upgraded to 1.10, even though this has nothing to do with CRI:
I'm sorry for the bluntness but how the hell can such a change get merged without any regard for backwards compatibility? |
@andor44 Are you using cri-o? |
No, Docker 1.12. I worded my original comment wrongly, by "has nothing to do with CRI" I meant we don't use a CRI runtime. |
@andor44 When upgrading from 1.9 to 1.10, you will have this issue. I'll say this is expected, because:
We can cherry-pick #63268 to 1.10 for this, because the fix is small. @yujuhong Are you ok with that? But please note that in-place kubelet may break again in the future, because it is never supported. :) |
@Random-Liu understandable, but it seems that point 1 is undocumented, or at least I can't find anything in the docs that would indicate that the kubelet does not support in-place upgrades. I know that a lack of warning does not indicate support, but lots of online guides and how-tos tell you to do just that. Given that other components deal with in-place upgrades just fine, it is not a big mental leap to assume kubelet does too. If the kubelet really doesn't support it and such breaks are to be expected it should be well documented IMO. FWIW, the cluster in question has been upgraded from 1.5 all the way to 1.10 over the past 2 years or so, and this is the first time we ran into some serious incompatibility/issue.
|
I have to say that I agree with @andor44 here. We probably made assumption based, at least partially, on the lack of documentation or the presence of the contrary and our experience related to that. I knew that on GKE for example the node is drained but this could have been for a cloud specific issue/best practice. For us, on bare metal, it worked all the times. I kindly suggest to explicitly put this in the documentation. Thanks for the effort done in cherry picking the fix. |
@andor44 @lucaim I agree that the release note should call out issues that may lead to container restarts (or other problems among kubelet upgrade) if possible. This is our oversight and sorry for the trouble. AFAIK, the kubelet in-place upgrade path is not tested, so these issues may not be caught and well-documented before release. Just like the stackoverflow question @andor44 mentioned, container restarts also happened in 1.7 to 1.8 upgrade as well. If you use pod manifest files on the node, they'd encounter upgrade issues even more frequently (arguably by design). Given the current situation, I'd recommend draining the node to avoid unexpected issues. |
Agreed. Since this particular issue has been mitigated, let's address those user-facing issues clearly in the future on both PR's changelog and |
For kubernetes/enhancements#552.
What this PR does / why we need it
This commit aims to collect per container log stats. The change was proposed as a part of #55905. The change includes change the log path from /var/pod//containername_attempt.log to /var/pod//containername/containername_attempt.log. The logs are collected by reusing volume package to collect metrics from the log path.
Fixes #55905
Special notes for your reviewer:
cc @Random-Liu
Release note: