Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

sandbox: Stop and clean up containers that fail to create #2826

Merged

Conversation

evanfoster
Copy link

A container that is created and added to a sandbox can still fail
the final creation steps. In this case, the container must be stopped
and have its resources cleaned up to prevent leaking sandbox mounts.

Fixes #2816

Signed-off-by: Evan Foster efoster@adobe.com

@auto-comment
Copy link

auto-comment bot commented Jul 13, 2020

Thank you for raising your pull request. Please note that the main development of Kata Containers has moved to the 2.0-dev branch of https://github.com/kata-containers/kata-containers repository. The kata-containers/runtime repository is kept for 1.x release maintenance. Please check twice if your change should go to the 2.0-dev branch directly.

If it is strongly required for adding the change to Kata Containers 1.x releases, please ping @kata-containers/runtime to assign a dedicated developer to be responsible for porting the change to 2.0-dev branch. Thanks!

@fidencio
Copy link
Member

/test-ubuntu

@fidencio
Copy link
Member

@evanfoster, thanks for the contribution!

Copy link
Member

@fidencio fidencio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, lgtm!

@@ -1,4 +1,5 @@
// Copyright (c) 2016 Intel Corporation
// Copyright (c) 2020 Adobe Inc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheeky =)

@evanfoster
Copy link
Author

evanfoster commented Jul 13, 2020

I'm seeing all of the sandbox mounts get cleaned up with this PR, but I do see this which is a bit concerning:

❯ kubectl delete deployment qemu-guest-empty-dir
deployment.apps "qemu-guest-empty-dir" deleted
❯ kubectl get pods
No resources found in default namespace.

# On the system
vmss-agent-kata1-test-jfitk000000 ~ # crictl pods
POD ID              CREATED             STATE               NAME                                           NAMESPACE           ATTEMPT
e6c947fa04a52       2 minutes ago       NotReady            qemu-guest-empty-dir-68ccb59f6-v7zz5           default             0
vmss-agent-kata1-test-jfitk000000 ~ # crictl rmp e6c947fa04a52
ERRO[0000] removing the pod sandbox "e6c947fa04a52" failed: rpc error: code = Unknown desc = failed to delete container k8s_POD_qemu-guest-empty-dir-68ccb59f6-v7zz5_default_dc6945b9-90f6-4c8f-bb03-37a986d09706_0 in pod sandbox e6c947fa04a52727290f7ee9bb57e25b3cab2c3fc17eeefcce06b10e5f6fb104: Could not retrieve container information
FATA[0000] unable to remove sandbox(es)

This is running with cri-o/cri-o#3924

EDIT: Some relevant CRI-O logs:

Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.345789668Z" level=debug msg="request: &StopPodSandboxRequest{PodSandboxId:e6c947fa04a52727290f7ee9bb57e25b3cab2c3fc17eeefcce06b10e5f6fb104,}" file="go-grpc-middleware/chain.go:25" id=c0c1eeb5-cd42-4698-b004-683111c642e3
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.345837768Z" level=debug msg="response: &StopPodSandboxResponse{}" file="go-grpc-middleware/chain.go:25" id=c0c1eeb5-cd42-4698-b004-683111c642e3
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.346102869Z" level=debug msg="request: &RemovePodSandboxRequest{PodSandboxId:e6c947fa04a52727290f7ee9bb57e25b3cab2c3fc17eeefcce06b10e5f6fb104,}" file="go-grpc-middleware/chain.go:25" id=83a912a5-521a-48ed-9870-55a109d44fe9
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.346202270Z" level=debug msg="oci.newRuntimeVM() start" file="oci/runtime_vm.go:55"
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.346238470Z" level=debug msg="oci.newRuntimeVM() end" file="oci/runtime_vm.go:70"
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.346259770Z" level=debug msg="runtimeVM.DeleteContainer() start" file="oci/runtime_vm.go:514"
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.346288770Z" level=debug msg="runtimeVM.DeleteContainer() end" file="oci/runtime_vm.go:523"
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.346317170Z" level=debug msg="response error: failed to delete container k8s_POD_qemu-guest-empty-dir-68ccb59f6-v7zz5_default_dc6945b9-90f6-4c8f-bb03-37a986d09706_0 in pod sandbox e6c947fa04a52727290f7ee9bb57e25b3cab2c3fc17eeefcce06b10e5f6fb104: Could not retrieve container information" file="go-grpc-middleware/chain.go:25" id=83a912a5-521a-48ed-9870-55a109d44fe9
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.479274332Z" level=debug msg="request: &VersionRequest{Version:0.1.0,}" file="go-grpc-middleware/chain.go:25" id=c9fa6db6-ac58-4b91-822c-efeeb1a7ec74
Jul 13 19:48:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:48:48.479420433Z" level=debug msg="response: &VersionResponse{Version:0.1.0,RuntimeName:cri-o,RuntimeVersion:1.17.4,RuntimeApiVersion:v1alpha1,}" file="go-grpc-middleware/chain.go:25" id=c9fa6db6-ac58-4b91-822c-efeeb1a7ec74
...snip...
Jul 13 19:49:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:49:48.355191477Z" level=debug msg="oci.newRuntimeVM() start" file="oci/runtime_vm.go:55"
Jul 13 19:49:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:49:48.355237377Z" level=debug msg="oci.newRuntimeVM() end" file="oci/runtime_vm.go:70"
Jul 13 19:49:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:49:48.355259777Z" level=debug msg="runtimeVM.DeleteContainer() start" file="oci/runtime_vm.go:514"
Jul 13 19:49:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:49:48.355288477Z" level=debug msg="runtimeVM.DeleteContainer() end" file="oci/runtime_vm.go:523"
Jul 13 19:49:48 vmss-agent-kata1-test-jfitk000000 crio[4072913]: time="2020-07-13 19:49:48.355336577Z" level=debug msg="response error: failed to delete container k8s_POD_qemu-guest-empty-dir-68ccb59f6-v7zz5_default_dc6945b9-90f6-4c8f-bb03-37a986d09706_0 in pod sandbox e6c947fa04a52727290f7ee9bb57e25b3cab2c3fc17eeefcce06b10e5f6fb104: Could not retrieve container information" file="go-grpc-middleware/chain.go:25" id=65631c66-6883-46dc-a0f8-14bcd5de47db

EDIT 2: I was testing with an older version where force was set to false. Let me try again.

EDIT 3: I'm still having the same issue when forcing deletion.

EDIT 4: The issue was actually coming from CRI-O. PR is cri-o/cri-o#3949

fidencio added a commit to fidencio/cri-o that referenced this pull request Jul 13, 2020
While the removing the container info of the failed container from the
containers' map seems harmless, it'll end up causing an error when
actually cleaning up / deleting the failed container.

The reason for this is that CreateContainer, on Kata side, will remove
the failed container, wich will result on DeleteContainer, on the CRI-O
side, being called and then failing to actually remove the container, as
its info has already been removed from the containers' map.

This issue has been reported as part of a PR opened by Evan Foster:
kata-containers/runtime#2826 (comment)

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Tested-by: Evan Foster <efoster@adobe.com>
virtcontainers/sandbox.go Outdated Show resolved Hide resolved
virtcontainers/sandbox.go Outdated Show resolved Hide resolved
@fidencio
Copy link
Member

@evanfoster, I ended up closing that CRI-O PR in favour of cri-o/cri-o#3956

@evanfoster evanfoster force-pushed the add-container-creation-cleanup branch from 00e0ada to d7e3957 Compare July 15, 2020 17:11
Copy link
Member

@fidencio fidencio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evanfoster, I've added one more simple comment in line.

virtcontainers/sandbox.go Outdated Show resolved Hide resolved
A container that is created and added to a sandbox can still fail
the final creation steps. In this case, the container must be stopped
and have its resources cleaned up to prevent leaking sandbox mounts.

Fixes kata-containers#2816

Signed-off-by: Evan Foster <efoster@adobe.com>
@evanfoster evanfoster force-pushed the add-container-creation-cleanup branch from d7e3957 to 337f2e0 Compare July 15, 2020 17:25
@fidencio
Copy link
Member

/test-ubuntu

@codecov
Copy link

codecov bot commented Jul 15, 2020

Codecov Report

Merging #2826 into master will increase coverage by 0.00%.
The diff coverage is 60.00%.

@@           Coverage Diff           @@
##           master    #2826   +/-   ##
=======================================
  Coverage   50.51%   50.51%           
=======================================
  Files         118      118           
  Lines       17331    17336    +5     
=======================================
+ Hits         8754     8757    +3     
- Misses       7503     7504    +1     
- Partials     1074     1075    +1     

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @evanfoster.

@fidencio fidencio merged commit 37c0ff5 into kata-containers:master Jul 16, 2020
@fidencio
Copy link
Member

@evanfoster, may I ask you two favours?
We need to back and forward port this PR.
To back port, would be possible to just cherry-pick this one against the stable-1.11 branch?
To forward port, would be possible to have this patch applied against https://github.com/kata-containers/kata-containers/?

git cherry-pick will not work for the second case, but tweaking the patch should be trivial.
Thanks a lot for the contribution! The CRI-O patch, as soon as it gets in, I'll have it backported all the way down to 1.17.

Thanks a lot for the collaboration! :-)

evanfoster pushed a commit to evanfoster/kata-containers that referenced this pull request Jul 20, 2020
A container that is created and added to a sandbox can still fail
the final creation steps. In this case, the container must be stopped
and have its resources cleaned up to prevent leaking sandbox mounts.

Forward port of kata-containers/runtime#2826

Fixes kata-containers#2816

Signed-off-by: Evan Foster <efoster@adobe.com>
evanfoster pushed a commit to evanfoster/kata-containers that referenced this pull request Jul 23, 2020
A container that is created and added to a sandbox can still fail
the final creation steps. In this case, the container must be stopped
and have its resources cleaned up to prevent leaking sandbox mounts.

Forward port of kata-containers/runtime#2826

Fixes kata-containers#2816

Signed-off-by: Evan Foster <efoster@adobe.com>
evanfoster pushed a commit to evanfoster/kata-containers that referenced this pull request Jul 24, 2020
A container that is created and added to a sandbox can still fail
the final creation steps. In this case, the container must be stopped
and have its resources cleaned up to prevent leaking sandbox mounts.

Forward port of kata-containers/runtime#2826

Fixes kata-containers#2816

Signed-off-by: Evan Foster <efoster@adobe.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sandbox mounts aren't being cleaned up when containers fail to start
4 participants