Sandbox mounts aren't being cleaned up when containers fail to start #2816
Comments
I did a run with full debug logs and got some better information. I'll do some more poking around and see what I find.
Well, I think I found the problem. This function is being called: runtime/virtcontainers/sandbox.go, lines 1137 to 1190 at a885b1b.

Container creation is failing at

and the deferred function is calling

Unfortunately, runtime/virtcontainers/sandbox.go, lines 774 to 791 at a885b1b

I'm guessing that another call performing some cleanup needs to be added to the error handling in
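As a rough illustration of the failure mode (toy code with hypothetical names, not the real virtcontainers API): the container is added to the sandbox store, a later creation step fails, and the rollback never stops or removes it, so its entry and its host mounts leak:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy model of the leak; all names here are hypothetical, not the real
// virtcontainers API.
type sandbox struct {
	containers map[string]bool // container ID -> present in the store
}

func (s *sandbox) addContainer(id string)    { s.containers[id] = true }
func (s *sandbox) removeContainer(id string) { delete(s.containers, id) }

// createContainer adds the container to the sandbox store and then fails a
// later creation step. The rollback only logs; it never stops the container
// or removes it from the store, so its entry (and its host mounts) leak.
func (s *sandbox) createContainer(id string) (err error) {
	s.addContainer(id)
	defer func() {
		if err != nil {
			fmt.Printf("rollback ran for %s, but nothing was cleaned up\n", id)
		}
	}()
	return errors.New("agent call failed") // stand-in for the failing step
}

func main() {
	s := &sandbox{containers: map[string]bool{}}
	_ = s.createContainer("c1")
	fmt.Println("containers still in store:", len(s.containers))
}
```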
I was able to stop leaks from happening by modifying the deferred function in

```go
defer func() {
	// Rollback if error happens.
	if err != nil {
		s.Logger().Warningf("Container %q could not be created, stopping it", contConfig.ID)
		if err = c.stop(false); err != nil { // Should this be a force stop?
			s.Logger().WithError(err).WithField("container-id", c.id).WithField("sandboxid", s.id).Warning("Could not delete container")
		}
		s.Logger().WithField("container-id", c.id).WithField("sandboxid", s.id).Info("Container was stopped. Removing from sandbox store")
		s.removeContainer(c.id)
	}
}()
```

I'm going to leave a pod running in a bad state for a bit and see if anything explodes.
@evanfoster, amazing! Please open us a pull request and I'll review and have the patch backported to the correct branches!
Can do! Quick question, however: should I be setting
Yes, IMHO, we should force it. @devimc, what do you think?
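For context on the force question, here is a heavily simplified sketch (hypothetical code; only the boolean mirrors the real `c.stop` call) of why forcing matters during rollback: when create fails, the agent is often unreachable, and a non-forced stop returns early before the unmount step ever runs:

```go
package main

import (
	"errors"
	"fmt"
)

// stop is a hypothetical, heavily simplified stand-in for c.stop. When
// create fails the agent is often unreachable, so a non-forced stop
// returns early and the unmount step never runs.
func stop(force bool) error {
	agentErr := errors.New("agent unreachable")
	if agentErr != nil && !force {
		return agentErr // cleanup aborted; mounts stay behind
	}
	// With force, the agent error is logged and cleanup keeps going.
	fmt.Println("unmounting shared dirs, removing container resources")
	return nil
}

func main() {
	fmt.Println("stop(false):", stop(false))
	fmt.Println("stop(true):", stop(true))
}
```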
A container that is created and added to a sandbox can still fail the final creation steps. In this case, the container must be stopped and have its resources cleaned up to prevent leaking sandbox mounts. Fixes kata-containers#2816 Signed-off-by: Evan Foster <efoster@adobe.com>
A container that is created and added to a sandbox can still fail the final creation steps. In this case, the container must be stopped and have its resources cleaned up to prevent leaking sandbox mounts. Fixes kata-containers#2816 Signed-off-by: Evan Foster <efoster@adobe.com> (cherry picked from commit 337f2e0)
Description of problem
When using the same setup as #2795, I found that sandbox mounts weren't being cleaned up, leading to a massive number of mountpoints (20,000 mounts in ~2 hours). For example:
I tested with @fidencio's fix for #2719 (cri-o/cri-o#3924) but continued to have the same issue.
I'm not 100% sure, but I believe this is only an issue for containers in pods that are affected by #2795.
Expected result
Container sandboxes are cleaned up as each container is deleted.
Actual result
Sandbox mounts leak.
I have appended some interesting logs to the end of the output of kata-collect-data.sh.

kata-collect-data.sh details
Meta details

Running kata-collect-data.sh version 1.11.2-adobe (commit 9dd46e7244ec94345a3181427da818c4ae49b9a9-dirty) at 2020-07-07.19:43:40.798586627+0000. Runtime is /opt/kata/bin/kata-runtime.

kata-env

Output of "/opt/kata/bin/kata-runtime kata-env":

Runtime config files

Runtime default config files

Runtime config file contents

Output of "cat "/etc/kata-containers/configuration.toml"":

Output of "cat "/opt/kata/share/defaults/kata-containers/configuration.toml"":

Config file /usr/share/defaults/kata-containers/configuration.toml not found.

KSM throttler

version

Output of "--version":

systemd service

Image details

Initrd details

No initrd.

Logfiles

Runtime logs: No recent runtime problems found in system journal.
Proxy logs: No recent proxy problems found in system journal.
Shim logs: No recent shim problems found in system journal.
Throttler logs: No recent throttler problems found in system journal.

Container manager details

Have docker, but it's not being used. Removing this information.
Have kubectl.

Kubernetes

Output of "kubectl version":
Output of "kubectl config view":
Output of "systemctl show kubelet":

Have crio.

crio

Output of "crio --version":
Output of "systemctl show crio":
Output of "cat /etc/crio/crio.conf":

Have containerd, but it's not being used. Removing this information.

Packages

No dpkg. No rpm.
Here are some interesting logs: