/usr/bin/containerd-shim-kata-v2 processes persist and consume heavy CPU after corresponding containers exit #2719
Comments
Thanks, I noticed that when a pod using shimv2 is stopped, it does take several seconds for the shimv2 process to disappear. I'll take a look at this one in the next few days.
After a while, the load drops, but the processes do not go away.
This is many minutes.
The containerd-shim processes only start consuming a lot of CPU when the underlying containers exit (with this test, all of the containers exit more or less simultaneously).
Running … continuously.
Turns out this seems to be an issue in CRI-O. I'm sending them a PR and I'd appreciate it if you could give it a try to ensure it solves your issue.
When shutting the container down, we're dealing with the following piece of code on the Kata side: https://github.com/kata-containers/runtime/blob/master/containerd-shim-v2/service.go#L785

```
func (s *service) Shutdown(ctx context.Context, r *taskAPI.ShutdownRequest) (_ *ptypes.Empty, err error) {
	defer func() {
		err = toGRPC(err)
	}()

	s.mu.Lock()
	if len(s.containers) != 0 {
		s.mu.Unlock()
		return empty, nil
	}
	s.mu.Unlock()

	s.cancel()

	os.Exit(0)

	// This will never be called, but this is only there to make sure the
	// program can compile.
	return empty, nil
}
```

The code shown above simply stops the service, closing the ttrpc channel and then raising the "ErrClosed" error, which is returned by Shutdown. Unlike the containerd code, which simply ignores the error, CRI-O propagates it, leaving behind a bunch of processes that will never be cleaned up. Here's what containerd does: https://github.com/containerd/containerd/blob/master/runtime/v2/shim.go#L194

```
_, err := s.task.Shutdown(ctx, &task.ShutdownRequest{
	ID: s.ID(),
})
if err != nil && !errors.Is(err, ttrpc.ErrClosed) {
	return errdefs.FromGRPC(err)
}
```

Knowing that, let's mimic what containerd does and ignore the error in this specific case.

Related: kata-containers/runtime#2719

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
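For context, the caller-side pattern being copied from containerd looks roughly like the sketch below. This is not CRI-O's actual code: the package, function name, and wrapper are illustrative, while `TaskService.Shutdown` and `ttrpc.ErrClosed` are the real containerd/ttrpc APIs quoted above.

```
package shimclient

import (
	"context"
	"errors"

	taskAPI "github.com/containerd/containerd/runtime/v2/task"
	"github.com/containerd/ttrpc"
)

// shutdownShim asks the shim to exit and treats a closed ttrpc connection as
// success: the shim calls os.Exit(0) before the Shutdown response is written,
// so the client sees ttrpc.ErrClosed even though the shutdown worked.
func shutdownShim(ctx context.Context, client taskAPI.TaskService, id string) error {
	_, err := client.Shutdown(ctx, &taskAPI.ShutdownRequest{ID: id})
	if err != nil && !errors.Is(err, ttrpc.ErrClosed) {
		return err
	}
	return nil
}
```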
When the container finishes its execution, a containerd-shim-kata-v2 process is left behind (only when using CRI-O). The reason seems to be that CRI-O does not clean up the process when the container changes its state from running to stopped. The most reasonable way found to perform such cleanup is to take advantage of the goroutine used to update the container status and do the cleanup there, whenever it's needed.

Related: kata-containers/runtime#2719

Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
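As a rough illustration of that approach — hypothetical names throughout; CRI-O's real status-update goroutine and runtime interfaces differ:

```
package statuswatch

import "context"

// shimRuntime is a hypothetical stand-in for whatever component can ask the
// shimv2 process backing a container to exit.
type shimRuntime interface {
	ShutdownShim(ctx context.Context, containerID string) error
}

// watchStatus consumes container state updates and reaps the leftover
// containerd-shim-kata-v2 process as soon as the container stops, instead of
// waiting for the pod to be deleted.
func watchStatus(ctx context.Context, containerID string, r shimRuntime, updates <-chan string) {
	for state := range updates {
		if state == "stopped" {
			_ = r.ShutdownShim(ctx, containerID)
		}
	}
}
```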
I'm closing this one as all the patches ended up being merged into CRI-O.
The two commit messages above also appear in backports to release branches (cherry picked from commits 814c1bb, 45b778d, and adb657c).
fidencio/cri-o@345c016 is leading to problems where we remove the container on a successful container exit, and the kubelet tries to remove the container (but it doesn't exist in the ctrs map).
Do we know why the shim uses so much CPU when the containers die? I wonder if it's busy-looping.
When a pod using the VM runtime type stops, the actual runtime process should also be stopped. Previously, the runtime process was killed when the pod was deleted. This works well for many workloads, but causes process leaks when large numbers of one-shot pods are created (e.g. pods that enter the "Completed" state in Kubernetes). Those pods will eventually be cleaned up, but until then a large number of runtime processes will hang around. The situation is made worse when the runtime process enters a bad state after its container dies (see kata-containers/runtime#2719).

Initially, this problem was addressed in cri-o#3998. However, that PR worked by actually deleting VM runtime containers on stop, which led to issues with pods being stuck in a NotReady state indefinitely. This PR re-addresses the issue solved by #3998 by sending a shutdown task to the runtime on pod stop.

Signed-off-by: Evan Foster <efoster@adobe.com>
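A minimal sketch of the behavior change described there, assuming hypothetical sandbox and runtime types (CRI-O's real APIs differ): the shim is shut down when the sandbox stops, not only when it is deleted.

```
package podstop

import "context"

// sandbox and runtime are hypothetical stand-ins for CRI-O's real types.
type sandbox struct {
	id          string
	isVMRuntime bool
}

type runtime interface {
	// ShutdownSandboxShim tells the VM runtime (shim) process for this
	// sandbox to exit.
	ShutdownSandboxShim(ctx context.Context, sandboxID string) error
}

// stopPodSandbox shuts the shim down at sandbox stop time, so one-shot pods
// that sit in "Completed" no longer pin a runtime process until deletion.
func stopPodSandbox(ctx context.Context, r runtime, sb *sandbox) error {
	// ... stop containers and tear down networking (omitted) ...
	if sb.isVMRuntime {
		return r.ShutdownSandboxShim(ctx, sb.id)
	}
	return nil
}
```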
When a one-shot pod dies in CRI-O, the shimv2 process isn't killed until the pod is actually deleted, even though the VM is shut down. In this case, the shim appears to busyloop when attempting to talk to the (now dead) agent via VSOCK. This is especially catastrophic for one-shot pods that may persist for hours or days, but it also applies to any shimv2 pod where Kata is configured to use VSOCK for communication. To address this, we disconnect from the agent after the VM is shut down.

Fixes github.com/kata-containers/runtime#2719

Signed-off-by: Evan Foster <efoster@adobe.com>
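A sketch of the Kata-side fix, under stated assumptions: the hypervisor and agent interfaces below are hypothetical, not Kata's exact types, but the ordering (stop the VM, then drop the agent connection) matches the fix described above.

```
package vmstop

import "context"

// hypervisor and agent are hypothetical stand-ins for Kata's real types.
type hypervisor interface {
	StopVM(ctx context.Context) error
}

type agent interface {
	// Disconnect closes the VSOCK connection to the guest agent.
	Disconnect() error
}

// stopSandboxVM stops the VM and then drops the agent connection, so the shim
// doesn't keep retrying VSOCK dials to a dead agent in a tight loop.
func stopSandboxVM(ctx context.Context, h hypervisor, a agent) error {
	if err := h.StopVM(ctx); err != nil {
		return err
	}
	return a.Disconnect()
}
```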
It was basically busy-looping when trying to talk to the agent in the VM that had been deleted. Assuming it passes review, I'll be backporting kata-containers/kata-containers#556 to Kata 1.X. For fun, here's the call graph from profiling: [call graph image]

EDIT: I should also note that I think this is the same problem reported in #1917, but focused on shutdown instead of startup.
The same fix was subsequently backported to the kata-containers/runtime master and 1.11 branches (see kata-containers/kata-containers#556; cherry picked from commit 227cba6).
Description of problem
Ran a test that created 40 CPU soaker containers on a 32-core/64-thread system with 192 GB of RAM.
Using the `clusterbuster` tool from https://github.com/RobertKrawitz/OpenShift4-tools, which I ran a number of times:

```
clusterbuster -b 5 -p 5 -P soaker -T pod -Y -e --container-resource-request=cpu=10m -v -N 1 -d 40 -r 1 -t 30 --report --cleanup -Q --kata
```

I observed that the node in question retained hundreds of containerd-shim-kata-v2 processes that were consuming a large amount of CPU. Accompanying this were wide variations in the CPU utilization reported by the containers, in the work (simple loop iterations) accomplished by the containers, and in the amount of time required for the containers to start from first to last.
Without use of Kata, it took about 1.1 seconds between the first and last container to start, achieved 3986% CPU utilization (within 0.5% of max), and achieved about 613M loop iterations/sec. With Kata, the first run was very close to that (3.9 seconds for the containers to start, 3984% CPU utilization, and 603M loop iterations/second). Running this in a loop 5 times, all of these numbers got worse. By the final iteration, I observed 506% CPU utilization, 45 seconds between the first and last pod (in some cases I saw much more), and about 69M loop iterations/second.
Expected result
Performance remaining consistent across multiple runs, with no accumulation of processes.
Actual result
See attached:
kata-log.txt