Error at removeContainer request #1972

Closed
snir911 opened this issue Aug 1, 2024 · 4 comments

@snir911
Contributor

snir911 commented Aug 1, 2024

When deleting a running peer pod that is based on recent code, removeContainer results in a panic.
Verified on AWS and libvirt.

How to reproduce:

Libvirt

  1. Follow the libvirt instructions to create a cluster and deploy the operator
  2. Use the qcow image from quay.io/confidential-containers/podvm-binaries-ubuntu-amd64:v0.9.0-alpha.4
  3. Run a pod and wait until it is in the running state
  4. Delete it and check the CAA DS logs

AWS

  1. Deploy an EKS cluster (or a k8s cluster on AWS)
  2. Install using the operator (v0.9.0)
  3. Create any recent image (e.g. upload the quay.io/confidential-containers/podvm-binaries-ubuntu-amd64:v0.9.0-alpha.4 qcow as an AMI) and set it as the PODVM_AMI_ID
  4. Run a peer pod and make sure it is in the running state
  5. Delete the pod and check the CAA DS logs
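
For the last step on either platform, a small filter can pull the failing requests out of a saved CAA log. This is only a sketch: the function name `find_proxy_failures` is hypothetical, and the pattern assumes the `[adaptor/proxy] <Method> fails: <error>` line format seen in the output below.

```python
import re

# Assumed line format, taken from the log output in this issue, e.g.:
# 2024/08/01 12:18:51 [adaptor/proxy] RemoveContainer fails: rpc error: ...
FAIL_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[adaptor/proxy\] (?P<method>\w+) fails: (?P<error>.*)$"
)


def find_proxy_failures(log_text):
    """Return a (timestamp, method, error) tuple for each failed proxy request."""
    failures = []
    for line in log_text.splitlines():
        match = FAIL_RE.match(line.strip())
        if match:
            failures.append((match["ts"], match["method"], match["error"]))
    return failures
```

Calling `find_proxy_failures(open("caa.log").read())` on a saved log (the file name is a placeholder) lists only the failed requests, without the backtraces that follow them.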

output

full log

[loglibvirt.txt](https://github.com/user-attachments/files/16456714/loglibvirt.txt)

...

2024/08/01 12:18:51 [adaptor/proxy] RemoveContainer: containerID:1e799ed9b8ab0e9170d60f2015a569a7bd1303179cc66462a93654890f9f6539
2024/08/01 12:18:51 [adaptor/proxy] RemoveContainer: containerID:ebf5f9cc9106f660956c862bce5ba88468150045da2b82ec9cfe28fd9a92fba0
2024/08/01 12:18:51 [adaptor/proxy] RemoveContainer fails: rpc error: code = Internal desc = EINVAL: Invalid argument

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: <rustjail::container::LinuxContainer as rustjail::container::BaseContainer>::destroy::{{closure}}
   2: <kata_agent::rpc::AgentService as protocols::agent_ttrpc_async::AgentService>::remove_container::{{closure}}
   3: <protocols::agent_ttrpc_async::RemoveContainerMethod as ttrpc::asynchronous::utils::MethodHandler>::handler::{{closure}}
   4: ttrpc::asynchronous::server::HandlerContext::handle_msg::{{closure}}
   5: <ttrpc::asynchronous::server::ServerReader as ttrpc::asynchronous::connection::ReaderDelegate>::handle_msg::{{closure}}::{{closure}}
   6: tokio::runtime::task::raw::poll
   7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   8: tokio::runtime::task::raw::poll
   9: std::sys_common::backtrace::__rust_begin_short_backtrace
  10: core::ops::function::FnOnce::call_once{{vtable.shim}}
  11: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at ./rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  12: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at ./rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  13: std::sys::unix::thread::Thread::new::thread_start
             at ./rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys/unix/thread.rs:108:17
  14: start_thread
  15: clone
2024/08/01 12:18:51 [adaptor/proxy] DestroySandbox
2024/08/01 12:18:51 [adaptor/proxy] DestroySandbox fails: rpc error: code = Internal desc = EINVAL: Invalid argument

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: <rustjail::container::LinuxContainer as rustjail::container::BaseContainer>::destroy::{{closure}}
   2: <kata_agent::rpc::AgentService as protocols::agent_ttrpc_async::AgentService>::destroy_sandbox::{{closure}}
   3: <protocols::agent_ttrpc_async::DestroySandboxMethod as ttrpc::asynchronous::utils::MethodHandler>::handler::{{closure}}
   4: ttrpc::asynchronous::server::HandlerContext::handle_msg::{{closure}}
   5: <ttrpc::asynchronous::server::ServerReader as ttrpc::asynchronous::connection::ReaderDelegate>::handle_msg::{{closure}}::{{closure}}
   6: tokio::runtime::task::raw::poll
   7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   8: tokio::runtime::task::raw::poll
   9: std::sys_common::backtrace::__rust_begin_short_backtrace
  10: core::ops::function::FnOnce::call_once{{vtable.shim}}
  11: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at ./rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  12: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at ./rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  13: std::sys::unix::thread::Thread::new::thread_start
             at ./rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys/unix/thread.rs:108:17
  14: start_thread
  15: clone
2024/08/01 12:18:51 [adaptor/proxy] shutting down socket forwarder
2024/08/01 12:18:51 [adaptor/cloud/aws] Deleting instance (i-0fed2bc72e5876955)
2024/08/01 12:18:51 [adaptor/cloud/aws] deleted an instance i-0fed2bc72e5876955
2024/08/01 12:18:51 [util/k8sops] sleep's owned PeerPod object can now be deleted
2024/08/01 12:18:51 [tunneler/vxlan] Delete tc redirect filters on eth0 and ens5 in the network namespace /var/run/netns/cni-019d1e30-643d-d465-e068-c97311c08230
2024/08/01 12:18:51 [tunneler/vxlan] Delete vxlan interface vxlan1 in the network namespace /var/run/netns/cni-019d1e30-643d-d465-e068-c97311c08230

snir911 changed the title from "panic at removeContainer request" to "Error at removeContainer request" on Aug 6, 2024

@stevenhorsman
Member

Hey @snir911 - I think I might have a recollection of seeing something similar before, but I've just tried this with the latest version on libvirt and can't reproduce:

2024/10/07 13:46:48 [adaptor/proxy] StartContainer: containerID:97f7f5d8fc77d6a48f1e2c33748ac2c3c142012252042d8512092e54e2b31ffe
2024/10/07 13:47:14 [adaptor/proxy] RemoveContainer: containerID:97f7f5d8fc77d6a48f1e2c33748ac2c3c142012252042d8512092e54e2b31ffe
2024/10/07 13:47:14 [adaptor/proxy] RemoveContainer: containerID:ef68d12877fb1712f3222dc2454d7477105c4c191bc9a1b725253b53feab6f02
2024/10/07 13:47:14 [adaptor/proxy] DestroySandbox
2024/10/07 13:47:14 [adaptor/proxy] shutting down socket forwarder
2024/10/07 13:47:14 [adaptor/cloud/libvirt] Deleting instance (3)
2024/10/07 13:47:14 [adaptor/cloud/libvirt] Checking if instance (3) exists
2024/10/07 13:47:14 [adaptor/cloud/libvirt] domainDef [{{ disk} disk     0xc00025e900 <nil> 0xc00088c990 <nil> <nil> <nil> <nil> <nil> 0xc000b439a0 <nil> <nil> <nil> <nil>     <nil> 0xc000b809a8 <nil> <nil> 0xc000797620} {{ disk} disk     0xc00025eb40 <nil> 0xc00088ca20 <nil> <nil> <nil> <nil> <nil> 0xc000b439f0 <nil> <nil> <nil> <nil>     <nil> <nil> <nil> <nil> 0xc000797800}]
2024/10/07 13:47:14 [adaptor/cloud/libvirt] Check if podvm-nginx-75d4ffc6d9-7k2hb-ef68d128-root.qcow2 volume exists
2024/10/07 13:47:14 [adaptor/cloud/libvirt] Deleting volume podvm-nginx-75d4ffc6d9-7k2hb-ef68d128-root.qcow2
2024/10/07 13:47:14 [adaptor/cloud/libvirt] Check if podvm-nginx-75d4ffc6d9-7k2hb-ef68d128-cloudinit.iso volume exists
2024/10/07 13:47:14 [adaptor/cloud/libvirt] Deleting volume podvm-nginx-75d4ffc6d9-7k2hb-ef68d128-cloudinit.iso
2024/10/07 13:47:14 [adaptor/cloud/libvirt] deleted an instance 3
2024/10/07 13:47:14 [util/k8sops] nginx-75d4ffc6d9-7k2hb's owned PeerPod object can now be deleted
2024/10/07 13:47:14 [tunneler/vxlan] Delete tc redirect filters on eth0 and enc1 in the network namespace /var/run/netns/cni-1888209c-567a-5990-faa1-0fbdb5e2d4a2
2024/10/07 13:47:14 [tunneler/vxlan] Delete vxlan interface vxlan1 in the network namespace /var/run/netns/cni-1888209c-567a-5990-faa1-0fbdb5e2d4a2

It sounds like Beraldo isn't seeing it on Azure either, so do you want to try reproducing it on AWS again, or do we think this has been fixed now?

@EmmEff
Contributor

EmmEff commented Oct 7, 2024

FWIW, I am not seeing it on AWS now either

@stevenhorsman
Member

It sounds like we can close this as fixed/can't reproduce then. Are you happy with that, @snir911?

@snir911
Contributor Author

snir911 commented Oct 13, 2024

@stevenhorsman thanks. It was indeed reproducible on libvirt as well, so it may have been fixed since. I'm closing this issue, and I'll also try again locally.

snir911 closed this as completed on Oct 13, 2024