lxd-agent: Fixes intermittent exec EOF closure when vsock listener is restarted just after boot

This PR switches the lxd-agent vsock listener to use the VMADDR_CID_ANY (4294967295) CID, rather than trying to ascertain the VM's local CID and listening only on that.

The reason is two-fold:

1. We were seeing that the vsock.ContextID() call sometimes returned 4294967295 just after the vsock module was loaded, and only shortly afterward started returning the correct CID assigned by QEMU. This would trigger the vsock CID change detector up to 30s later and cause the vsock listener to be restarted, prematurely terminating any exec operations that had started before then.

   The vsock CID change detector was originally added to detect when a VM was statefully restored/migrated in such a way that its QEMU-assigned CID changed whilst the VM was running. Such a change prevented LXD from using the lxd-agent until the lxd-agent noticed that its local CID had changed and restarted its listener on the new CID.

2. However, while investigating this issue, we observed that binding the lxd-agent listener to the VMADDR_CID_ANY (4294967295) CID continues to work even if the VM is statefully restored with a different CID. This appears to be because VMADDR_CID_ANY acts as a kind of wildcard CID. The vsock(7) manpage says:

   Consider using VMADDR_CID_ANY when binding instead of getting the local CID with IOCTL_VM_SOCKETS_GET_LOCAL_CID.

   There are several special addresses: VMADDR_CID_ANY (-1U) means any address for binding;

Binding to VMADDR_CID_ANY also allows us to simplify the vsock listener logic and remove the vsock CID change detector entirely, neatly sidestepping the original problem.

Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
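The reasoning above can be illustrated with a minimal Go sketch. It uses a deliberately simplified matching model, not the kernel's actual vsock code, and the names `vmaddrCIDAny`, `accepts` and `cidChanged`, along with the CIDs 3 and 7, are hypothetical and do not come from LXD. It shows why a one-off 4294967295 reading just after boot could trip a "readings differ" change detector, and why a listener bound to VMADDR_CID_ANY keeps matching after the guest's CID changes. It also checks that -1U and 4294967295 are the same 32-bit value.

```go
package main

import (
	"fmt"
	"math"
)

// vmaddrCIDAny mirrors VMADDR_CID_ANY, documented in vsock(7) as -1U,
// i.e. the all-ones 32-bit value 4294967295.
const vmaddrCIDAny uint32 = math.MaxUint32

// accepts is a hypothetical, simplified model of listener matching:
// a listener bound to a specific CID only matches that CID, while one
// bound to VMADDR_CID_ANY matches whatever CID the guest currently has.
func accepts(boundCID, localCID uint32) bool {
	return boundCID == vmaddrCIDAny || boundCID == localCID
}

// cidChanged is a hypothetical model of a CID change detector: it reports
// a change whenever two successive readings differ, including a spurious
// early VMADDR_CID_ANY reading taken just after the vsock module loads.
func cidChanged(prev, cur uint32) bool {
	return prev != cur
}

func main() {
	const bootCID, restoredCID uint32 = 3, 7 // hypothetical QEMU-assigned CIDs

	// Early boot: the first reading is VMADDR_CID_ANY, then the real CID.
	fmt.Println(cidChanged(vmaddrCIDAny, bootCID)) // true: spurious restart

	// Old approach: bind to the CID read at startup.
	fmt.Println(accepts(bootCID, bootCID))     // true
	fmt.Println(accepts(bootCID, restoredCID)) // false: restart needed

	// New approach: bind to VMADDR_CID_ANY.
	fmt.Println(accepts(vmaddrCIDAny, bootCID))     // true
	fmt.Println(accepts(vmaddrCIDAny, restoredCID)) // true: no restart needed

	// -1U and 4294967295 are the same 32-bit value.
	fmt.Println(vmaddrCIDAny == ^uint32(0), vmaddrCIDAny) // true 4294967295
}
```

Actually binding an AF_VSOCK socket is left out of the sketch because it needs a non-stdlib package and a guest with vsock support; the point is only that, under this model, the wildcard bind removes any need to track CID changes.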