[Experiment] Explore running multiple containers in a shared VM #3658
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## master #3658 +/- ##
=======================================
  Coverage   19.86%   19.87%
=======================================
  Files         231      231
  Lines       51063    51160   +97
=======================================
+ Hits        10143    10167   +24
- Misses      40179    40253   +74
+ Partials      741      740    -1
How did you manage this? I see you switched from an assumption that What I don't get is:
Separately, how does the effort of this compare with seeing whether kata or k3s would be less effort?
@deitch from what I saw kata supports KVM, but I didn't see any other type 1 hypervisor (like Xen). Even if we do something that is only supported on KVM, kata creates a VM instance with which it communicates via virtio, so we will have to have at least one VM to run containers. Edit: there are also microVMs in KVM, which might reduce the footprint.
mount -t tmpfs -o nodev,nosuid,noexec,size=20% shm "$MNT"/rootfs/dev/shm
mount -t tmpfs -o nodev,nosuid,size=20% tmp "$MNT"/rootfs/tmp
mount -t mqueue -o nodev,nosuid,noexec none "$MNT"/rootfs/dev/mqueue
ln -s /proc/self/fd "$MNT"/rootfs/dev/fd
Wouldn't sharing all these descriptors with all containers mess things up? Without a mux we will have mixed output on stdout, which might not be critical for now, but what about stdin? Do we care about it?
The use case for this experiment is when some sharing is ok, but I'll look at the list and see what makes sense to separate.
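To make the mixed-output concern concrete, here is a minimal sketch of one possible approach, not what this PR implements: each container's stdout and stderr are tagged line by line with a label, and stdin is detached so the containers do not compete for it. The helper names `tag_lines` and `run_tagged` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of this PR.

# Prefix every line read from stdin with a container label.
tag_lines() {
    label="$1"
    # The "|| [ -n "$line" ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s] %s\n' "$label" "$line"
    done
}

# Run a command with stdin detached and its stdout/stderr tagged.
run_tagged() {
    label="$1"
    shift
    "$@" </dev/null 2>&1 | tag_lines "$label"
}
```

For example, `run_tagged nginx echo hello` prints `[nginx] hello`. Lines from different containers can still interleave, but each line remains attributable to its source.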
If the direction in this PR is useful (with its limitations) it might be a useful stepping stone to running a collection of containers (a pod) in a VM using something existing like kata or k3s.
If we have no choice, then ok. I just am so wary of yet again creating something that looks a lot like some other OSS project or library, but that we do just a little bit differently.
The current API allows specifying any number of virtual disks, whether they are OCI or images.
That is how you did it. Now I see it inside. That is rather nicely done. It feels a bit like swimming against the stream - Kubernetes has a native "multiple containers together (i.e. Pod)" concept - but we do as we need.
From an implementation perspective I expect this to go away (together with the rest of the init-initrd scripts in pkg/xen-tools) once we find and integrate a standard runtime for all of this. And that should presumably give us the ability to specify e.g., volume and network resources for the containers inside the pod VM. So a stepping stone from a functional perspective, and a limited amount of throw-away code.
Signed-off-by: eriknordmark <erik@zededa.com>
Add support for /mnt%d per OCI Signed-off-by: eriknordmark <erik@zededa.com>
b27c041 to 6b09142
I have some security concerns about running multiple apps in one VM without mapping each container to a separate user, and about the fact that we are not setting up namespaces such as mount and others. I'll write a longer comment soon.
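As a rough illustration of what that extra isolation could look like, the sketch below only builds and prints a command line; it executes nothing. It assumes util-linux `unshare` and GNU coreutils `chroot` (for `--userspec`) are available in the VM, which may not hold for the current initrd. The function name `isolated_cmd` and the per-container UID are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not part of this PR. Prints the command that would
# start one container in its own mount and PID namespaces under a dedicated
# numeric user. Arguments are joined with spaces, so quoting is not preserved.
isolated_cmd() {
    rootfs="$1"   # container root filesystem
    uid="$2"      # assumed per-container numeric UID, also used as GID
    shift 2
    echo "unshare --mount --pid --fork chroot --userspec=$uid:$uid $rootfs $*"
}
```

For example, `isolated_cmd /mnt0/rootfs 1001 /usr/sbin/sshd -D` prints `unshare --mount --pid --fork chroot --userspec=1001:1001 /mnt0/rootfs /usr/sbin/sshd -D`. A new PID namespace would also need its own /proc mount inside the container, which this sketch leaves out.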
@shjala
For sidecar containers it would be useful to be able to run them in the same VM as the main container.
This is an experiment to see whether that can be done without any API changes by looking for multiple OCI-based volumes for a single app instance, and kicking off the EntryPoint for each one of them.
If this works it might be a useful stepping stone to get to a more complete standard runtime for multiple containers in one VM.
With this PR I can create an app instance which has two OCI images (my example was the unmodified nginx and sshd containers from docker.io). The sshd and nginx run with chroot isolation and otherwise share everything, which matches the intended use case of close cooperation and trust between a sidecar container and the main container.
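The general shape of the approach can be sketched as follows. This is an illustrative approximation, not the actual init-initrd code: it assumes each OCI volume is mounted at `mnt0`, `mnt1`, ... under a base directory (inferred from the "/mnt%d per OCI" commit; the starting index is a guess), and that each mount holds a `rootfs` directory and a `cmdline` file with the entrypoint (a hypothetical layout). By default it only prints the commands; running them for real needs root.

```shell
#!/bin/sh
# Hypothetical sketch: start one chroot'ed process per OCI volume.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
launch_all() {
    base="$1"
    i=0
    # Walk mnt0, mnt1, ... until the first missing rootfs directory.
    while [ -d "$base/mnt$i/rootfs" ]; do
        mnt="$base/mnt$i"
        # Assumed location of the entrypoint; fall back to a shell.
        if [ -r "$mnt/cmdline" ]; then
            cmd=$(cat "$mnt/cmdline")
        else
            cmd=/bin/sh
        fi
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "chroot $mnt/rootfs $cmd"
        else
            # Unquoted on purpose so the command line splits into words.
            chroot "$mnt/rootfs" $cmd &
        fi
        i=$((i + 1))
    done
    # In real mode, block until every background process has exited.
    [ "${DRY_RUN:-1}" = "1" ] || wait
}
```

Calling `launch_all /` in dry-run mode on a VM with two OCI volumes would print one `chroot` line per volume, which matches the "chroot isolation and otherwise share everything" model described above.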
Note that the commits in this PR need to be cleaned up and squashed.