vm based args in spec?? #964

Open
crosbymichael opened this issue Apr 4, 2018 · 15 comments

Comments

@crosbymichael
Member

There was an existing comment in the VM PR located here that was not resolved before merge:

#949 (comment)

The overall issue is why VM args need to be specified in the spec when the hypervisor is the one being invoked to read/process the spec.

@vbatts
Member

vbatts commented May 23, 2018

so these args are for that container runtime instance. If the args are changed, then it's a new/different runtime instance, right?
It seems for audit and introspection you'd want to see the args used to start that VM.

@crosbymichael
Member Author

crosbymichael commented May 24, 2018

What would be the difference between this and the args used to exec runc, then? It's weird.

@crosbymichael
Member Author

@sameo Could you take a look at this?

@vbatts
Member

vbatts commented May 24, 2018 via email

@crosbymichael
Member Author

So these VM runtimes wrap another thing?

@sameo

sameo commented May 24, 2018

@crosbymichael

the hypervisor is the one being invoked to read/process the spec.

The hypervisor (KVM, Xen, ESX, etc.) does not read and process the spec. The spec is processed by the runtime itself, exactly like runc. The hypervisor creates and manages the VM that's going to host the container workload/process. You could think of the hypervisor as a different isolation and resource-sharing API than namespaces and cgroups, respectively. So instead of calling into a set of host kernel APIs, you call into a hypervisor API.

OCI VM runtimes carry a set of default hypervisor arguments (static and dynamic) for each hypervisor they support. They're different from the set of arguments you'd pass to runc as they only specify how the hypervisor should create the VM that the runtime is going to control in order to manage the container workload inside it.
Here the arguments are optional because I don't think you'd want to specify them outside of tracing/debugging/auditing use cases.

Does that clarify things a little?
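
To make the distinction above concrete, here is a minimal Go sketch of a VM runtime that carries its own default hypervisor arguments and appends optional ones taken from the spec. The VMHypervisor type, its field names, the buildCommandLine helper and the ordering (defaults first) are all assumptions for illustration, not the spec's definition or any runtime's actual behavior. The QEMU binary path and flags are only sample values.

```go
package main

import "fmt"

// VMHypervisor is a hypothetical stand-in for the hypervisor section
// discussed in this thread: a binary path plus optional extra arguments.
type VMHypervisor struct {
	Path       string
	Parameters []string
}

// buildCommandLine returns the runtime's default arguments followed by any
// optional arguments from the spec. Putting the defaults first is an
// assumption of this sketch.
func buildCommandLine(defaults []string, hv *VMHypervisor) []string {
	args := make([]string, 0, len(defaults))
	args = append(args, defaults...)
	if hv != nil {
		args = append(args, hv.Parameters...)
	}
	return args
}

func main() {
	// Sample defaults the runtime would carry for one hypervisor.
	defaults := []string{"-machine", "pc", "-m", "2048"}

	// Optional arguments supplied through the spec, e.g. for debugging.
	hv := &VMHypervisor{
		Path:       "/usr/bin/qemu-system-x86_64",
		Parameters: []string{"-no-reboot"},
	}

	fmt.Println(hv.Path, buildCommandLine(defaults, hv))
	// With no hypervisor section at all, only the defaults are used.
	fmt.Println(buildCommandLine(defaults, nil))
}
```

The point of the sketch is that the spec-level arguments are purely additive and optional: a spec with no hypervisor section still produces a complete command line from the runtime's defaults.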

@egernst

egernst commented May 24, 2018

How useful are these args given that in many cases most of the parameters are dynamic, added via QMP (in the qemu case)?

So long as this is optional, it seems reasonable to me.

@vbatts
Member

vbatts commented Aug 29, 2018

@egernst @sameo so if these hypervisors have known flags, or flags that relate to values already existing in the config (e.g. memory, cpuset shares, etc.), then they would be known to the VM runtime, right?
Could this 'args' perhaps be more abstracted? Like into labels or annotations?

@vbatts
Member

vbatts commented Aug 29, 2018

Also, this resolution is needed to prep for a release

@egernst

egernst commented Aug 29, 2018

@sameo - For me it'd be helpful to have a more specific example use-case for this field. I'll try to add this here, PTAL.

@vbatts @crosbymichael -- In the kata case, there are many items which we end up configuring on a per-node basis through a configuration.toml. An example of this is at [1].

Some potentially relevant items which could optionally be configured on a per-container basis:
- machine type
- machine accelerators
- iothreads
- memory-prealloc
- huge pages, etc.

These could be configured on a per workload basis.

[1] - https://github.com/kata-containers/runtime/blob/master/cli/config/configuration.toml.in

@egernst

egernst commented Aug 29, 2018

@bergwolf PTAL.

@bergwolf

@egernst If we think of highly customized guest configs for different workloads/needs on a per pod sandbox basis, I'm afraid there are just too many of them for each hypervisor type. E.g., the list you gave is just part of the configuration for QEMU. Those items do not make sense for some other hypervisors, which would have a different set of configurations.

IOW, I tend to agree with @vbatts that we put them in labels or annotations. In kata, we can define and check for those labels/annotations, and override the default per-node configuration with the provided ones.
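
The override flow described above can be sketched in Go roughly as follows. Everything here is hypothetical: the HypervisorConfig fields, the annotation keys under the "example.vm.hypervisor/" prefix and the applyAnnotations helper are invented for illustration and are not kata's actual configuration or annotation names.

```go
package main

import (
	"fmt"
	"strconv"
)

// HypervisorConfig is a hypothetical per-node default configuration,
// such as one loaded from a node-level configuration file.
type HypervisorConfig struct {
	MachineType string
	MemPrealloc bool
	HugePages   bool
}

// Hypothetical annotation keys; a real runtime would define its own.
const (
	annMachineType = "example.vm.hypervisor/machine_type"
	annMemPrealloc = "example.vm.hypervisor/memory_prealloc"
	annHugePages   = "example.vm.hypervisor/hugepages"
)

// parseBoolAnnotation reads an optional boolean annotation. The second
// return value reports whether the annotation was present.
func parseBoolAnnotation(ann map[string]string, key string) (bool, bool, error) {
	v, ok := ann[key]
	if !ok {
		return false, false, nil
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return false, true, fmt.Errorf("annotation %s: %w", key, err)
	}
	return b, true, nil
}

// applyAnnotations starts from the per-node defaults and overrides only the
// fields for which a recognized annotation is present. Unknown annotations
// are ignored; malformed values are reported as errors.
func applyAnnotations(defaults HypervisorConfig, ann map[string]string) (HypervisorConfig, error) {
	cfg := defaults
	if v, ok := ann[annMachineType]; ok {
		cfg.MachineType = v
	}
	if b, ok, err := parseBoolAnnotation(ann, annMemPrealloc); err != nil {
		return defaults, err
	} else if ok {
		cfg.MemPrealloc = b
	}
	if b, ok, err := parseBoolAnnotation(ann, annHugePages); err != nil {
		return defaults, err
	} else if ok {
		cfg.HugePages = b
	}
	return cfg, nil
}

func main() {
	nodeDefaults := HypervisorConfig{MachineType: "pc"}
	podAnnotations := map[string]string{
		annMachineType: "q35",
		annHugePages:   "true",
	}
	cfg, err := applyAnnotations(nodeDefaults, podAnnotations)
	if err != nil {
		fmt.Println("invalid annotation:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Because each hypervisor would recognize only its own keys, this keeps hypervisor-specific knobs out of the typed spec fields while still allowing per-pod customization.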

@egernst

egernst commented Sep 4, 2018

@bergwolf I agree.

@sameo

sameo commented Sep 4, 2018

I'm sure we could put some effort into abstracting some common arguments across most hypervisors, but we would need to handle a label-based override mechanism anyway. This is a very powerful mechanism for customizing your virtualizer per pod/workload.
So bottom line for me: I agree with @vbatts and @bergwolf here.

@jterry75
Contributor

We (Microsoft/hcsshim) have been pretty exclusively using annotations to override any default behavior. But we do try to honor the spec itself if it also has fields. So, for example, for a hypervisor container that has a Memory.Limit, that value would be used as the hypervisor Memory.Limit as well. As you can see from this approach, however, it does change the container's actual memory limit and affects what it can do, because the VM itself uses more memory than a true process environment. By default the Kube concept of a "runtime overhead" can help with this one, but there are other examples that don't fit there. I am OK with making a common set of typed fields for any hypervisor to implement, but I don't think we can ever do away with the use of annotations for customizations between all the different implementations of hypervisors.
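
The memory-limit trade-off described above comes down to simple arithmetic, sketched here in Go. The function names and the byte figures are hypothetical and only illustrate the two choices: reusing the container limit unchanged as the VM size, or adding an overhead allowance on top of it. This is not hcsshim's implementation.

```go
package main

import "fmt"

// workloadBytes estimates what is left for the container workload when the
// container's memory limit is reused unchanged as the VM's memory size and
// the guest itself (kernel, agent, etc.) consumes guestUse bytes.
func workloadBytes(vmLimit, guestUse int64) int64 {
	if guestUse >= vmLimit {
		return 0
	}
	return vmLimit - guestUse
}

// vmBytesWithOverhead sizes the VM as the container limit plus a fixed
// overhead allowance, so the workload keeps roughly its full limit.
// A non-positive limit means "no limit set" and is passed through as 0.
func vmBytesWithOverhead(containerLimit, overhead int64) int64 {
	if containerLimit <= 0 {
		return 0
	}
	return containerLimit + overhead
}

func main() {
	const mib = int64(1) << 20
	limit := 512 * mib   // memory limit from the container spec (sample value)
	guestUse := 64 * mib // hypothetical memory used by the guest itself

	fmt.Println("left for workload, limit reused as VM size:", workloadBytes(limit, guestUse)/mib, "MiB")
	fmt.Println("VM size with overhead added:", vmBytesWithOverhead(limit, guestUse)/mib, "MiB")
}
```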
