Closed
runtime-config.md: 9 changes (5 additions, 4 deletions)
@@ -12,27 +12,28 @@ Presently there are `Prestart`, `Poststart` and `Poststop`.
Hooks allow one to run code before/after various lifecycle events of the container.
Hooks MUST be called in the listed order.
The state of the container is passed to the hooks over stdin, so the hooks could get the information they need to do their work.
+All hooks execute in the host environment (e.g. the same namespace, cgroups, etc. that apply to the host process).
Member

This is incorrect. They are forked from the runtime process so they inherit from the runtime, not all host processes.

Contributor Author

On Wed, Oct 07, 2015 at 11:36:22AM -0700, Michael Crosby wrote:

+All hooks execute in the host environment (e.g. the same
namespace, cgroups, etc. that apply to the host process).

This is incorrect. They are forked from the runtime process so they
inherit from the runtime, not all host processes.

There are at least two runtime processes here. There is a process
launched with (for example) ‘runC start’, which I've been referring to
as the “host process”. That process forks and the child ends up in
the container, eventually running the config.json process binary. I'd
been referring to the latter as the “container process”. Do you
prefer alternative names for those two processes?


Hook paths are absolute and are executed from the host's filesystem.
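Here is a minimal sketch of the hook side of this contract, written in Go (the language runC is written in). It reads the state from stdin, decodes it, and locates the container process's namespace files under `/proc`. It assumes the state carries `id`, `pid` and `root` fields (the `pid` entry and bundle root discussed in this PR). The authoritative schema is the state section of runtime.md, and `parseState`/`nsDir` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// State holds the fields this sketch assumes the runtime passes to
// hooks on stdin; the authoritative schema lives in runtime.md.
type State struct {
	ID   string `json:"id"`
	Pid  int    `json:"pid"`
	Root string `json:"root"`
}

// parseState decodes the state document a hook received on stdin.
func parseState(data []byte) (State, error) {
	var s State
	if err := json.Unmarshal(data, &s); err != nil {
		return State{}, fmt.Errorf("decoding state: %w", err)
	}
	return s, nil
}

// nsDir returns the host-side directory holding the namespace files
// of the container process, e.g. /proc/42/ns.
func nsDir(s State) string {
	return fmt.Sprintf("/proc/%d/ns", s.Pid)
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading state:", err)
		os.Exit(1) // non-zero exit: the runtime treats the hook as failed
	}
	s, err := parseState(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("container %s (bundle %s): namespaces in %s\n", s.ID, s.Root, nsDir(s))
}
```

Piping `{"id":"c1","pid":42,"root":"/containers/c1"}` into the program should print `container c1 (bundle /containers/c1): namespaces in /proc/42/ns`. Exiting non-zero is how the hook reports failure to the runtime.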

### Pre-start

-The pre-start hooks are called after the container process is spawned, but before the user supplied command is executed.
+The pre-start hooks are called [after the container process is spawned, but before the user supplied command is executed](runtime.md#typical-lifecycle).
Contributor

Do we need to mention container process here? These hooks are invoked once the majority of the container runtime has been set up, but before the container's init process has been spawned. We need to clearly specify what sub-section of the complete container environment is available for these hooks.

Contributor Author

On Mon, Oct 12, 2015 at 12:47:00PM -0700, Vish Kannan wrote:

-The pre-start hooks are called after the container process is spawned, but before the user supplied command is executed.
+The pre-start hooks are called after the container process is spawned, but before the user supplied command is executed.

Do we need to mention container process here? These hooks are
invoked once the majority of the container runtime has been set up,
but before the container's init process has been spawned. We need to
clearly specify what sub-section of the complete container
environment is available for these hooks.

The container is complete when the pre-start hooks run. The container
process has been spawned (it's just running runtime-supplied code).
The only change that happens after the pre-start hooks is that that
container process execs the user-supplied process, swapping in the
user-selected code. For a bit of runC-side docs on this, see the
synchronization pipe in 1.

Contributor

You are stating runC's behavior. Why do we have to require that behavior in the Spec?

Contributor Author

On Mon, Oct 12, 2015 at 01:20:20PM -0700, Vish Kannan wrote:

You are stating runC's behavior. Why do we have to require that
behavior in the Spec?

This commit is mostly about describing the runC approach so we have a
non-controversial starting point for future evolution (“is that what
runC does” is something that is harder to disagree on than “should
that be how the spec works” ;).

That being said, I do think all runtimes will need to execute
runtime-specified code in the container process 1.

They are called after the container namespaces are created on Linux, so they provide an opportunity to customize the container.
On Linux, for example, the network namespace could be configured in this hook.

-If a hook returns a non-zero exit code, then an error including the exit code and the stderr is returned to the caller and the container is torn down.
+If a hook returns a non-zero exit code, [then an error including the exit code and the stderr is returned to the caller and the container is torn down](runtime.md#typical-lifecycle).

### Post-start

-The post-start hooks are called after the user process is started.
+The post-start hooks are called [after the user process is started](runtime.md#typical-lifecycle).
Contributor

How about init process instead of user process?

Contributor Author

On Mon, Oct 12, 2015 at 12:48:09PM -0700, Vish Kannan wrote:

Post-start

-The post-start hooks are called after the user process is started.
+The post-start hooks are called after the user process is started.

How about init process instead of user process?

The container process (initially running runtime-supplied code) is
also the init process. Here's how I see the initial communication:

    Host process                    Container process
    ------------                    -----------------
    start
    launches child ->
    blocks                          joins namespaces
                                    unshares namespaces
                                    configures unshared namespaces
                                    drops permissions, changes user, ...
                                 <- signals that container is complete
    runs pre-start hooks            blocks
    signals hooks complete ->
    closes pipe
                                    executes user process

So there's a lot going on in the container process before it turns
itself into the “user process”.

Contributor Author

On Mon, Oct 12, 2015 at 12:48:09PM -0700, Vish Kannan wrote:

How about init process instead of user process?

Also “init process” implies a PID namespace, and those are optional
with this spec.

Contributor Author

On Mon, Oct 12, 2015 at 12:56:13PM -0700, W. Trevor King wrote:

    Host process                    Container process
    ------------                    -----------------
    start
    launches child ->
    blocks                          joins namespaces

After launching and before blocking, the host process should also be
adding the container process to its cgroup (if you're using any
cgroups).

Contributor

I get the runC behavior @wking.

One alternative to the runC behavior is that of unsharing namespaces (excepting pid), bind mounting the namespaces, and letting the hooks run before starting a new process in a new pid namespace (if requested by the user).
In this scenario, the container process isn't required.

Similarly, I'm wondering if the new semantics you are introducing are valid in other OSes.
I'd like to solicit feedback from the folks who are working on Spec implementations on other OSes like Windows and Solaris.

Contributor Author

On Mon, Oct 12, 2015 at 03:48:06PM -0700, Vish Kannan wrote:

One alternative to the runC behavior is that of unsharing namespaces
(excepting pid), bind mounting the namespaces, and letting the hooks
run before starting a new process in a new pid namespace (if
requested by the user). In this scenario, the container process
isn't required.

In the “user requested a PID namespace” case, that sounds like “the
container isn't fully set up” (since you don't have the requested PID
namespace at pre-start time).

In either case, there is still a container process with
runtime-specified code at pre-start-hook time, because it's holding
open the unshared namespaces.

Contributor

Not really. If you bind-mount the ns files, you don't need a process to
hold the namespaces.


Contributor Author

On Mon, Oct 12, 2015 at 04:03:30PM -0700, Vish Kannan wrote:

Not really. If you bind-mount the ns files, you don't need a process
to hold the namespaces.

Oh, that will make my life a lot easier :). Is that:

    Host process                    Temporary child
    ------------                    ---------------
    Launch a child process ->
                                    Call clone(2) with CLONE_NEW*
                                    Bind mount /proc/self/ns/* somewhere else
                                    pivot_root, etc.
                                 <- Exit
    Use the bind-mounted files

In that case, excepting the PID namespace from the pre-start
requirements seems like a reasonable price to pay. And it will make
explicit ‘create’ and ‘start’ commands possible 1.

On the other hand, we'll probably need to revisit the state JSON that
landed in #87 now that the ‘pid’ entry doesn't make sense for either
pre-start or post-stop.

 Message-ID: <20150915034717.GO18018@odin.tremily.us>

Contributor Author

On Mon, Oct 12, 2015 at 04:14:38PM -0700, W. Trevor King wrote:

Mon, Oct 12, 2015 at 04:03:30PM -0700, Vish Kannan:

Not really. If you bind-mount the ns files, you don't need a process
to hold the namespaces.

Oh, that will make my life a lot easier :). Is that…

And it looks like it is :). Language supporting this in namespaces(7):

Bind mounting (see mount(2)) one of the files in this directory to
somewhere else in the filesystem keeps the corresponding namespace
of the process specified by pid alive even if all processes
currently in the namespace terminate.

For example:

tty1$ unshare --user --uts --map-root-user
tty1$ hostname alice
tty1$ echo $$
25477

tty2# touch /tmp/uts-alice
tty2# mount --bind /proc/25477/ns/uts /tmp/uts-alice

tty1$ exit

tty2# ps aux | grep 25477 | grep -v grep
… no hits …
tty2# nsenter --uts=/tmp/uts-alice hostname
alice
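The following Go sketch performs the same bind-mount step as the `mount --bind` line in the session above. It creates an empty file as the mount point and bind-mounts `/proc/<pid>/ns/<type>` onto it with `MS_BIND`. It assumes Linux and CAP_SYS_ADMIN; `pinNamespace`, `nsFile` and the `pin-ns` command name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
)

// nsFile returns the /proc path of one of pid's namespace files.
func nsFile(pid int, ns string) string {
	return filepath.Join("/proc", strconv.Itoa(pid), "ns", ns)
}

// pinNamespace keeps pid's namespace of type ns alive after every process
// in it has exited, by bind-mounting the namespace file onto target.
func pinNamespace(pid int, ns, target string) error {
	// The mount point must exist; an empty file works (like `touch`).
	f, err := os.OpenFile(target, os.O_RDONLY|os.O_CREATE, 0o644)
	if err != nil {
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	if err := syscall.Mount(nsFile(pid, ns), target, "", syscall.MS_BIND, ""); err != nil {
		return fmt.Errorf("bind-mounting %s onto %s: %w", nsFile(pid, ns), target, err)
	}
	return nil
}

func main() {
	if len(os.Args) != 4 {
		fmt.Fprintln(os.Stderr, "usage: pin-ns PID NS TARGET")
		os.Exit(2)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad pid:", err)
		os.Exit(2)
	}
	if err := pinNamespace(pid, os.Args[2], os.Args[3]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running `pin-ns 25477 uts /tmp/uts-alice` as root corresponds to the `tty2#` steps above. Unmounting `/tmp/uts-alice` drops that reference again.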

For example, this hook can notify the user that the real process has been spawned.

If a hook returns a non-zero exit code, then an error is logged and the remaining hooks are executed.

### Post-stop

-The post-stop hooks are called after the container process is stopped.
+The post-stop hooks are called [after the container process is stopped](runtime.md#typical-lifecycle).
Cleanup or debugging could be performed in such a hook.
If a hook returns a non-zero exit code, then an error is logged and the remaining hooks are executed.

runtime.md: 54 changes (45 additions, 9 deletions)
@@ -29,24 +29,60 @@ The root directory to the bundle is provided in the state so that consumers can
}
```

-## Lifecycle
+## Typical lifecycle

+A typical lifecycle progresses like this:

+1. There is no container
+2. A user tells the runtime to start a container and launch a process inside it
+3. The runtime [creates the container](#create)
+4. The runtime executes any [pre-start hooks](runtime-config.md#pre-start)
+5. The runtime [executes the container process](#start-process)
+6. The container process is running
+7. The runtime executes any [post-start hooks](runtime-config.md#post-start)
+8. A user tells the runtime to send a termination signal to the container process
+9. The runtime [sends a termination signal to the container process](#stop-process)
+10. The container process exits
+11. The runtime [terminates any other processes in the container](#stop-process)
+12. The runtime executes any [post-stop hooks](runtime-config.md#post-stop)
+13. The runtime [removes the container](#cleanup)

+With steps 8 and 9, the user is explicitly stopping the container process (via the runtime), but it's also possible that the container process could exit for other reasons.
+In that case we skip directly from 6 to [10](#stop-process), skipping any post-start hooks that hadn't been launched and terminating any in-progress post-start hook.

+Failure in a pre-start hook or other setup task can cause a jump straight to [12](runtime-config.md#post-stop).

### Create

-Creates the container: file system, namespaces, cgroups, capabilities.
+Create the container: file system, namespaces, cgroups, capabilities, etc.
+The invoked process forks, with one branch that stays in the host namespace and another that enters the container.
+The host process carries out all container setup actions, and continues running for the life of the container so it can perform teardown after the container process exits.
+The container process changes users and drops privileges in preparation for the container process start.
+At this point, the host process writes the [`state.json`](#state) file with the host-side version of the container process's PID (the container process may be in a PID namespace).

### Start (process)

-Runs a process in a container.
-Can be invoked several times.
+After the pre-start hooks complete, the host process signals the container process to execute the runtime.
Member

This sentence does not make sense. the host process signals the container process to execute the runtime.????

Contributor Author

On Wed, Oct 07, 2015 at 11:40:34AM -0700, Michael Crosby wrote:

+After the pre-start hooks complete, the host process signals the
container process to execute the runtime.

This sentence does not make sense. the host process signals the container process to execute the runtime.????

I haven't looked into the runC implementation here, but my guess was
that:

  1. ‘runC start’ would launch the host process.
  2. The host process would create as much of the container as it could
    without a container process to hold open things like a PID
    namespace.
  3. The host process would fork, and the child of that fork would be
    the container process.
  4. The container process would join the existing (partial container)
    and finish it off (e.g. by creating a PID namespace, changing
    users, dropping privileges, etc.)
  5. The host process would launch the pre-start hooks.
  6. The host process would signal the container process that the
    pre-start hooks had completed successfully.
  7. The container process would catch that signal and exec the
    config.json process.

The line you're quoting was supposed to sketch out (6) and (7). Can
you clear me up on how it actually works?

Contributor Author

On Wed, Oct 07, 2015 at 11:51:50AM -0700, W. Trevor King wrote:

Wed, Oct 07, 2015 at 11:40:34AM -0700, Michael Crosby:

+After the pre-start hooks complete, the host process signals
the container process to execute the runtime.

This sentence does not make sense. the host process signals the container process to execute the runtime.????

I haven't looked into the runC implementation here, but my guess was
that:


6. The host process would signal the container process that the
pre-start hooks had completed successfully.
7. The container process would catch that signal and exec the
config.json process.

The line you're quoting was supposed to sketch out (6) and (7). Can
you clear me up on how it actually works?

It looks like libcontainer actually has a fairly detailed spec for how
it sets up containers. That talks about using a pipe instead of
signals for this trigger 1. Signals sound easier to me (permissions
discussed in 2), but the generic “signals” verb could cover all such
triggers (including piped messages) and not be restricted to POSIX
signals 3, since this is an internal runtime mechanism.

+The runtime execs the process defined in `config.json`'s [**`process`** attribute](config.md#process-configuration).
+On Linux hosts, some information for this execution may come from outside the `config.json` and `runtime.json` specifications.
+See the [Linux-specific notes for details](runtime-linux.md#file-descriptors).
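The final exec can be sketched with `syscall.Exec`, which replaces the running program while keeping its PID, so the container process turns into the user process. The sketch assumes `process` carries `args`, `env` and `cwd` fields (see config.md for the actual schema) and ignores the Linux file-descriptor details. `loadProcess` and `execProcess` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// Process holds the subset of config.json's process attribute that this
// sketch assumes: the argv, the environment, and the working directory.
type Process struct {
	Args []string `json:"args"`
	Env  []string `json:"env"`
	Cwd  string   `json:"cwd"`
}

// loadProcess extracts the process attribute from a config.json document.
func loadProcess(config []byte) (Process, error) {
	var c struct {
		Process Process `json:"process"`
	}
	if err := json.Unmarshal(config, &c); err != nil {
		return Process{}, err
	}
	if len(c.Process.Args) == 0 {
		return Process{}, errors.New("process.args is empty")
	}
	return c.Process, nil
}

// execProcess replaces the calling program with p, keeping its PID.
// It only returns on failure.
func execProcess(p Process) error {
	if p.Cwd != "" {
		if err := os.Chdir(p.Cwd); err != nil {
			return err
		}
	}
	// Resolved against the caller's PATH; a real runtime would need to
	// decide whether the container's env should be consulted instead.
	path, err := exec.LookPath(p.Args[0])
	if err != nil {
		return err
	}
	return syscall.Exec(path, p.Args, p.Env)
}

func main() {
	config, err := os.ReadFile("config.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	p, err := loadProcess(config)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	err = execProcess(p) // does not return on success
	fmt.Fprintln(os.Stderr, "exec:", err)
	os.Exit(1)
}
```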

### Stop (process)

-Not sure we need that from runc cli.
-Process is killed from the outside.
+Send a termination signal to the container process (can optionally send other signals to the container process, e.g. a kill signal).
+When the process exits, the host process collects its exit status to return as its own exit status.
+If there are any remaining processes in the container's cgroup (and [we only support unified-hierarchies](runtime-config-linux.md#control-groups)), the host process kills and reaps them.

+### Cleanup

+The host process removes the [`state.json`](#state) file and the container: unmounting file systems, removing namespaces, etc.
+This is the inverse of create.
+The host process then exits with the container process's exit status.

-This event needs to be captured by runc to run onstop event handlers.
+## Joining existing containers

-## Hooks
+Joining an existing container looks just like the usual workflow, except that the container process [joins the target container](runtime-config-linux.md#control-groups) at the beginning of step 3.
+It can then, depending on its configuration, continue to create an additional child cgroup underneath the one it joined.

-See [runtime configuration for hooks](./runtime-config.md)
+When exiting, the reaping logic in the [stop phase](#stop-process) is the same.
+If the container process created a child cgroup, all other processes in that child cgroup are reaped, but no other processes in the joined cgroup (which the container process did not create) are reaped.