Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand on the definition of our ops #225

Merged
merged 1 commit into from
Feb 22, 2016
Merged

Conversation

duglin
Copy link
Contributor

@duglin duglin commented Oct 13, 2015

There are a few "OPEN" question in the doc that I'd like to brainstorm on.

Signed-off-by: Doug Davis dug@us.ibm.com

OCI compliant runtimes MUST support the following operations, unless support for the operation can not be supported by the base operating system (kernel) itself.

OPEN: we should consider specifying the options/args for each operation.
For example, checkpoint (for runc) seems to have a lot of params.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a mailing list thread for this command-line specification, and initial work in this Gist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking can you remove the comments that aren't asking for a specific change? Not sure how many others are like me, but I like to have a clean PR (no outstanding comments that need resolving) before they are merged, and if a PR has a lot of comments that are just "FYI" in nature then its hard to pick out the ones that need action from the ones we can ignore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mon, Jan 04, 2016 at 10:44:32AM -0800, Doug Davis wrote:

@wking can you remove the comments that aren't asking for a specific
change?

Will do. Now that we have a tagged set of open list issues [1,2](including the command-line API [3]), I think there's less need for
either my FYI comments or entries like this. So my actionable
suggestion here is “drop this line in favor of 1 for listing open
high-level discussion”.

@wking
Copy link
Contributor

wking commented Oct 13, 2015

On Tue, Oct 13, 2015 at 10:34:04AM -0700, Doug Davis wrote:

There are a few "OPEN" question in the doc that I'd like to
brainstorm on.

I think this belongs in the lifecycle thread [1,2], but I'm in favor
of anything that ends up in some kind of lifecycle docs landing :). I
left a few comments inline with more detailed suggestions.

@crosbymichael
Copy link
Member

@duglin do you think adding things like pause/resume and checkpoint/restore would hurt other platforms that do not have this type of feature set? Maybe these are somethings that we can leave out for broader adoption and spec out the main usecase of create, start, stop, delete?

@duglin
Copy link
Contributor Author

duglin commented Oct 14, 2015

@crosbymichael that's why I worded the intro paragraph the way I did. I tried to make it say that these ops are only required to be supported if the platform itself supports it. Perhaps some better wording is needed, but what do you think about the general idea?

@crosbymichael
Copy link
Member

I think it's good overall. Just don't want to add wording that could prohibit runtimes being compliant for some of these "extra" features.

## Lifecycle
## Operations

OCI compliant runtimes MUST support the following operations, unless support for the operation can not be supported by the base operating system (kernel) itself.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe simplify a bit to "unless the operation is not supported by the base operating system" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Oct 14, 2015 at 10:55:52AM -0700, Mrunal Patel wrote:

+OCI compliant runtimes MUST support the following operations,
unless support for the operation can not be supported by the base
operating system (kernel) itself.

Maybe simplify a bit to "unless the operation is not supported by
the base operating system" ?

+1 to Mrunal's wording, but I'd switch “base” → “host” as well.


This operation MUST stop a running container.
Stopping a container MUST stop all of the processes running within the scope of the container.
Stopping a container MUST NOT delete the associated namespaces or cgroups for the container.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not say this today because its the opposite of how it actually works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think depends on whether paths were passed for the namespaces/cgroups or the runtime created those for the container.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Nov 17, 2015 at 04:38:22PM -0800, Mrunal Patel wrote:

+Stopping a container MUST stop all of the processes running within the scope of the container.
+Stopping a container MUST NOT delete the associated namespaces or cgroups for the container.

I think depends on whether paths were passed for the
namespaces/cgroups or the runtime created those for the container.

+1. We should only be removing namespaces and cgroups (and killing
any processes that are still in them or their ancestors) if we're
stopping/deleting the container that created them. I think everyone's
on board with that (although it's not how the “delete” phrasing
currently reads here).

However, this PR is splitting runC's current cleanup into distinct
“stop” and “delete” actions, and I think that's what @crosbymichael is
objecting to (because runC doesn't currently have a way to
stop-but-not-delete a container).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crosbymichael ignoring for a moment what the code does today... should stopping a container delete the namespaces and cgroups? Shouldn't delete and stop be distinct actions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its kinda a hard question because it goes into how you define a container. Technically I think its a bad idea to keep namespaces around because its buggy and racy just to make it fit someones made up definition of a container. We would be jumping through hoops in code and on the system for no real use case other than making the technical reality match something that one of us typed.

We dont keep around all the memory of a VM when its stopped just because we made a UI with a stop and delete button so I dont think it makes sense to do this.

Just because we want a stop and delete phase does not mean we need to go looking for something that we can keep laying around to clean up later just to support the thought. Maybe we don't need a stop and delete at all so maybe we should discuss some reasons why people think it should be added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still coming up to speed on the details of this but let's say I have a container that gets stopped and then started again later. If we delete the namespaces/cgroup during the stop, is there anything that can not be reliably recreated upon the next time the container is started? Any worry about some other namespace/cgroup grabbing something that we'd like to hold on to? If not, then I think you're probably right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you described is how it works today and this has never come up as an issue except in "wouldn't it be cool if..." discussions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, are you suggesting we go all the way and just merge delete() and stop()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for the looks of what delete is defined as right now. I dont see any advantages right now from having it other than more operations and more complexity for implementer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mon, Jan 04, 2016 at 01:42:12PM -0800, Michael Crosby wrote:

What you described is how it works today and this has never come up
as an issue except in "wouldn't it be cool if..." discussions.

“Kill everything in the container, removing its namespaces and
cgroups” sounds like “delete” to me, so I don't see much reason to
distinguish between “stop” and “delete” if that's also what “stop”
means.

For a situation in which preserving namespaces is how it works today,
see my create/exec/destroy example [1,2]. One place where this moves
beyond “wouldn't it be cool if…” is that a spec based on this kind of
create/exec/delete separation lets us get out of the hook business and
its ordering issues 3; folks can use a shell script (or whatever)
for both the runtime actions and the currently-hook actions.

@crosbymichael
Copy link
Member

@duglin overall I think it looks good. I would still suggest removing pause/resume and checkpoint/restore until we get more information about other platforms and how they are going to handle some of the actions. It's always easier to add more later than take somethings away.

Attempting to start an already running container MUST have no effect on the container and MUST generate an error.
Starting a stopped container, one that had been previously be running, MUST create a new process, as specified by the `config.json` file, within the scope of the container.

OPEN: the entire "generate an error" stuff in here is something we should discuss.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generating an error makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Nov 17, 2015 at 04:21:56PM -0800, Mrunal Patel wrote:

+OPEN: the entire "generate an error" stuff in here is something we should discuss.

Generating an error makes sense.

I think we all agree that we want error reporting, but we don't have a
clear picture for how to report errors to the caller. With both the
standard streams and error code already assigned to the container
process, what channel is the runtime using to report internal errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I purposely used the word "generate" instead of "report" or something else that implies the end user sees it. I think that gets into implementation details. For example, some impl might just log everything and their "user" never sees any of it and I think that's valid from this specs perspective. The point is that we all agree its an error condition and the action did not happen but how the impl deal with the error is up to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mon, Jan 04, 2016 at 10:49:48AM -0800, Doug Davis wrote:

+OPEN: the entire "generate an error" stuff in here is something we should discuss.

I purposely used the word "generate" instead of "report" or
something else that implies the end user sees it. I think that gets
into implementation details. For example, some impl might just log
everything and their "user" never sees any of it and I think that's
valid from this specs perspective. The point is that we all agree
its an error condition and the action did not happen but how the
impl deal with the error is up to it.

That sounds like a useful distinction to me, and I think its worth
spelling it in this PR. How about adding an error-handling section
somewhere (the glossary? #107) that translates your generate
vs. report disctinction into spec language, and then just use “MUST
generate an error” like you already do in this section.

Once we do that, we can drop the OPEN issue here, and open a mailing
list thread if/when we want something in the spec about the reporting
aspect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up will add a section that talks about generating errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mon, Jan 04, 2016 at 12:12:49PM -0800, Doug Davis wrote:

+OPEN: the entire "generate an error" stuff in here is something we should discuss.

up will add a section that talks about generating errors.

The “Errors” section that landed with b6698667a5b1f9 looks good to me.

@duglin duglin force-pushed the RuntimeOps branch 3 times, most recently from f74c3ca to 7a5b1f9 Compare January 4, 2016 20:37
@@ -54,6 +54,76 @@ The lifecycle describes the timeline of events that happen from when a container

Note: The lifecycle is a WIP and it will evolve as we have more use cases and more information on the viability of a separate create phase.

## Operations

OCI compliant runtimes MUST support the following operations, unless support for the operation can not be supported by the base operating system (kernel) itself.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think @mrunalp's earlier suggestion for simplifying this sentence would be a good change.

@duglin duglin force-pushed the RuntimeOps branch 3 times, most recently from 6393659 to 920757b Compare January 5, 2016 18:01
@duglin
Copy link
Contributor Author

duglin commented Jan 5, 2016

Updated. Merged create/start and stop/delete.
A couple of other questions/comments for folks to weigh in on too.

This operation MUST create a new process within the scope of the container.
If the container is not running then this operation MUST have no effect on the container and MUST generate an error.
Executing this operation multiple times MUST result in a new process each time.
Unlike the container's main process which is specified in the `config.json` file, this specification does not define how the `exec` process is defined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: trailing whitespace.

And I don't see any point to declaring an action but not specifying how to configure/trigger it. How would you test runtimes for conformance with this action? What do bundle-authors or runtime-callers do to use this feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add something about exec's config file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Jan 05, 2016 at 02:12:40PM -0800, Doug Davis wrote:

+Unlike the container's main process which is specified in the config.json file, this specification does not define how the exec process is defined.

I'll add something about exec's config file.

To get this PR landed faster, I suggest pulling the exec action out
into a follow-up PR. But maybe everyone else thinks we're closer to
an exec-spec consensus than I do ;).

Mashimiao pushed a commit to Mashimiao/specs that referenced this pull request Aug 19, 2016
This slipped through the renumbering in 7117ede (Expand on the
definition of our ops, 2015-10-13, opencontainers#225).

Signed-off-by: W. Trevor King <wking@tremily.us>
Mashimiao pushed a commit to Mashimiao/specs that referenced this pull request Aug 19, 2016
# digest/hashing target

Most of this has spun off with [1], and I haven't heard of anyone
talking about verifying the on-disk filesystem in a while.  My
personal take is on-disk verification doesn't add much over serialized
verification unless you have a local attacker (or unreliable disk),
and you'll need some careful threat modeling if you want to do
anything productive about the local attacker case.  For some more
on-disk verification discussion, see the thread starting with [2].

# distributable-format target

This spun off with [1].

# lifecycle target

I think this is resolved since 7713efc (Add lifecycle for containers,
2015-10-22, opencontainers#231), which was committed on the same day as the ROADMAP
entry (4859f6d, Add initial roadmap, 2015-10-22, opencontainers#230).

# container-action target

Addressed by 7117ede (Expand on the definition of our ops,
2015-10-13, opencontainers#225), although there has been additional discussion in
a7a366b (Remove exec from required runtime functionalities,
2016-04-19, opencontainers#388) and 0430aaf (Split create and start, 2016-04-01,
opencontainers#384).

# validation and testing targets

Validation is partly covered by cdcabde (schema: JSON Schema and
validator for `config.json`, 2016-01-19, opencontainers#313) and subequent JSON
Schema work.  The remainder of these targets are handled by ocitools
[3].

# printable/compiled-spec target

The bulk of this was addressed by 4ee036f (*: printable documents,
2015-12-09, opencontainers#263).  Any remaining polishing of that workflow seems
like a GitHub-issue thing and not a ROADMAP thing.  And publishing
these to opencontainers.org certainly seems like it's outside the
scope of this repository (although I think that such publishing is a
good idea).

[1]: https://github.com/opencontainers/image-spec
[2]: https://groups.google.com/a/opencontainers.org/d/msg/dev/xo4SQ92aWJ8/NHpSQ19KCAAJ
     Subject: OCI Bundle Digests Summary
     Date: Wed, 14 Oct 2015 17:09:15 +0000
     Message-ID: <CAD2oYtN-9yLLhG_STO3F1h58Bn5QovK+u3wOBa=t+7TQi-hP1Q@mail.gmail.com>
[3]: https://github.com/opencontainers/ocitools

Signed-off-by: W. Trevor King <wking@tremily.us>
Mashimiao pushed a commit to Mashimiao/specs that referenced this pull request Aug 19, 2016
This wording is descended from 7117ede (Expand on the definition of
our ops, 2015-10-13, opencontainers#225), but the idea is covered generically by
e53a72b (Clarify the operation is not for command-line api,
2016-05-24, opencontainers#450), so we no longer need a create-specific note.
Especially in the lifecycle docs, where there's already enough going
on without this low-level detail.

Signed-off-by: W. Trevor King <wking@tremily.us>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Sep 8, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 13, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  recieved and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 13, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 13, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 13, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 13, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 14, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux.
It's unclear how these apply to runtimes APIs that are not based on
the command line / execve, and the functionality is covered by the
more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 14, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primatives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 14, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 20, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 22, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 22, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 25, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 25, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 25, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 25, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 26, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 26, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Oct 26, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Nov 1, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Nov 28, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Nov 28, 2016
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 1, 2017
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking pushed a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 8, 2017
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Mar 16, 2017
This wording landed without comment as part of 7117ede (Expand on the
definition of our ops, 2015-10-13, opencontainers#225).  However, I'm not entirely
clear on the exception it's making.  It may be trying to say something
like:

  Just because you were authorized to manage that container when you
  created it doesn't mean you're still authorized to perform operation
  X on it now.  Maybe you've lost privileges in the meantime.

But as far as compliance testing is concerned, the same test harness
will be calling 'create' and the subsequent operations.  That harness
will be reporting MUST violations if the runtime refuses a subsequent
operation, and removing the access-control loophole makes it more
obvious that the runtime's refusal is non-compliant.

Signed-off-by: W. Trevor King <wking@tremily.us>
@@ -23,37 +19,111 @@ This allows the hooks to perform cleanup and teardown logic after the runtime de
* **`bundlePath`**: (string) is the absolute path to the container's bundle directory.
This is provided so that consumers can find the container's configuration and root filesystem on the host.

*Example*

When serialized in JSON, the format MUST adhere to the following pattern:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the intent of this sentence? Does "the following pattern" include the indentation pattern too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented in the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants