Disk Structure

Assumptions:

  • Identifiers are 24-character strings (subject to change) specified by the caller. A random distribution of identifiers is important to prevent unbalanced search trees in large deployments, but in small deployments descriptive names are just as valuable.
  • Ports are passed by the caller and are assumed to match the image. A caller may also specify an external port, which may fail if that port is already taken.
  • To prevent directories from reaching excessive size at high density, directories mapped to ids are partitioned by the first two characters of the id or, for ports, a modulus value.
  • The structure of persistent state on disk should let administrators recover their systems from filesystem backups, and its contents should be easy to inspect with standard Linux tools.
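
A minimal Go sketch of the partitioning rule follows. It is not geard's code: the helper names are hypothetical, ids are assumed to be ASCII and at least two characters long, and the port rule uses integer division by 100 because the ports/interfaces example places port 4900 in directory 49.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// unitPath places a container's unit file under a directory named after the
// first two characters of its id, e.g. units/ab/ctr-abcdef.service.
// Assumes an ASCII id of at least two characters.
func unitPath(root, id string) string {
	return filepath.Join(root, "units", id[:2], "ctr-"+id+".service")
}

// portBlockDir places a port in a block directory of 100 ports, so port 4900
// lands in .../49 (integer division, as in the ports/interfaces layout).
func portBlockDir(root, iface string, port int) string {
	return filepath.Join(root, "ports", "interfaces", iface, fmt.Sprint(port/100))
}

func main() {
	fmt.Println(unitPath("/var/lib/containers", "abcdef"))
	fmt.Println(portBlockDir("/var/lib/containers", "1", 4900))
}
```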

The on-disk structure of geard is exploratory at the moment. The major components are described below:

/etc/systemd/system/container-active.target.wants/
  ctr-abcdef.service -> <symlink>

    systemd reads this directory at boot (container-active.target is WantedBy=multi-user.target) and starts
    the containers linked here.  Containers stopped via the stop API call are not started on reboot.

/var/lib/containers/
  All content is located under this root

  units/
    ab/
      ctr-abcdef.service   # hardlink to the current unit file version
      ctr-abcdef.idle      # flag indicating this unit is currently idle
      abcdef/
        <requestid>        # a particular version of the unit file.

    A container is considered present on this system if a service file exists inside the namespaced container
    directory.

    The unit file is "enabled" in systemd (symlinked to systemd's unit directory) upon creation, and "disabled"
    (unsymlinked) on the remove operation.  The definition can be updated atomically (write new definition,
    update hardlink) when a new version of the container is deployed to the system.

    If a container is idled, a flag is written to the appropriate unit's directory.  Only containers with an
    idle flag are considered valid targets for unidling.

  targets/
    container.target         # default target
    container-active.target  # active target

    All containers are assigned to one of these two targets.  On create or start, a container's unit gets
    "WantedBy=container-active.target"; if the container is stopped via the API, this is changed to
    "WantedBy=container.target".  In this way the disk structure for each unit reflects whether the container
    should be started on reboot or has been explicitly idled.  Also, assuming the /var/lib/containers directory
    is on an attached disk, on node recovery each *.service file can be enabled with systemd and then
    "container-active.target" started.

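    Switching targets amounts to rewriting the WantedBy= line of the unit's [Install] section.  The Go
    sketch below is a hypothetical helper that handles simple one-line WantedBy= entries only.  After such a
    rewrite, the unit would still need to be re-enabled (for example with systemctl reenable) so that systemd
    moves its symlink into the other target's .wants directory.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	activeTarget  = "container-active.target" // started on boot
	defaultTarget = "container.target"        // stopped via the API
)

// setWantedBy rewrites every WantedBy= line of a unit file so the unit is
// pulled in by target. All other lines are left untouched.
func setWantedBy(unit, target string) string {
	lines := strings.Split(unit, "\n")
	for i, line := range lines {
		if strings.HasPrefix(strings.TrimSpace(line), "WantedBy=") {
			lines[i] = "WantedBy=" + target
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	unit := "[Unit]\nDescription=Container abcdef\n\n[Install]\nWantedBy=" + activeTarget + "\n"
	// Stopping via the API moves the unit to the default target.
	fmt.Print(setWantedBy(unit, defaultTarget))
}
```
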
  slices/
    container.slice        # default slice
    container-small.slice  # more limited slice
    container-large.slice  # additional resource slice

    All slice units are created in this directory.  At the moment, the three slices are defaults and are created
    on first startup of the process, enabled, then started.  More advanced cgroup settings must be configured
    after creation, which is outside the scope of this prototype.

    The container slice can be set during installation using the --slice command line option. The default slice is
    'container-small'.

  env/
    contents/
      a3/
        a3408aabfed

        Each file stores environment variables, one per line, in KEY="VALUE" form.

  data/
    TBD (reserved for container unique volumes)

  ports/
    links/
      3f/
        3fabc98341ac3fe...24  # text file describing internal->external links to other networks

        Each container has one file with one line per network link: the internal port, a tab, the external
        port, then the external host IP or DNS name.

        On startup, gear init --post attempts to convert this file into a set of iptables rules in the
        container for outbound traffic.

    interfaces/
      1/
        49/
          4900  # softlink to the container's unit file

          To allocate a port, the daemon scans a block of 100 ports (block 49 holds ports 4900-4999) for free
          ports.  If none are found, it continues to the next block.  Currently the daemon starts at the low end
          of the port range and walks the disk until it finds the first free port.  In the worst case, the daemon
          performs many directory reads (30-50) before it finds a gap.

          To remove a container, the unit file is deleted, and then any broken softlinks can be deleted.

          The first subdirectory represents an interface, to allow future expansion of the external IP space
          onto multiple devices, or to allow multiple external ports to be bound to the same IP (for VPC).

          Example script:

            sudo find /var/lib/containers/ports/interfaces -type l -printf "%l %f " -exec cut -f 1-2 {} \;

          prints the port description path (whose file name is the container id), the public port, and the
          contents of the description file (which may span multiple lines).  This makes mismatched ports easy
          to spot.

  keys/
    ab/
      ab0a8oeunthxjqkgjfrJQKNHa7384  # text file in authorized_keys format representing a single public key

      Each file represents a single public key, and its identifier is the base64-encoded SHA256 sum of the
      binary value of the key.  The file is stored in authorized_keys format for sshd, but with only the
      type and value sections present and no newlines.

      Any key that has zero incoming links can be deleted.

  access/
    containers/
      3f/
        3fabc98341ac3fe...24/  # container id
          key1  # softlink to a public key authorized to access this container

          The name of each softlink should map to a container id or (in the future) a container label.  Each
          container id should match a user on the system so that sshd can log in as the container id.  In the
          future, improvements in sshd may allow us to use virtual users.

    git/
      read/
        ab/
          ab934xrcgqkou08/  # repository id
            key1  # softlink to a public key authorized for read access to this repo

      write/
        ab/
          ab934xrcgqkou08/  # repository id
            key2  # softlink to a public key authorized for write access to this repo