Skip to content

Commit

Permalink
docs: describe the port forwarding
Browse files Browse the repository at this point in the history
This doesn't include the bit-level detail, just the general shape.

Signed-off-by: David Scott <dave.scott@docker.com>
  • Loading branch information
djs55 committed May 3, 2017
1 parent fb18890 commit 7152825
Show file tree
Hide file tree
Showing 2 changed files with 129 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,4 +62,4 @@ Design
------

- [Using vpnkit as a default gateway](docs/ethernet.md): describes the flow of ethernet traffic to/from the VM
- [DNS](docs/DNS.md): describes the DNS forwarder implementation
- [Port forwarding](docs/ports.md): describes how ports are forwarded from the host into the VM
128 changes: 128 additions & 0 deletions docs/ports.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
# Port-forwarding

This page describes how `vpnkit` is used by Docker for Mac and Docker for Windows to
open ports on host interfaces and to forward traffic to containers.

Note: this is completely separate from [using vpnkit as a default gateway](ethernet.md).
This implementation could be factored out into a separate executable.

## Background

On a regular installation of `docker` on Linux the command:

```
docker run -p 8080:80 nginx
```
starts an `nginx` container and forwards connections from `0.0.0.0:8080` on
the host to the container's port `80`.

The command:

```
docker run -p 1.2.3.4:8080:80 nginx
```
starts an `nginx` container but only forwards connections from `1.2.3.4:8080` on
the host to the container's port `80`.

On Linux port forwarding can be achieved either through `iptables` or by
running a simple user-space proxy. On Docker for Mac and Docker for Windows

- the `docker` daemon does not run on the host (except on Windows when running
Windows containers-- this is out of scope of this document) and so cannot run
anything on the host
- the container network within the VM is separated from the host's network
by `vpnkit` (ignoring the internal network used on Windows for volume sharing):
see [using vpnkit as default gateway](ethernet.md).

Therefore `vpnkit` includes a port forwarding service which allows the `docker` daemon
in the VM to open ports on the host and which forwards connections transparently
to the container port inside the VM.

## Docker daemon interface

The `docker` daemon can either use `iptables` or a userspace proxy to open ports
on a regular Linux system. In Docker for Mac and Docker for Windows we configure
the `docker` daemon to use a userspace proxy, and we provide our own custom
implementation. This acts as a very basic "plugin".

For example, after running `docker run -p 8080:80 nginx` on the host, inside the
VM we can see a process:

```
/usr/bin/slirp-proxy -proto tcp -host-ip 0.0.0.0 -host-port 8080 -container-ip 172.17.0.2 -container-port 80
```
This shows a single port forward from `0.0.0.0:8080` on the host to the internal
IP `172.17.0.2:80` inside the VM.

This custom proxy uses a custom signaling protocol to communicate the port forwarding
request to the host, and to receive back success or error (e.g. `EADDRINUSE`
or `EADDRNOTAVAIL`).

## Signalling from the VM to the host

The control interface takes the form of a virtual 9P filesystem served by
`vpnkit` and mounted in the Linux VM. New port forwards are requested by
creating directories, and status (including error messages) read by reading
files.

For example, after running `docker run -p 8080:80 nginx` on the host, inside
the VM we can see:
```
/ # ls /port
README tcp:0.0.0.0:8080:tcp:172.17.0.2:80
```
This shows a single active port forward from `0.0.0.0:8080` on the host to the internal
IP `172.17.0.2:80` inside the VM.

The initial filesystem mount is slightly different between Docker for Mac
and Docker for Windows.

The [hyperkit](https://github.com/moby/hyperkit) hypervisor on the Mac has a
[virtio-9p](https://github.com/moby/hyperkit/blob/master/src/lib/pci_virtio_9p.c)
device which connects to a Unix domain socket whenever the VM issues a `mount`
command.

On Windows we run a process `9p-mount` which calls `listen` on a Hyper-V socket for
connections from the host. `vpnkit` calls `connect`, and then `9p-mount` calls
`accept` and then passes
the file descriptor to the `mount` command via the `rfdno` and `wfdno`
arguments.

When it has opened a new host port forward, the custom proxy retains an open
file descriptor referencing a control file on the filesystem. If the proxy
is killed or crashes the Linux kernel will close the file descriptor and emit
a 9P `clunk` message which is used by `vpnkit` to shut down the port forward.
This ensures that the port forwards in `vpnkit` do not leak.

Note: the use of a mounted filesystem is not ideal, for if the `vpnkit` process
is restarted then the filesystem becomes broken. It would be better in future
to use a reconnectable protocol.

Note: the use of `clunk` like this is quite fragile; if another process were
to `open` and `close` the file it would prematurly call `clunk` and shutdown
the port forward.

## Forwarding connections

When a client connects to the port on the host, `vpnkit` accepts the connection.

On Windows, `vpnkit` calls `connect` on a Hyper-V socket to connect to the VM
on a well-known port.

On the Mac, `vpnkit` calls `connect` on a Unix domain socket to connect to the
[hyperkit](https://github.com/moby/hyperkit) hypervisor's `virtio-vsock` control
socket. `vpnkit` writes a short header including the well-known `AF_VSOCK`
port number and is connected to the VM.

Inside
the VM there is a connection demultiplexer which calls `listen` on this well-known port.
This process calls `accept` and then reads a simple header which includes the
ultimate destination IP and port (`172.17.0.2:80` in the example above).
The demultiplexer calls `connect` to the container port and starts proxying data.

Note: since early versions of Windows 10 do not support `shutdown` (i.e. the
ability to signal that `write` calls have finished but while allowing `read` calls to continue
c.f. TCP half-close) there is a simple protocol layered over the Hyper-V socket
stream which implements this behaviour, not described here.

Note: the code also transports UDP with a simple framing protocol, not described here.

0 comments on commit 7152825

Please sign in to comment.