-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config: Clarify mounts[].source relative path anchor #735
Conversation
The code implementing this behavior in if m.Type == "bind" {
if !filepath.IsAbs(source) {
source = filepath.Join(cwd, m.Source)
}
} According to @crosbymichael, that It's also often the case where IMO, |
On Wed, May 10, 2017 at 09:24:28PM -0700, Tianon Gravi wrote:
It's also often the case where `source` is a dummy value, not a path
at all (such as `tmpfs`, `proc`, `devpts`, or `none`), in which case
the value ought to be passed to the kernel as-is (since making them
"absolute" doesn't make any sense).
Agreed, but I don't see a way to distinguish dummy values unless you
have logic like “the source is a dummy for non-bind mounts where
filesystemtype is not proc, sysfs, tmpfs, (lots of other
pseudo-filesystems)”.
IMO, `runc`'s behavior here is fine behavior, but I'm not sure it's
worth codifying in the specification, especially since it matches
`mount(1)`'s behavior…
By default mount(8) canonicalizes both paths for bind mounts:
# strace -o trace mount --bind a b
# grep 'mount(' trace
mount("/tmp/a", "/tmp/b", 0xd03310, MS_MGC_VAL|MS_BIND, NULL) = 0
To avoid that, you can use --no-canonicalize:
# strace -o trace mount --bind --no-canonicalize a b
# grep 'mount(' trace
mount("a", "b", 0xe08370, MS_MGC_VAL|MS_BIND, NULL) = 0
Canonicalization seems to have something to do with whether the
absolute path exists:
# strace -o trace mount --bind foo b
mount: special device foo does not exist
# grep 'mount(' trace
mount("foo", "/tmp/b", 0x1a3a310, MS_MGC_VAL|MS_BIND, NULL) = -1 ENOENT (No such file or directory)
But pseudo-filesystem sources are not canonicalized even if they do
exist:
# mkdir -p a
# strace -o trace mount -t proc a b
# grep 'mount(' trace
mount("a", "/tmp/b", "proc", MS_MGC_VAL, NULL) = 0
The util-linux approach to this seems to be maintaining a list of
pseudo-filesystem types [1,2,3,4], but that's a heavy burden to place
on runtimes. Not canonicalizing paths exposed FUSE TOCTTOU symlink
attacks related to umounting arbitrary mounts [5,6,7], but it's not
clear to me if that vulnerability applies to container runtimes or
not. And runC's current “only canonicalize bind sources” does not
seem to cover the FUSE vulnerability.
I don't mind leaving relative paths as entirely
implementation-dependent.
There are a few orthogonal issues here:
* Whether we allow relative source paths at all. I think we want to,
because (as you pointed out in the call), distinguishing relative
paths from dummy paths is hard (you need access to a list of
pseudo-filesystem types).
If we do allow them:
* Whether to canonicalize source paths or not before calling
mount(2). I'm fine leaving this unspecified, since it's related
to security issues (the TOCTTOU symlink attack) and I doubt users
care enough for it to be implementation-defined.
* What the root path is for relative paths. I'm fine with either
“the container process's cwd” or “the bundle”, but the current
spec has existing relative paths anchored at the bundle
(e.g. root.path) and nothing anchored at the container process's
cwd (presumably before the chdir(3) for process.cwd). But if we
leave this implementation-defined, config authors interested in
cross-runtime portability will have to avoid relative paths.
* The “what namespace are these resolved in” issue. Thinking this
over more, “runtime mount namespace” is still not right (and maybe
this is what folks were saying in the meeting and I was missing it).
What we really have is:
* The container mount namespace is seeded from the runtime mount
namespace (wording for this in flight with #795).
* The container process (or something else in the container mount
namespace) runs through ‘mounts’ and executes them.
* The container process (or something else in the container mount
namespace) pivot_root()s into root.path.
So the paths are “in the container mount namespace before the root
pivot”. For example, if you had:
{
"root": {
"path": "rootfs",
},
"mounts": [
{
"type": "proc",
"destination": "/proc"
},
{
"type": "bind",
"source": "rootfs/proc/cpuinfo",
"destination": "/data"
"options": ["bind"]
}
],
…
}
You could ‘cat /data’ in the container to see the CPU info, but
`/path/to/bundle/rootfs/proc/cpuinfo` would not exist in the runtime
mount namespace (unless you were doing some fancy mount-propagation).
But the current spec has nothing to say about the pivot. It doesn't
even say (as far as I can tell) that ‘mounts’ must be processed
before ‘root’, or that ‘mounts’ should be executed in the container
mount namespace. So I'm not clear enough on the right wording to
float anything at the moment, but I think we need to say *something*
about this before 1.0.
[1]: https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/libmount/src/context.c?h=v2.29.2#n1497
[2]: https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/libmount/src/fs.c?h=v2.29.2#n602
[3]: https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/libmount/src/fs.c?h=v2.29.2#n644
[4]: https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/libmount/src/utils.c?h=v2.29.2#n261
[5]: https://nvd.nist.gov/vuln/detail/CVE-2011-0543
[6]: http://www.openwall.com/lists/oss-security/2011/02/02/2
[7]: libfuse/libfuse@cbd3a2a
|
config.md
Outdated
@@ -62,6 +62,7 @@ For Windows, see [mountvol][mountvol] and [SetVolumeMountPoint][set-volume-mount | |||
* Windows: the type of file system on the volume, e.g. "ntfs". | |||
* Solaris: corresponds to "type" of the fs resource in [zonecfg(1M)][zonecfg.1m]. | |||
* **`source`** (string, OPTIONAL) A device name, but can also be a directory name or a dummy. | |||
Path values are in the [runtime mount namespace](glossary.md#runtime-namespace) and are either absolute or relative to the bundle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take out language around runtime mount namespace. Rest is fine.
604205e
to
8353080
Compare
I've punted on the mount namespace issues with 604205e → 8353080. Per the call, either @crosbymichael or @mrunalp will file follow-up work clarifying that issue. |
rebase pls |
Needs rebase. |
8353080
to
67aca31
Compare
…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>
…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>
Borrowing language from root.path for absolute/relative. I think we also need to talk about 'mounts' being in the container mount namespace (after creating it with the runtime mount namespace as a seed) and before the pivot into root.path. But Michael and/or Mrunal are going to file follow-up work to address that. Signed-off-by: W. Trevor King <wking@tremily.us>
67aca31
to
50e13d9
Compare
…tion Namespaces are not all hierarchical and processes aren't always in leaves. Also punt to config-linux.md for details about namespace creation, although currently that section doesn't talk much about how the runtime namespaces relate to new container namespaces [1]. Also mention resource access, because runtime namespaces play a role even if no new container namespaces are created. I've used resources that currently explicitly mention runtime namespaces as examples, although I think more resources (e.g. root.path and mounts[].source [2,3]) deserve wording about that as well and would be better examples if they'd already landed such wording. Examples of resources retrieved from this namespace include linux.namespaces[].path and the resctrl psuedo-filesystem used for `linux.intelRdt`, but Mrunal and Michael didn't want me to include the examples in the glossary entry (probably because they could go stale). [1]: opencontainers#795 (comment) [2]: opencontainers#735 (comment) [3]: opencontainers@604205e#diff-c9c91c29b41257aea3a3403cc606ad99R65 Signed-off-by: W. Trevor King <wking@tremily.us>
Borrowing language from
root.path
for absolute/relative and fromlinux.namespaces[].path
for the runtime mount namespace.In #509 I'd floated a generic rule for anchoring relative runtime-namespace paths, but there aren't all that many relative runtime-namespace paths, so this PR takes a narrower approach.