Starting parallel containers with same mounts can cause an error #4543
Comments
If (This would be a bug in https://github.com/cyphar/filepath-securejoin FWIW. I'll open a bug there.)
If the external modifier deletes the directory after Isn't it enough to just run a stat call after mkdirat returns
If we just need to allow racing creates to not cause errors, we don't need a stat call (it also wouldn't help with races), the patch can just be something as simple as:
diff --git a/mkdir_linux.go b/mkdir_linux.go
index b5f674524c84..6dfe8c42b364 100644
--- a/mkdir_linux.go
+++ b/mkdir_linux.go
@@ -119,7 +119,12 @@ func MkdirAllHandle(root *os.File, unsafePath string, mode int) (_ *os.File, Err
 		// NOTE: mkdir(2) will not follow trailing symlinks, so we can safely
 		// create the final component without worrying about symlink-exchange
 		// attacks.
-		if err := unix.Mkdirat(int(currentDir.Fd()), part, uint32(mode)); err != nil {
+		//
+		// If we get -EEXIST, it's possible that another program created the
+		// directory at the same time as us. In that case, just continue on as
+		// if we created it (if the created inode is not a directory, the
+		// following open call will fail).
+		if err := unix.Mkdirat(int(currentDir.Fd()), part, uint32(mode)); err != nil && !errors.Is(err, unix.EEXIST) {
 			err = &os.PathError{Op: "mkdirat", Path: currentDir.Name() + "/" + part, Err: err}
 			// Make the error a bit nicer if the directory is dead.
 			if err2 := isDeadInode(currentDir); err2 != nil {
(The following I'll open a PR with this, along with some extra tests.
Ah, ok. I didn't check that this mkdirat was already followed by openat.
cyphar/filepath-securejoin#35 should fix the issue. I'll merge it and do a new release soon.
…0.3.5 (#6469)

This PR contains the following updates:

| Package | Change |
|---|---|
| [github.com/cyphar/filepath-securejoin](https://redirect.github.com/cyphar/filepath-securejoin) | `v0.2.5` -> `v0.3.5` |

### Release Notes

cyphar/filepath-securejoin (github.com/cyphar/filepath-securejoin)

### [`v0.3.5`](https://redirect.github.com/cyphar/filepath-securejoin/releases/tag/v0.3.5)

[Compare Source](https://redirect.github.com/cyphar/filepath-securejoin/compare/v0.3.4...v0.3.5)

This release primarily includes a fix for an issue involving two programs racing to MkdirAll the same directory, which caused a regression with BuildKit.

- `MkdirAll` will now no longer return an `EEXIST` error if two racing processes are creating the same directory. We will still verify that the path is a directory, but this will avoid spurious errors when multiple threads or programs are trying to `MkdirAll` the same path. [opencontainers/runc#4543](https://redirect.github.com/opencontainers/runc/issues/4543)

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

### [`v0.3.4`](https://redirect.github.com/cyphar/filepath-securejoin/releases/tag/v0.3.4)

[Compare Source](https://redirect.github.com/cyphar/filepath-securejoin/compare/v0.3.3...v0.3.4)

This release primarily includes a fix that blocked using filepath-securejoin in Kubernetes.

- Previously, some testing mocks we had resulted in us doing `import "testing"` in non-`_test.go` code, which made some downstreams like Kubernetes unhappy. This has been fixed. ([#32](https://redirect.github.com/cyphar/filepath-securejoin/issues/32))

Thanks to all of the contributors who made this release possible:

- Aleksa Sarai <cyphar@cyphar.com>
- Stephen Kitt <skitt@redhat.com>

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

### [`v0.3.3`](https://redirect.github.com/cyphar/filepath-securejoin/releases/tag/v0.3.3)

[Compare Source](https://redirect.github.com/cyphar/filepath-securejoin/compare/v0.3.2...v0.3.3)

This release primarily includes fixes for spurious errors we hit when checking that directories created by MkdirAll "look right". Upon further consideration, these checks were fundamentally buggy and didn't offer any practical protection anyway.

- The mode and owner verification logic in `MkdirAll` has been removed. This was originally intended to protect against some theoretical attacks but upon further consideration these protections don't actually buy us anything and they were causing spurious errors with more complicated filesystem setups.
- The "is the created directory empty" logic in `MkdirAll` has also been removed. This was not causing us issues yet, but some pseudofilesystems (such as `cgroup`) create non-empty directories and so this logic would've been wrong for such cases.

Thanks to all of the contributors who made this release possible:

- Aleksa Sarai <cyphar@cyphar.com>
- Kir Kolyshkin <kolyshkin@gmail.com>

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

### [`v0.3.2`](https://redirect.github.com/cyphar/filepath-securejoin/releases/tag/v0.3.2)

[Compare Source](https://redirect.github.com/cyphar/filepath-securejoin/compare/v0.3.1...v0.3.2)

This release includes a few fixes for MkdirAll when dealing with S_ISUID and S_ISGID, to solve a regression runc hit when switching to MkdirAll.

- Passing the S_ISUID or S_ISGID modes to MkdirAllInRoot will now return an explicit error saying that those bits are ignored by mkdirat(2). In the past a different error was returned, but since the silent ignoring behaviour is codified in the man pages a more explicit error seems apt. While silently ignoring these bits would be the most compatible option, it could lead to users thinking their code sets these bits when it doesn't. Programs that need to deal with compatibility can mask the bits themselves. ([#23](https://redirect.github.com/cyphar/filepath-securejoin/issues/23), [#25](https://redirect.github.com/cyphar/filepath-securejoin/issues/25))
- If a directory has S_ISGID set, then all child directories will have S_ISGID set when created and a different gid will be used for any inode created under the directory. Previously, the "expected owner and mode" validation in securejoin.MkdirAll did not correctly handle this. We now correctly handle this case. ([#24](https://redirect.github.com/cyphar/filepath-securejoin/issues/24), [#25](https://redirect.github.com/cyphar/filepath-securejoin/issues/25))

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

### [`v0.3.1`](https://redirect.github.com/cyphar/filepath-securejoin/releases/tag/v0.3.1)

[Compare Source](https://redirect.github.com/cyphar/filepath-securejoin/compare/v0.3.0...v0.3.1)

- By allowing `Open(at)InRoot` to opt-out of the extra work done by `MkdirAll` to do the necessary "partial lookups", `Open(at)InRoot` now does less work for both implementations (resulting in a many-fold decrease in the number of operations for `openat2`, and a modest improvement for non-`openat2`) and is far more guaranteed to match the correct `openat2(RESOLVE_IN_ROOT)` behaviour.
- We now use `readlinkat(fd, "")` where possible. For `Open(at)InRoot` this effectively just means that we no longer risk getting spurious errors during rename races. However, for our hardened procfs handler, this in theory should prevent mount attacks from tricking us when doing magic-link readlinks (even when using the unsafe host `/proc` handle). Unfortunately `Reopen` is still potentially vulnerable to those kinds of somewhat-esoteric attacks. Technically this [will only work on post-2.6.39 kernels][linux-readlinkat-emptypath] but it seems incredibly unlikely anyone is using `filepath-securejoin` on a pre-2011 kernel.
- Several improvements were made to the errors returned by `Open(at)InRoot` and `MkdirAll` when dealing with invalid paths under the emulated (ie. non-`openat2`) implementation. Previously, some paths would return the wrong error (`ENOENT` when the last component was a non-directory), and other paths would be returned as though they were acceptable (trailing-slash components after a non-directory would be ignored by `Open(at)InRoot`). These changes were done to match `openat2`'s behaviour and purely is a consistency fix (most users are going to be using `openat2` anyway).

[linux-readlinkat-emptypath]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=65cfc6722361570bfe255698d9cd4dccaf47570d

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

### [`v0.3.0`](https://redirect.github.com/cyphar/filepath-securejoin/releases/tag/v0.3.0)

[Compare Source](https://redirect.github.com/cyphar/filepath-securejoin/compare/v0.2.5...v0.3.0)

This release contains no changes to SecureJoin. However, it does introduce a new `*os.File`-based API which is much safer to use for most usecases. These are adapted from [libpathrs][1] and are the bare minimum to be able to operate more safely on an untrusted rootfs where an attacker has write access (something that SecureJoin cannot protect against). The new APIs are:

- OpenInRoot, which resolves a path inside a rootfs and returns an `*os.File` handle to the path. Note that the file handle returned by OpenInRoot is an O_PATH handle, which cannot be used for reading or writing (as well as some other operations -- [see open(2) for more details](https://www.man7.org/linux/man-pages/man2/open.2.html)).
- Reopen, which takes an O_PATH file handle and safely re-opens it to "upgrade" it to a regular handle.
- MkdirAll, which is a safe implementation of os.MkdirAll that can be used to create directory trees inside a rootfs.

As these are new APIs, it is possible they may change in the future. However, they should be safe to start migrating to as we have extensive tests ensuring they behave correctly and are safe against various races and other attacks.

[1]: https://redirect.github.com/openSUSE/libpathrs

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Description
When starting a container, `runc` will create destination paths for mountpoints. If another process is creating the same paths at the same time, this step can fail with `mkdir` returning `EEXIST`, and the container fails to start.

This race can happen if the paths being mounted are modified externally, but it can also happen when just two `runc` processes are started at the same time and compete with each other when creating the mountpoints.

I didn't look too deeply, but it looks like a bug in the `mkdirp` logic, which could just verify that the path already exists after it receives this error.

Reported via moby/buildkit#5566
Steps to reproduce the issue
This Dockerfile+script demonstrates the issue. I couldn't repro locally, but it reproduces quite reliably (<20 iterations) in GitHub Codespaces.
Note that in this Dockerfile, the paths for `type=cache` mounts are shared between stages. So `/foo/bar` points to the same directory in both, and the failure happens when two `runc` processes both try to create `/foo/bar/baz` under it.
Describe the results you received and expected
What version of runc are you using?
1.1.14 and 1.2.2 in codespaces
Host OS information
No response
Host kernel information
No response