Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path/filepath: add IsLocal #56219

Closed
neild opened this issue Oct 13, 2022 · 61 comments
Closed

path/filepath: add IsLocal #56219

neild opened this issue Oct 13, 2022 · 61 comments

Comments

@neild
Copy link
Contributor

neild commented Oct 13, 2022

It is common for programs to accept filenames from untrusted sources. For example, an archive extractor might create files based on names in the archive, or a webserver may serve the content of local files identified by a URL path. In these cases, the program should usually sanitize the untrusted filenames before use. (See #55356 for a list of vulnerabilities caused by using unsanitized archive paths.)

We should provide a simple path sanitization function which accepts an untrusted input and returns something reasonably safe.

// Sanitize returns a representation of path under the current
// directory. If path is absolute or represents a location
// outside the current directory, Sanitize returns an error.
//
// Sanitize calls Clean on the result, but retains a trailing Separator if any.
//
// If a base path is joined to the result of Sanitize with Join,
// the resulting path will be contained within basepath.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the sanitized path to reference a location
// outside the current directory.
func Sanitize(path string) (string, error)

Examples:

Sanitize("a") = "a", <nil>
Sanitize("a/b") = "a/b", <nil>
Sanitize("a/b/../c") = "a/c", <nil>
Sanitize("../a") = "", "../a" is outside the current directory
Sanitize("..") = "", ".." is outside the current directory
Sanitize(".") = "", "." is the current directory
Sanitize("/") = "", "/" is absolute
Sanitize("/a") = "", "/a" is absolute
Sanitize("a/") = "a/", <nil>
Sanitize("a/b/") = "a/b/", <nil>

// on Windows
Sanitize("NUL") = "", "NUL" is absolute

https://go.dev/play/p/EDzG8D15Zed contains a sample implementation.

@neild neild added the Proposal label Oct 13, 2022
@gopherbot gopherbot added this to the Proposal milestone Oct 13, 2022
@neild neild self-assigned this Oct 13, 2022
@robpike
Copy link
Contributor

robpike commented Oct 13, 2022

I like the idea but when I saw the issue title, it's purpose wasn't at all clear to me from the name "Sanitize". It's not about cleanliness (we already have "Clean") but security. Maybe something as simple as "Safe", but with a name like that perhaps it should consider symbolic links. Perhaps it should anyway. A conundrum.

@neild
Copy link
Contributor Author

neild commented Oct 13, 2022

I'm not sold on the name, although "Sanitize" does have the connotation of being more than just clean. ("To reduce or eliminate pathogenic agents", says Merriam-Webster.)

Symbolic links are an interesting question. If we do consider them, then we need to know the base path that the untrusted path is relative to. Not all users will want to avoid symbolic links, either; an archive extractor writing to an output file wants to avoid writing through symlinks, but a web server probably wants to follow links in the directory it is serving from. In some cases, there might be a race condition between checking for the presence of a link and writing to the file.

@mvdan
Copy link
Member

mvdan commented Oct 14, 2022

I lean towards not handling symlinks. Besides the raciness that neild mentions, once the program opens the sanitized path, there's other edge cases that could possibly cause trouble, so it's hard to guarantee complete safety:

  • Linux bind mounts could still lead you to write a file in a parent directory
  • Some filesystems could alter or not support the sanitized filename, e.g. case insensitive filesystems

I think this API would still be safe for a fairly common use case: extracting files from an archive (like a txtar) into a new directory. In that case, you don't have to worry about symlinks.

As for the name: I understand API as "the file path is relative and a child". I agree with Rob that I'm not a big fan of Sanitize, mainly because it doesn't give me that idea. Perhaps BoundedRelative, in the sense that we want a relative path that is "bounded to" the "dot" directory.

@bcmills
Copy link
Contributor

bcmills commented Oct 14, 2022

I'm not a fan of the implicit “under the current directory” behavior — there are lots of cases where I'd like to restrict a path to be within some other working directory (such as the temporary directory returned by t.TempDir within a particular test function).

I'm also not keen on the “retains a trailing Separator if any” caveat — it's inconsistent with filepath.Abs and filepath.Join, and I don't see a clear need for it.

Perhaps instead, we could have variants of Join and Rel that sanitize their results?

// JoinInDir joins any number of path elements to dir provided that
// the result is lexically still within dir. Empty elements are ignored.
//
// If the resulting path is not lexically within dir, JoinInDir returns a non-nil error.
func JoinInDir(dir, elem ...string) (string, error)

// RelInDir returns a path that is lexically equivalent to targpath when joined to dir,
// provided that targpath is lexically within dir.
// That is, JoinInDir(dir, RelInDir(dir, targpath)) is equivalent to targpath itself.
//
// An error is returned if targpath can't be made relative to basepath, if knowing
// the current working directory would be necessary to compute it, or if
// the resulting path would not be within dir when joined to it.
func RelInDir(dir, targpath string) (string, error)

Then, if needed, something very close to Sanitize could be written in terms of those:

func Sanitize(path string) (string, error) {
	if filepath.IsAbs(path) {
		cwd, err := os.Getwd()
		if _, err := filepath.RelInDir(cwd, path); err != nil {
			return "", fmt.Errorf("%s is absolute", path)
		}
		return filepath.Clean(path)
	}

	return filepath.JoinInDir(".", path)
}

@neild
Copy link
Contributor Author

neild commented Oct 14, 2022

To restrict a path to some other working directory, you join the sanitized path to that directory:

sanitized, err := filepath.Sanitize(untrusted)
if err != nil { /* ... */ }

// path is guaranteed to be under /base/dir.
path := filepath.Join("/base/dir", sanitized)

One distinction between this and JoinInDir is that the above will return an error if the untrusted path is absolute, while JoinInDir--if consistent with Join--will silently convert absolute paths into relative ones. (JoinInDir("/a", "/b") is presumably "/a/b")

But I agree that it's very common to want to safely append an untrusted path to a trusted one, so perhaps the right operation here is a safer join.

I do think the function name should convey a sense that it's more secure than the alternatives. JoinInDir is descriptive, but doesn't strongly indicate that it's more suitable for use on untrusted inputs than Join. Perhaps SafeJoin?

@AndrewHarrisSPU
Copy link

For relative paths, maybe there's a tree-reflexive property, where joining a relative path with a prefix only results in targeting the subset of nodes that are the child nodes of the prefix. The mental hiccup is assuming tree-reflexivity when that's not a general property of relative paths.

I think coercing the tree-reflexive property (or an error) from a relative path can be seen as a unary operation on a relative path, or a binary operation on a relative path with nothing.

RefSafe(prefix string, rel string) with the tweaked notion of reflexivity for e.g. RefSafe("", "../") I think would be easy enough to grok. It could handle an empty, relative, or absolute prefix.

Still, ChildSafe() would be fun.

@earthboundkid
Copy link
Contributor

earthboundkid commented Oct 20, 2022

ISTM, most (all?) cases where you would want to use this involve os paths (since an fs.FS is rooted anyway) and joining to a base. I'm not sure when you would use path.Sanitize and not immediately turn around and do filepath.Join. So maybe it should just be filepath.SafeJoin(base string, followSymlinks bool, paths ...string) string.

@neild
Copy link
Contributor Author

neild commented Oct 20, 2022

SafeJoin seems easier to reason about, and I agree that most cases where you want this are going to join the path to a base anyway.

I'm not sold on the followSymlinks parameter. At worst, this provides a false sense of security and space for TOCTOU bugs; the fact that no symlink exists along that path now says nothing about the future.

@AndrewHarrisSPU
Copy link

I'm not sure when you would use path.Sanitize and not immediately turn around and do filepath.Join.

I think there are use cases where it's "eventually" or even "never", rather than "immediately". For example, a CLI tool taking a path as an argument might not Join until a non-trivial amount of work is done, but if it's going to fail when Sanitize fails it would be better to reject when parsing arguments.

@neild
Copy link
Contributor Author

neild commented Oct 20, 2022

Thinking more about symlinks, filename sanitization is the wrong time to check for links. Links need to be checked for atomically at file open time, to avoid TOCTOU bugs. Possibly this indicates that we should have a function or functions in the os package to securely open files without traversing symlinks, but I don't think filepath is the right place for this.

One issue with SafeJoin is that Join treats absolute paths in components after the first as relative paths. I don't think that's the right behavior for handling untrusted paths; an absolute path should be an error, not silently converted to a relative path.

Would it be reasonable for SafeJoin to be inconsistent with Join here?

Join("/a", "/b")     // /a/b
SafeJoin("/a", "/b") // error, /b is absolute

@jimmyfrasche
Copy link
Member

It should be an error if it escapes, regardless of how it does so. But SafeJoin("/a", "/a/b") should work fine.

@earthboundkid
Copy link
Contributor

It should be an error if it escapes, regardless of how it does so. But SafeJoin("/a", "/a/b") should work fine.

What if the base directory is a secret for some reason, then an attacker sends absolute paths until it finds one that succeeds? ISTM, it's safer to just fail any absolute paths in the paths parameter.

@neild
Copy link
Contributor Author

neild commented Oct 24, 2022

I believe we have two possible APIs at this point. One is a Sanitize function which takes a path and makes it safe. This is my original proposal (although I'm dropping the preservation of a trailing / here).

// Sanitize returns a representation of path under the current
// directory. If path is absolute or represents a location
// outside the current directory, Sanitize returns an error.
//
// Sanitize calls Clean on the result.
//
// If a base path is joined to the result of Sanitize with Join,
// the resulting path will be contained within basepath.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the sanitized path to reference a location
// outside the current directory.
func Sanitize(path string) (string, error)

The other possibility is a SafeJoin.

// SafeJoin joins a base path to a relative path.
// The resulting path is guaranteed to contained within the base path.
// If relpath is an absolute path or contains any reference to a location outside the base path,
// SafeJoin returns an error. For example, SafeJoin("a", "../a/b") is an error, even though the
// final path is contained within the base path.
//
// SafeJoin calls Clean on the result.
//
// Sanitize does not consider symbolic links.
// Symbolic links can cause the resulting path to reference a location outside the base path.
func SafeJoin(basepath, relpath) (string, error)

Either of these functions may be trivially implemented in terms of the other.

func Sanitize(path string) (string, error) {
  return SafeJoin("", path)
}

func SafeJoin(basepath, relpath) (string, error) {
  s, err := Sanitize(relpath)
  if err != nil {
    return "", err
  }
  return Join(basepath, s)
}

Most use cases for sanitized paths will join an untrusted path to a trusted base directory, so SafeJoin directly supports the more common operation. However, SafeJoin is inconsistent with Join: Join("a", "/b") is "a/b", while SafeJoin considers this an error. (We could make SafeJoin consistent with Join here, but I'm not convinced that silently converting an untrusted absolute path to a relative one is the correct behavior.)

I lean slightly towards Sanitize, but only slightly.

@earthboundkid
Copy link
Contributor

Good summary. My preference is for SafeJoin. It feels like the more common case to me, and the names "Sanitize" vs. "Clean" are too similar and easy to get backwards.

@bcmills
Copy link
Contributor

bcmills commented Oct 24, 2022

(We could make SafeJoin consistent with Join here, but I'm not convinced that silently converting an untrusted absolute path to a relative one is the correct behavior.)

Could you give some more detail on that? It seems like the security property we want from SafeJoin is “no writes outside of a particular working directory”, but Join("a", "/b")"a/b" does retain that property.

(And, for that matter, users who are familiar with Join can always precede the call to SafeJoin with a call to filepath.IsAbs, although I'm not entirely sure whether that would flag the argument "/b" on Windows.)

@neild
Copy link
Contributor Author

neild commented Oct 24, 2022

Could you give some more detail on that? It seems like the security property we want from SafeJoin is “no writes outside of a particular working directory”, but Join("a", "/b") → "a/b" does retain that property.

That retains this security property, but it introduces a silent rewrite of an unsafe path into a safe-but-different form. I think that it's much clearer to be explicit and to never attempt to transform a path.

For example, I think it's much better for a naive tar extractor to produce an error when trying to extract a file named /etc/passwd than to silently convert the filename to ./etc/passwd.

This also provides better behavior and consistency on Windows: SafeJoin(`a`, `C:\a`) is an error, not a\C:\a. SafeJoin(`a`, `\\host\share\b`) is not a\host\share\b.

If a user does want to convert absolute filenames to relative ones, this can be trivially achieved by prepending ./: SafeJoin(base, "./"+rel)

@AndrewHarrisSPU
Copy link

There's a well-summarized idea section from Kubernetes around a vulnerability in volume mount paths concerning subpaths - another version of a problem where Sanitize would be useful.

It looks like their solution purely working off of path text, not considering symlinks, relies on a PathWithinBase, using two absolute paths.

The solution for safely mounting with symlinks apparently justified an entire package itself.

I think I prefer SafeJoin behavior to other options, but seeing how the symlink problem is handled here makes me wonder if JoinWithinBase is a more obvious spelling.

@neild
Copy link
Contributor Author

neild commented Oct 25, 2022

An interesting question is what should be done with a relative path of ".". Is the base directory itself within the set of safe paths?

Sanitize(".")      // "." or an error?
SafeJoin("a", ".") // "a" or an error?

I can think of arguments for either accepting or rejecting ".", but we should document it either way.

Right now, I lean towards rejecting ".", on the grounds that it's the safer alternative; rejecting a path the user might have wanted is less hazardous than accepting one they might not have.

@bcmills
Copy link
Contributor

bcmills commented Oct 25, 2022

rejecting a path the user might have wanted is less hazardous than accepting one they might not have.

I don't think I agree with that. Rejecting a path that the user might have wanted will encourage them to use a different function instead, which carries its own risks.

I think the specification for the functions is clearer if they allow a relative path of ".". (“Outside” and “within” seem like clear enough words to describe what we mean.)

@neild
Copy link
Contributor Author

neild commented Oct 25, 2022

Another possible API:

// IsSafe reports whether the path refers to a location within the current directory.
// If path is absolute or references a location above the current directory, IsSafe returns an error.
// On Windows, IsSafe will return an error if path refers to a reserved filename such as "NUL".
//
// If IsSafe(path) returns nil, then Join(base, path) will always produce a path contained within base.
//
// IsSafe does not consider symbolic links, which can cause the file named by path to exist outside
// the current directory.
func IsSafe(path string) error

Sanitize and SafeJoin are just an IsSafe check followed by Clean or Join.

@ianlancetaylor
Copy link
Contributor

I like the functionality of IsSafe, but I'm not fond of the name. There can be unsafe paths in the current directory (What if the current directory happens to be /dev? What if the path refers to a symlink to elsewhere?). What about IsLocalPath?

@magical
Copy link
Contributor

magical commented Oct 25, 2022

I don't understand Sanitize and IsSafe. A path cannot be safe in and of itself, it can only be safe for use in a particular context. And the proposed functions don't have that context.

I'm more partial to SafeJoin. But it seems like what is really called for is something like func SafeOpenInDir(root, path string) (*os.File, error) which would also be able to do symlink checks, etc.

@neild
Copy link
Contributor Author

neild commented Oct 26, 2022

There is no universal definition of "safe", but there is a category of paths which reference locations which are lexically contained within some root. Perhaps there's a better name for these paths than "safe" or "sanitized".

IsLocalPath feels like the inverse of IsNetworkPath, which isn't right.

I think the name should convey a sense that this is a function one uses on untrusted paths to make them trustworthy (within limits).

But it seems like what is really called for is something like func SafeOpenInDir(root, path string) (*os.File, error) which would also be able to do symlink checks, etc.

I think there's a good argument to be made for a function like SafeOpenInDir, although I'm not sure exactly what it should look like. It's not a replacement for Sanitize/SafeJoin/IsSafe, however. For example (as mentioned in #56219 (comment)), a webserver serving static files will want to sanitize untrusted paths to avoid escaping the content root, but may well want to follow symlinks because the local filesystem is considered to be trusted. (And indeed, this is exactly what http.FileSystem does.)

It's worth noting that of the CVEs stemming from the lack of filename sanitization in archive/tar and archive/zip (see #55356), I don't believe any are due to inadvertent symlink traversal. Symlink traversal generally only becomes a vulnerability when the attacker has some level write access to the filesystem; this is a comparatively unusual scenario.

@ianlancetaylor
Copy link
Contributor

there is a category of paths which reference locations which are lexically contained within some root.

Yes, that is what I was trying to get at with IsLocalPath. What is a good name for that category of paths?

@magical
Copy link
Contributor

magical commented Oct 26, 2022

there is a category of paths which reference locations which are lexically contained within some root.

Yes, that is what I was trying to get at with IsLocalPath. What is a good name for that category of paths?

UnrootedPath?
FloatingPath?
PathFragment?
Subpath?

@bcmills
Copy link
Contributor

bcmills commented Oct 26, 2022

I tend to agree that the names “safe” and “sanitize” are not particularly descriptive, especially if new categories of unsafe or unsanitary paths are uncovered in the future. (Compare with, say, the word “new” used in an API to refer to version 2 of something.)

Moreover, I'm not sure that the word “Safe” is all that helpful in directing users to the appropriate API: if the user is reading the documentation, then it probably suffices to add a short sentence near the beginning of the function's doc comment (like “For most uses, prefer JoinInDir as a safer alternative.”).

If the user is not reading the documentation (and is instead coding from memory), then it's probably better for the new function to have a name starting with a common prefix with the less-safe alternative: that makes it more likely to be noticed in autocomplete results and more likely to be noticed when looking at the original function's documentation on https://pkg.go.dev.

@bcmills
Copy link
Contributor

bcmills commented Oct 26, 2022

I don't think the IsSafe API is all that useful. Consider how it would be used:

if err := filepath.IsSafe(rel); err != nil {
	return fmt.Errorf(…, err)
}
abs := filepath.Join(dir, rel)

But if the user remembers to do that, why wouldn't they just do the equivalent post-Join check using the existing API, which is not that much more code?

abs := filepath.Join(dir, rel)
if clean, err := filepath.Rel(dir, abs); err != nil {
	return fmt.Errorf(…, err)
} else if elem, _, _ := strings.Cut(clean, filepath.Separator); elem == ".." {
	return fmt.Errorf(…, err)
}

@neild
Copy link
Contributor Author

neild commented Nov 10, 2022

The current directory, no matter what directory that might be or how it might change. I do see that I dropped a negative: That should be "Escapes reports whether path lexically refers to a location NOT contained within the current directory."

I welcome suggestions for a clearer phrasing.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/449239 mentions this issue: path/filepath: add Escapes

@robpike
Copy link
Contributor

robpike commented Nov 10, 2022

That missing "not" explains a lot!

I'll think about wording.

@robpike
Copy link
Contributor

robpike commented Nov 11, 2022

As suggested in my comment on the CL, I feel NonLocal is a much clearer and evocative name for what it does.

@neild
Copy link
Contributor Author

neild commented Nov 11, 2022

IsLocal, and invert the check? Avoids a double-negative.

@neild
Copy link
Contributor Author

neild commented Nov 14, 2022

I've updated https://go.dev/cl/449239 to:

  • Change the name to IsLocal. I think this is the best of the suggested alternatives so far. The Is prefix clearly indicates this is a predicate, and I agree with Rob that "Local" is clearer and more evocative.
  • Consider "." as local, so IsLocal(".")==true.
// IsLocal reports whether path, using lexical analysis only, has all of these properties:
//
//   - is within the subtree rooted at the current directory
//   - is not an absolute path
//   - is not empty
//   - on Windows, is not a reserved name such as "NUL"
//
// If IsLocal(path) returns true, then
// Join(base, path) will always produce a path contained within base and
// Clean(path) will always produce an unrooted path with no ".." path elements.
//
// IsLocal does not consider links, which can cause the location
// named by path to exist outside the current directory.
func IsLocal(path string) bool

Edit: s/any/all/

@bcmills
Copy link
Contributor

bcmills commented Nov 14, 2022

has any of these properties

s/any/all/ ?

@rsc
Copy link
Contributor

rsc commented Nov 14, 2022

This looks like an improvement except for the mentions of "current directory".
IsLocal is a purely lexical question; the actual current directory is or should be irrelevant.
I suggest changing "is within the subtree rooted at the current directory"
to "is within the subtree rooted at the directory in which path is evaluated".

And I suggest changing the final paragraph to:

// IsLocal is a purely lexical operation. In particular, it does not account for
// the effect of any symbolic links that may exist in the file system.

@neild
Copy link
Contributor Author

neild commented Nov 14, 2022

Now:

// IsLocal reports whether path, using lexical analysis only, has all of these properties:
//
//   - is within the subtree rooted at the directory in which path is evaluated
//   - is not an absolute path
//   - is not empty
//   - on Windows, is not a reserved name such as "NUL"
//
// If IsLocal(path) returns true, then
// Join(base, path) will always produce a path contained within base and
// Clean(path) will always produce an unrooted path with no ".." path elements.
//
// IsLocal is a purely lexical operation.
// In particular, it does not account for the effect of any symbolic links
// that may exist in the filesystem.
func IsLocal(path string) bool

@earthboundkid
Copy link
Contributor

Now that there is "IsLocal is a purely lexical operation" you can remove "lexical" from the beginning:

// IsLocal reports whether path has all of these properties:
//
//   - is within the subtree rooted at the directory in which path is evaluated
//   - is not an absolute path
//   - is not empty
//   - on Windows, is not a reserved name such as "NUL"
//
// If IsLocal(path) returns true, then
// Join(base, path) will always produce a path contained within base and
// Clean(path) will always produce an unrooted path with no ".." path elements.
//
// IsLocal is a purely lexical operation.
// In particular, it does not account for the effect of any symbolic links 
// that may exist in the filesystem.
func IsLocal(path string) bool

@rsc rsc changed the title proposal: path/filepath: add Escapes proposal: path/filepath: add IsLocal Nov 16, 2022
@rsc
Copy link
Contributor

rsc commented Nov 16, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: path/filepath: add IsLocal path/filepath: add IsLocal Nov 16, 2022
@rsc rsc modified the milestones: Proposal, Backlog Nov 16, 2022
gopherbot pushed a commit that referenced this issue Nov 16, 2022
IsLocal reports whether a path lexically refers to a location
contained within the directory in which it is evaluated.
It identifies paths that are absolute, escape a directory
with ".." elements, and (on Windows) paths that reference
reserved device names.

For #56219.

Change-Id: I35edfa3ce77b40b8e66f1fc8e0ff73cfd06f2313
Reviewed-on: https://go-review.googlesource.com/c/go/+/449239
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/451657 mentions this issue: path/filepath: detect Windows CONIN$ and CONOUT$ paths in IsLocal

gopherbot pushed a commit that referenced this issue Nov 17, 2022
CreateFile creates a handle to the console input or screen buffer
when opening a file named CONIN$ or CONOUT$:

https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea#consoles

Detect these paths as non-local.

For #56219.

Change-Id: Ib09e76a110d6ec09aef8038074b9bcbae09d00d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/451657
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
@neild
Copy link
Contributor Author

neild commented Nov 18, 2022

IsLocal will be in 1.20.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests