Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add follow_symlinks to os.path.exists() #117705

Closed
nineteendo opened this issue Apr 10, 2024 · 21 comments
Closed

Add follow_symlinks to os.path.exists() #117705

nineteendo opened this issue Apr 10, 2024 · 21 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@nineteendo
Copy link
Contributor

nineteendo commented Apr 10, 2024

Feature or enhancement

Proposal:

Because we have os.path.exists() & os.path.lexists(), it might make sense to add a follow_symlinks argument to the former:

-def exists(path):
+def exists(path, *, follow_symlinks=True):
     """Test whether a path exists.  Returns False for broken symbolic links"""
     try:
-        os.stat(path)
+        os.stat(path, follow_symlinks=follow_symlinks)
     except (OSError, ValueError):
         return False
     return True

This matches pathlib.Path.exists() (as well as os.stat() & os.chmod()). Also, it looks like pathlib.Path doesn't have an lexists() function. (but that's less explicit and so unnecessary, see #21157 (comment))

Note: this argument also needs to be added to nt._path_exists(). Done.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@nineteendo nineteendo added the type-feature A feature request or enhancement label Apr 10, 2024
@erlend-aasland erlend-aasland added the stdlib Python modules in the Lib dir label Apr 10, 2024
@erlend-aasland
Copy link
Contributor

[...] Also, it looks like pathlib.Path doesn't have an lexists() function.

See previous discussion:

@nineteendo
Copy link
Contributor Author

OK, thanks for the information. I won't be pushing for that then.

@erlend-aasland
Copy link
Contributor

OK, thanks for the information. I won't be pushing for that then.

It is useful to search the tracker, Discourse, and the old python-ideas/python-dev mailing list archives before submitting feature requests.

BTW, I would encourage you to take a good look at the devguide; it is a good read. Suggesting to start with the lifecycle of a pull request, and especially the subsection about making good PRs.

@nineteendo
Copy link
Contributor Author

It is useful to search the tracker, Discourse, and the old python-ideas/python-dev mailing list archives before submitting feature requests.

It's indeed useful, it looks like this functionality was rejected for os.path.isdir(): #57180, but this was later added to pathlib without mention of it: #105793. I guess we could then also reopen the discussing about adding follow_symlinks to os.path.isdir() & os.path.isfile().

@erlend-aasland
Copy link
Contributor

I guess we could then also reopen the discussing about adding follow_symlinks to os.path.isdir() & os.path.isfile().

The place for such a discussion would be Discourse. Iff there is consensus, follow up with an issue on the tracker.

@nineteendo
Copy link
Contributor Author

BTW, I would encourage you to take a good look at the devguide; it is a good read. Suggesting to start with the lifecycle of a pull request, and especially the subsection about making good PRs.

I have printed the latter, but I have 2 questions:

  1. How can I run tests on Windows? I don't have the make command.
  2. How am I supposed to configure my linter in VS Code? Autopep8 & isort are way too strict and I had to turn them off. It seems like only a trailing newline is required.

@erlend-aasland
Copy link
Contributor

  1. How can I run tests on Windows? I don't have the make command.

You'll find instructions both for building and running tests in the devguide.

  1. How am I supposed to configure my linter in VS Code? Autopep8 & isort are way too strict and I had to turn them off. It seems like only a trailing newline is required.

I have no idea; I don't use VS Code :) I think you are better off asking that question in a VS Code forum.

@eryksun
Copy link
Contributor

eryksun commented Apr 10, 2024

On Windows, the os.path functions exists(), isfile(), isdir(), and islink() are implemented in "Modules/posixmodule.c" as the corresponding os__path_*_impl() builtin functions. We never got around to implementing isjunction() as a builtin. I don't recall why lexists() wasn't implemented as a builtin.

In the context of stat queries, a name-surrogate reparse point is considered a 'symlink'. The GetFileInformationByName() fast path will have to be modified to use the result when not following symlinks if it's a name-surrogate reparse point. The slow path will have to call STAT() when not following symlinks, or win32_xstat_slow_impl() could be refactored into reusable helper functions that encapsulate the design decisions and logic.

@nineteendo
Copy link
Contributor Author

Is this something you would be willing to make a pull request for?

@nineteendo
Copy link
Contributor Author

nineteendo commented Apr 10, 2024

Alternatively, we could make ntpath.exists() a wrapper for nt._path_exists() & genericpath.lexists(). Then we wouldn't need to change the C code.

@eryksun
Copy link
Contributor

eryksun commented Apr 10, 2024

That works, if you're averse to programming in C, assuming the wrapper doesn't significantly hurt the performance of exists(). Otherwise, I'd rather have a fast path for lexists(). It's currently a slower path based on a full os.lstat(). If _path_exists() supports follow_symlinks, we can trivially implement builtin _path_lexists() via os__path_exists_impl().

@nineteendo
Copy link
Contributor Author

There's around 1% hit on performance. We have better things to do than optimise that using C code.
Let's leave implementing _path_lexists() for another time.

@eryksun
Copy link
Contributor

eryksun commented Apr 11, 2024

FWIW, here's code to extend the implementation of nt._path_exists() in "Modules/posixmodule.c" to support follow_symlinks and to implement nt._path_lexists(). You have to manually add OS__PATH_LEXISTS_METHODDEF to the posix_methods array. Then run Tools/clinic/clinic.py Modules/posixmodule.c, and rebuild Python.

/*[clinic input]
os._path_exists

    path: 'O'
        Path to be tested; can be a string, bytes object, a path-like object,
        or an integer that references an open file descriptor.

    follow_symlinks: bool = True
        If False, and the last element of the path is a symbolic link, do not
        test the target of the symbolic link.

Test whether a path exists.  Returns False for broken symbolic links

[clinic start generated code]*/

static PyObject *
os__path_exists_impl(PyObject *module, PyObject *path, int follow_symlinks)
/*[clinic end generated code: output=f31130f636d45dcc input=26779f82ae9d4f5a]*/
{
    path_t _path = PATH_T_INITIALIZE("exists", "path", 0, 1);
    HANDLE hfile;
    int result = 0;

    if (!path_converter(path, &_path)) {
        path_cleanup(&_path);
        if (PyErr_ExceptionMatches(PyExc_ValueError)) {
            PyErr_Clear();
            Py_RETURN_FALSE;
        }
        return NULL;
    }

    Py_BEGIN_ALLOW_THREADS
    if (_path.fd != -1) {
        hfile = _Py_get_osfhandle_noraise(_path.fd);
        if (hfile != INVALID_HANDLE_VALUE) {
            result = 1;
        }
    }
    else if (_path.wide) {
        BOOL slow_path = TRUE;
        FILE_STAT_BASIC_INFORMATION statInfo;
        if (_Py_GetFileInformationByName(_path.wide, FileStatBasicByNameInfo,
                                         &statInfo, sizeof(statInfo)))
        {
            if (!(statInfo.FileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) ||
                !follow_symlinks &&
                IsReparseTagNameSurrogate(statInfo.ReparseTag))
            {
                slow_path = FALSE;
                result = 1;
            }
        } else if (_Py_GetFileInformationByName_ErrorIsTrustworthy(
                        GetLastError()))
        {
            slow_path = FALSE;
        }
        if (slow_path) {
            STRUCT_STAT st;
            if (!follow_symlinks) {
                if (!LSTAT(_path.wide, &st)) {
                    result = 1;
                }
            }
            else {
                hfile = CreateFileW(_path.wide, FILE_READ_ATTRIBUTES, 0, NULL,
                                    OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS,
                                    NULL);
                if (hfile != INVALID_HANDLE_VALUE) {
                    CloseHandle(hfile);
                    result = 1;
                }
                else {
                    switch (GetLastError()) {
                    case ERROR_ACCESS_DENIED:
                    case ERROR_SHARING_VIOLATION:
                    case ERROR_CANT_ACCESS_FILE:
                    case ERROR_INVALID_PARAMETER:
                        if (!STAT(_path.wide, &st)) {
                            result = 1;
                        }
                    }
                }
            }
        }
    }
    Py_END_ALLOW_THREADS

    path_cleanup(&_path);
    if (result) {
        Py_RETURN_TRUE;
    }
    Py_RETURN_FALSE;
}


/*[clinic input]
os._path_lexists

    path: 'O'
        Path to be tested; can be a string, bytes object, a path-like object,
        or an integer that references an open file descriptor.

Test whether a path exists.  Returns True for broken symbolic links

[clinic start generated code]*/

static PyObject *
os__path_lexists_impl(PyObject *module, PyObject *path)
/*[clinic end generated code: output=b9a42a50b1df6651 input=96bc06e253ad0cd7]*/
{
    return os__path_exists_impl(module, path, 0);
}

@nineteendo
Copy link
Contributor Author

Thanks, that seems to work. Is there a reason you don't make pull requests?

@serhiy-storchaka
Copy link
Member

There is an old design rule -- rather than adding a boolean parameter to a function, it is better to add a new function. It does not always work, especially if there are several boolean parameters, but Python usually follows it. os.path.exists() and os.path.lexists() are simple functions, and there is no need to complicate os.path.exists() to make os.path.lexists() from it.

@nineteendo
Copy link
Contributor Author

nineteendo commented Apr 12, 2024

Then what about pathlib.Path.exists(), os.stat() & os.chmod()? I don't see how this is any different from these functions.
If anything, it's inconsistent.

@erlend-aasland
Copy link
Contributor

I don't see how this is any different from these functions.

pathlib is a high-level API. os.path is a low-level API.

@nineteendo
Copy link
Contributor Author

nineteendo commented Apr 12, 2024

And os.stat() & os.chmod() then? Those are a low-level API and yet they also have two ways of doing it.

@serhiy-storchaka
Copy link
Member

The follow_symlionks parameter was added to os.stat() and other os functions when many other optional parameters were added. This is the case when this rule is not applicable.

I do not see the need for this change. Nobody asked for it. It seems that the only reason for this change is that you can implement it.

@nineteendo
Copy link
Contributor Author

Actually, the correct answer is that this was used for refactoring (otherwise just having lstat() would've been fine): 9cf065c.
But that's not applicable here.

@nineteendo nineteendo closed this as not planned Won't fix, can't repro, duplicate, stale Apr 12, 2024
@eryksun
Copy link
Contributor

eryksun commented Apr 12, 2024

I'm happy to help if you want to use the code that I posted to make a pull request to implement _path_lexists() to accelerate os.path.lexists() and clean up the implementation of _path_exists(), which is currently a bit convoluted. The implementation in "Modules/posixmodule.c" would just move into a common _path_exists() function that's called by os__path_exists_impl() and os__path_lexists_impl(), without exposing a new follow_symlinks parameter for os.path.exists().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants