Skip to content

Use SHGetKnownFolderPath() instead of SHGetFolderPathW() #350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Jayman2000
Copy link

See the commit messages for details.

Closes #348.

Before this change, `windows.py` had multiple different `import ctypes`
and `import winreg` lines. `windows.py` did this for a specific reason.
The `get_win_folder_via_ctypes()` function depends on the `ctypes`
module, and the `get_win_folder_from_registry()` functions depends on
the `winreg` modules, but `windows.py` as a whole is supposed to
continue to work even if `ctypes` and `winreg` are not available.

Before this change, `windows.py` would try to avoid errors by only
calling either of those two functions if the module that that function
requires is not available. This meant that there had to be two
`import ctypes` lines and two `import winreg` lines. The first one would
be inside the body of the function that that depends on the module, and
the second one would be in a test that determines if the function should
be called. The idea was to only call `get_win_folder_via_ctypes()` and
`get_win_folder_from_registry()` if their respective modules are
available.

This change simplifies `windows.py`. Previously, those two functions
were always defined but not always called. Now, those two fuctions are
not always defined and not always called. This means that there’s now
only one `import ctypes` line and only one `import winreg` line. As an
added bonus, those two `import` statements are now at the top of the
file (like the rest of the `import` statements) and the
`_pick_get_win_folder()` function is now simpler.

The main motivation behind this change is to make it easier to create a
future change. That future change will add additional code to
`windows.py` that depends on the `ctypes` module. This change will allow
me to add that additional code without adding an additional
`import ctypes` line.
@Jayman2000
Copy link
Author

I just pushed a new version of this PR that fixes the type checker errors. It looks like I had forgotten a null pointer check.

Copy link
Member

@gaborbernat gaborbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the macos tests failed.


if "ctypes" in globals() and hasattr(ctypes, "windll"):

class GUID(ctypes.Structure):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid defining this at import time and postpone the logic until necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, like, by defining the GUID class inside of the get_win_folder_via_ctypes function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think we need this class as I mentioned elsewhere the checks aren't necessary nor do we require a __repr__ that would be different than the string UUID. We can just put the necessary logic where it is used. I believe it's just used once but if not then you can create a helper function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my research, I don’t think that it’s possible to make this pull request work without having a GUID class. When I did my research, the first thing that I had to look at was Windows’s SHGetKnownFolderPath() function.

Sidenote

If you want to take a look at these header files yourself, then you need to install the Windows SDK. Here’s how I made sure that the Windows SDK was installed:

  1. Make sure that Visual Studio Build Tools 2022 is installed by running this command in an administrator Command Prompt:

    winget install --exact --id=Microsoft.VisualStudio.2022.BuildTools
  2. Make sure that the “Desktop development with C++” workload is installed by running this command in an administrator Command Prompt:

    "%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\setup.exe" modify^
      --passive^
      --channelId VisualStudio.17.Release^
      --productId Microsoft.VisualStudio.Product.BuildTools^
      --add Microsoft.VisualStudio.Workload.VCTools;includeRecommended

The prototype for the SHGetKnownFolderPath() function is in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\um\ShlObj_core.h header file:

STDAPI SHGetKnownFolderPath(_In_ REFKNOWNFOLDERID rfid,
                            _In_ DWORD /* KNOWN_FOLDER_FLAG */ dwFlags,
                            _In_opt_ HANDLE hToken,
                            _Outptr_ PWSTR *ppszPath); // free *ppszPath with CoTaskMemFree

The rfid parameter’s type is REFKNOWNFOLDERID. The REFKNOWNFOLDERID type is defined in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\um\shtypes.h header file:

#ifdef __cplusplus
#define REFKNOWNFOLDERID const KNOWNFOLDERID &
#else // !__cplusplus
#define REFKNOWNFOLDERID const KNOWNFOLDERID * __MIDL_CONST
#endif // __cplusplus

This means that REFKNOWNFOLDERID is going to either be a reference or pointer to a KNOWNFOLDERID value (as long as it’s not a null pointer). The KNOWNFOLDERID type is defined in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\um\shtypes.h header file:

typedef GUID KNOWNFOLDERID;

This means that a KNOWNFOLDERID value is the same thing as a GUID value. The GUID type is defined in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\shared\guiddef.h header file:

#if defined(__midl)
typedef struct {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    byte           Data4[ 8 ];
} GUID;
#else
typedef struct _GUID {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[ 8 ];
} GUID;
#endif

This means that GUID is a type of structure. So, in order to call the SHGetKnownFolderPath() function, we need a REFKNOWNFOLDERID value which is actually a reference or pointer to a structure value. Here’s what Python’s ctypes module’s documentation has to say about structures:

Structures and unions must derive from the Structure and Union base classes which are defined in the ctypes module.

The ctypes module’s documentation also says:

class ctypes.Structure(*args, **kw)

Abstract base class for structures in native byte order.

Concrete structure and union types must be created by subclassing one of these types, and at least define a _fields_ class variable.


Based on that research, it seems like I need to create a structure value in order to call the SHGetKnownFolderPath() function, and it seems like I need to define a class before I can create a structure value. That being said, I’m definitely not an expert in this area (this is the first time I’ve ever used the ctypes module, and this is the second time that I’ve ever written code that uses something from the Windows API directly) so there’s a good chance that I’m missing something. Do you know of any ways that I could call the SHGetKnownFolderPath() function without defining a class?

Copy link
Collaborator

@ofek ofek Apr 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done extensive testing but the following seems to work. Save it somewhere then try running python script.py:

import ctypes

def guid_str_to_bytes(guid_str):
    parts = guid_str.split('-')
    data1 = int(parts[0], 16).to_bytes(4, 'little')
    data2 = int(parts[1], 16).to_bytes(2, 'little')
    data3 = int(parts[2], 16).to_bytes(2, 'little')
    data4 = bytes.fromhex(parts[3] + parts[4])
    return data1 + data2 + data3 + data4

def get_win_folder_via_ctypes(folderid_name: str) -> str:
    GUID_STRINGS = {
        "FOLDERID_RoamingAppData": "3EB685DB-65F9-4CF6-A03A-E3EF65729F3D",
        "FOLDERID_ProgramData": "62AB5D82-FDC1-4DC3-A9DD-070D1D495D97",
        "FOLDERID_LocalAppData": "F1B32785-6FBA-4FCF-9D55-7B8E7F157091",
        "FOLDERID_Documents": "FDD39AD0-238F-46AF-ADB4-6C85480369C7",
        "FOLDERID_Pictures": "33E28130-4E1E-4676-835A-98395C3BC3BB",
        "FOLDERID_Videos": "18989B1D-99B5-455B-841C-AB7C74E4DDFC",
        "FOLDERID_Music": "4BD8D571-6D19-48D3-BE97-422220080E43",
        "FOLDERID_Downloads": "374DE290-123F-4565-9164-39C4925E467B",
        "FOLDERID_Desktop": "B4BFCC3A-DB2C-424C-B029-7FE99A87C641",
    }
    if folderid_name not in GUID_STRINGS:
        raise ValueError(f"Unknown FOLDERID name: {folderid_name}")
    guid_str = GUID_STRINGS[folderid_name]
    folderid_bytes = guid_str_to_bytes(guid_str)
    folderid_array = (ctypes.c_char * 16).from_buffer_copy(folderid_bytes)
    kf_flag_default = 0
    s_ok = 0
    pointer_to_pointer_to_wchars = ctypes.pointer(ctypes.c_wchar_p())
    windll = ctypes.windll
    error_code = windll.shell32.SHGetKnownFolderPath(
        folderid_array,
        kf_flag_default,
        None,
        pointer_to_pointer_to_wchars
    )
    return_value = pointer_to_pointer_to_wchars.contents.value
    windll.ole32.CoTaskMemFree(pointer_to_pointer_to_wchars.contents)
    del pointer_to_pointer_to_wchars
    if error_code != s_ok:
        raise RuntimeError(f"SHGetKnownFolderPath() failed with this error code: 0x{error_code:08X}")
    if return_value is None:
        raise RuntimeError("SHGetKnownFolderPath() succeeded, but it gave us a null pointer. This should never happen.")
    if any(ord(c) > 255 for c in return_value):
        buf = ctypes.create_unicode_buffer(len(return_value))
        if windll.kernel32.GetShortPathNameW(return_value, buf, len(buf)):
            return_value = buf.value
    return return_value

if __name__ == "__main__":
    print(get_win_folder_via_ctypes("FOLDERID_Downloads"))

I asked Gemini 2.5 Pro (experimental), Grok 3 + thinking, ChatGPT o4-mini-high and Claude 3.7 + thinking. This was from Grok which in my opinion had the best answer. Claude had something very similar but stored the GUID strings as precomputed byte strings which I think might be an overoptimization for the trade-off of reduced readability (although in my own code I would do that actually).

Edit: If you want to see this is the relevant part of what Claude had. Note that if you want to do this don't copy exactly because this is still a computation, so save as raw bytes b"...":

folderid_guids = {
    "FOLDERID_RoamingAppData": bytes([0xDB, 0x85, 0xB6, 0x3E, 0xF9, 0x65, 0xF6, 0x4C, 0xA0, 0x3A, 0xE3, 0xEF, 0x65, 0x72, 0x9F, 0x3D]),
    "FOLDERID_ProgramData": bytes([0x82, 0x5D, 0xAB, 0x62, 0xC1, 0xFD, 0xC3, 0x4D, 0xA9, 0xDD, 0x07, 0x0D, 0x1D, 0x49, 0x5D, 0x97]),
    "FOLDERID_LocalAppData": bytes([0x85, 0x27, 0xB3, 0xF1, 0xBA, 0x6F, 0xCF, 0x4F, 0x9D, 0x55, 0x7B, 0x8E, 0x7F, 0x15, 0x70, 0x91]),
    "FOLDERID_Documents": bytes([0xD0, 0x9A, 0xD3, 0xFD, 0x8F, 0x23, 0xAF, 0x46, 0xAD, 0xB4, 0x6C, 0x85, 0x48, 0x03, 0x69, 0xC7]),
    "FOLDERID_Pictures": bytes([0x30, 0x81, 0xE2, 0x33, 0x1E, 0x4E, 0x76, 0x46, 0x83, 0x5A, 0x98, 0x39, 0x5C, 0x3B, 0xC3, 0xBB]),
    "FOLDERID_Videos": bytes([0x1D, 0x9B, 0x98, 0x18, 0xB5, 0x99, 0x5B, 0x45, 0x84, 0x1C, 0xAB, 0x7C, 0x74, 0xE4, 0xDD, 0xFC]),
    "FOLDERID_Music": bytes([0x71, 0xD5, 0xD8, 0x4B, 0x19, 0x6D, 0xD3, 0x48, 0xBE, 0x97, 0x42, 0x22, 0x20, 0x08, 0x0E, 0x43]),
    "FOLDERID_Downloads": bytes([0x90, 0xE2, 0x4D, 0x37, 0x3F, 0x12, 0x65, 0x45, 0x91, 0x64, 0x39, 0xC4, 0x92, 0x5E, 0x46, 0x7B]),
    "FOLDERID_Desktop": bytes([0x3A, 0xCC, 0xBF, 0xB4, 0x2C, 0xDB, 0x4C, 0x42, 0xB0, 0x29, 0x7F, 0xE9, 0x9A, 0x87, 0xC6, 0x41]),
}

if folderid_name not in folderid_guids:
    msg = f"Unknown FOLDERID name: {folderid_name}"
    raise ValueError(msg)

# Create GUID buffer
guid_buffer = ctypes.create_string_buffer(folderid_guids[folderid_name])

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a little bit skeptical of those solutions. According to the relevant header files, SHGetKnownFolderPath()’s rfid parameter is supposed to be a pointer to a struct. Those two possible solutions don’t set rfid to a pointer to a struct. Instead, those two examples set rfid to a pointer to a char array. Purposefully using the wrong datatype seems like the wrong thing to do here, even if the code still works fine. It especially seems like the wrong option when you consider the fact that this project explicitly opts into static type checking via mypy.

Do you think that it would be better to set rfid to a pointer to a char array here, even though that’s the wrong datatype?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a struct. It's one manually created, but it is a struct. You could do exactly the same thing with the struct module, though I assume that would be more expensive than building it manually like this. If you did that, it would end up as a sequence of bytes as well. That's all structs are.

I wouldn't worry about static type checkers. Externally, it doesn't matter - users don't care about what the internals do when type checking. For internals, type checkers can't really do much with CTypes, as it is dynamic.

A few things that do matter, though: I noticed "in native byte order" in the ctypes.Structure docs, but the solution above does little endian (the struct module supports either). Also, there are padding rules, which I think are probably fine here, due to the way the data is organized, but something to keep in mind when making a struct manually (the struct module supports this too).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have actual measurements of the costs of these methods?

The official documentation for `SHGetFolderPathW()` says:

> Note  As of Windows Vista, this function is merely a wrapper for
> SHGetKnownFolderPath. The CSIDL value is translated to its associated
> KNOWNFOLDERID and then SHGetKnownFolderPath is called. New
> applications should use the known folder system rather than the older
> CSIDL system, which is supported only for backward compatibility.

Source:
<https://learn.microsoft.com/en-us/windows/win32/api/shlobj_core/nf-shlobj_core-shgetfolderpathw>

This change makes it so that platformdirs uses `SHGetKnownFolderPath()`
instead of `SHGetFolderPathW()`. This change also removes references to
the old CSIDL system and replaces them with references wo the FOLDERID
system.

Closes tox-dev#348.
@Jayman2000
Copy link
Author

I just pushed a new version of this pull request that incorporates some of the suggestions.


@gaborbernat

Seems the macos tests failed.

What would be an appropriate way to fix the fact the the coverage percentage on macOS is too low? Should I try to make it so that some of the Windows-specific tests get run on macOS?

Copy link
Collaborator

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate you taking the time to do this!

Comment on lines +15 to +22
try: # noqa: SIM105
import ctypes
except ImportError:
pass
try: # noqa: SIM105
import winreg
except ImportError:
pass
Copy link
Collaborator

@ofek ofek Apr 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon taking a closer look, let's do what the rest of the file does and lazily import when necessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the file does not do lazy importing at the moment because this pull request removes all lazy importing from windows.py. Are you saying that I should drop the “Simplify logic for choosing get_win_folder’s value” commit?

Comment on lines +247 to +257
expected_digit_groups = 5
if len(digit_groups) != expected_digit_groups:
msg = f"The guid_string, {guid_string!r}, does not contain {expected_digit_groups} groups of digits."
raise ValueError(msg)
for digit_group, expected_length in zip(digit_groups, (8, 4, 4, 4, 12)):
if len(digit_group) != expected_length:
msg = (
f"The digit group, {digit_group!r}, in the guid_string, {guid_string!r}, was the wrong length. "
f"It should have been {expected_length} digits long."
)
raise ValueError(msg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I appreciate the thoroughness of the edge case handling, these checks are superfluous because we control the inputs. Let's remove these for better performance.

Comment on lines +267 to +273
def __repr__(self) -> str:
guid_string = f"{self.Data1:08X}-{self.Data2:04X}-{self.Data3:04X}-"
for i in range(len(self.Data4)):
guid_string += f"{self.Data4[i]:02X}"
if i == 1:
guid_string += "-"
return f"{type(self).__qualname__}({guid_string!r})"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would only be used during debugging so I don't see the use (we should remove this class as I'm about to mention).


if "ctypes" in globals() and hasattr(ctypes, "windll"):

class GUID(ctypes.Structure):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think we need this class as I mentioned elsewhere the checks aren't necessary nor do we require a __repr__ that would be different than the string UUID. We can just put the necessary logic where it is used. I believe it's just used once but if not then you can create a helper function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SHGetFolderPathW is deprecated
4 participants