-
-
Notifications
You must be signed in to change notification settings - Fork 57
Use SHGetKnownFolderPath()
instead of SHGetFolderPathW()
#350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Before this change, `windows.py` had multiple different `import ctypes` and `import winreg` lines. `windows.py` did this for a specific reason. The `get_win_folder_via_ctypes()` function depends on the `ctypes` module, and the `get_win_folder_from_registry()` functions depends on the `winreg` modules, but `windows.py` as a whole is supposed to continue to work even if `ctypes` and `winreg` are not available. Before this change, `windows.py` would try to avoid errors by only calling either of those two functions if the module that that function requires is not available. This meant that there had to be two `import ctypes` lines and two `import winreg` lines. The first one would be inside the body of the function that that depends on the module, and the second one would be in a test that determines if the function should be called. The idea was to only call `get_win_folder_via_ctypes()` and `get_win_folder_from_registry()` if their respective modules are available. This change simplifies `windows.py`. Previously, those two functions were always defined but not always called. Now, those two fuctions are not always defined and not always called. This means that there’s now only one `import ctypes` line and only one `import winreg` line. As an added bonus, those two `import` statements are now at the top of the file (like the rest of the `import` statements) and the `_pick_get_win_folder()` function is now simpler. The main motivation behind this change is to make it easier to create a future change. That future change will add additional code to `windows.py` that depends on the `ctypes` module. This change will allow me to add that additional code without adding an additional `import ctypes` line.
b5d86ed
to
40dba9c
Compare
I just pushed a new version of this PR that fixes the type checker errors. It looks like I had forgotten a null pointer check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the macos tests failed.
|
||
if "ctypes" in globals() and hasattr(ctypes, "windll"): | ||
|
||
class GUID(ctypes.Structure): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid defining this at import time and postpone the logic until necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, like, by defining the GUID
class inside of the get_win_folder_via_ctypes
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think we need this class as I mentioned elsewhere the checks aren't necessary nor do we require a __repr__
that would be different than the string UUID. We can just put the necessary logic where it is used. I believe it's just used once but if not then you can create a helper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my research, I don’t think that it’s possible to make this pull request work without having a GUID
class. When I did my research, the first thing that I had to look at was Windows’s SHGetKnownFolderPath()
function.
Sidenote
If you want to take a look at these header files yourself, then you need to install the Windows SDK. Here’s how I made sure that the Windows SDK was installed:
-
Make sure that Visual Studio Build Tools 2022 is installed by running this command in an administrator Command Prompt:
winget install --exact --id=Microsoft.VisualStudio.2022.BuildTools
-
Make sure that the “Desktop development with C++” workload is installed by running this command in an administrator Command Prompt:
"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\setup.exe" modify^ --passive^ --channelId VisualStudio.17.Release^ --productId Microsoft.VisualStudio.Product.BuildTools^ --add Microsoft.VisualStudio.Workload.VCTools;includeRecommended
The prototype for the SHGetKnownFolderPath()
function is in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\um\ShlObj_core.h
header file:
STDAPI SHGetKnownFolderPath(_In_ REFKNOWNFOLDERID rfid,
_In_ DWORD /* KNOWN_FOLDER_FLAG */ dwFlags,
_In_opt_ HANDLE hToken,
_Outptr_ PWSTR *ppszPath); // free *ppszPath with CoTaskMemFree
The rfid
parameter’s type is REFKNOWNFOLDERID
. The REFKNOWNFOLDERID
type is defined in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\um\shtypes.h
header file:
#ifdef __cplusplus
#define REFKNOWNFOLDERID const KNOWNFOLDERID &
#else // !__cplusplus
#define REFKNOWNFOLDERID const KNOWNFOLDERID * __MIDL_CONST
#endif // __cplusplus
This means that REFKNOWNFOLDERID
is going to either be a reference or pointer to a KNOWNFOLDERID
value (as long as it’s not a null pointer). The KNOWNFOLDERID
type is defined in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\um\shtypes.h
header file:
typedef GUID KNOWNFOLDERID;
This means that a KNOWNFOLDERID
value is the same thing as a GUID
value. The GUID
type is defined in the %ProgramFiles(x86)%\Windows Kits\10\Include\10.0.22621.0\shared\guiddef.h
header file:
#if defined(__midl)
typedef struct {
unsigned long Data1;
unsigned short Data2;
unsigned short Data3;
byte Data4[ 8 ];
} GUID;
#else
typedef struct _GUID {
unsigned long Data1;
unsigned short Data2;
unsigned short Data3;
unsigned char Data4[ 8 ];
} GUID;
#endif
This means that GUID
is a type of structure. So, in order to call the SHGetKnownFolderPath()
function, we need a REFKNOWNFOLDERID
value which is actually a reference or pointer to a structure value. Here’s what Python’s ctypes
module’s documentation has to say about structures:
Structures and unions must derive from the
Structure
andUnion
base classes which are defined in thectypes
module.
The ctypes
module’s documentation also says:
class ctypes.Structure(*args, **kw)
Abstract base class for structures in native byte order.
Concrete structure and union types must be created by subclassing one of these types, and at least define a
_fields_
class variable.
Based on that research, it seems like I need to create a structure value in order to call the SHGetKnownFolderPath()
function, and it seems like I need to define a class before I can create a structure value. That being said, I’m definitely not an expert in this area (this is the first time I’ve ever used the ctypes
module, and this is the second time that I’ve ever written code that uses something from the Windows API directly) so there’s a good chance that I’m missing something. Do you know of any ways that I could call the SHGetKnownFolderPath()
function without defining a class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done extensive testing but the following seems to work. Save it somewhere then try running python script.py
:
import ctypes
def guid_str_to_bytes(guid_str):
parts = guid_str.split('-')
data1 = int(parts[0], 16).to_bytes(4, 'little')
data2 = int(parts[1], 16).to_bytes(2, 'little')
data3 = int(parts[2], 16).to_bytes(2, 'little')
data4 = bytes.fromhex(parts[3] + parts[4])
return data1 + data2 + data3 + data4
def get_win_folder_via_ctypes(folderid_name: str) -> str:
GUID_STRINGS = {
"FOLDERID_RoamingAppData": "3EB685DB-65F9-4CF6-A03A-E3EF65729F3D",
"FOLDERID_ProgramData": "62AB5D82-FDC1-4DC3-A9DD-070D1D495D97",
"FOLDERID_LocalAppData": "F1B32785-6FBA-4FCF-9D55-7B8E7F157091",
"FOLDERID_Documents": "FDD39AD0-238F-46AF-ADB4-6C85480369C7",
"FOLDERID_Pictures": "33E28130-4E1E-4676-835A-98395C3BC3BB",
"FOLDERID_Videos": "18989B1D-99B5-455B-841C-AB7C74E4DDFC",
"FOLDERID_Music": "4BD8D571-6D19-48D3-BE97-422220080E43",
"FOLDERID_Downloads": "374DE290-123F-4565-9164-39C4925E467B",
"FOLDERID_Desktop": "B4BFCC3A-DB2C-424C-B029-7FE99A87C641",
}
if folderid_name not in GUID_STRINGS:
raise ValueError(f"Unknown FOLDERID name: {folderid_name}")
guid_str = GUID_STRINGS[folderid_name]
folderid_bytes = guid_str_to_bytes(guid_str)
folderid_array = (ctypes.c_char * 16).from_buffer_copy(folderid_bytes)
kf_flag_default = 0
s_ok = 0
pointer_to_pointer_to_wchars = ctypes.pointer(ctypes.c_wchar_p())
windll = ctypes.windll
error_code = windll.shell32.SHGetKnownFolderPath(
folderid_array,
kf_flag_default,
None,
pointer_to_pointer_to_wchars
)
return_value = pointer_to_pointer_to_wchars.contents.value
windll.ole32.CoTaskMemFree(pointer_to_pointer_to_wchars.contents)
del pointer_to_pointer_to_wchars
if error_code != s_ok:
raise RuntimeError(f"SHGetKnownFolderPath() failed with this error code: 0x{error_code:08X}")
if return_value is None:
raise RuntimeError("SHGetKnownFolderPath() succeeded, but it gave us a null pointer. This should never happen.")
if any(ord(c) > 255 for c in return_value):
buf = ctypes.create_unicode_buffer(len(return_value))
if windll.kernel32.GetShortPathNameW(return_value, buf, len(buf)):
return_value = buf.value
return return_value
if __name__ == "__main__":
print(get_win_folder_via_ctypes("FOLDERID_Downloads"))
I asked Gemini 2.5 Pro (experimental), Grok 3 + thinking, ChatGPT o4-mini-high and Claude 3.7 + thinking. This was from Grok which in my opinion had the best answer. Claude had something very similar but stored the GUID strings as precomputed byte strings which I think might be an overoptimization for the trade-off of reduced readability (although in my own code I would do that actually).
Edit: If you want to see this is the relevant part of what Claude had. Note that if you want to do this don't copy exactly because this is still a computation, so save as raw bytes b"..."
:
folderid_guids = {
"FOLDERID_RoamingAppData": bytes([0xDB, 0x85, 0xB6, 0x3E, 0xF9, 0x65, 0xF6, 0x4C, 0xA0, 0x3A, 0xE3, 0xEF, 0x65, 0x72, 0x9F, 0x3D]),
"FOLDERID_ProgramData": bytes([0x82, 0x5D, 0xAB, 0x62, 0xC1, 0xFD, 0xC3, 0x4D, 0xA9, 0xDD, 0x07, 0x0D, 0x1D, 0x49, 0x5D, 0x97]),
"FOLDERID_LocalAppData": bytes([0x85, 0x27, 0xB3, 0xF1, 0xBA, 0x6F, 0xCF, 0x4F, 0x9D, 0x55, 0x7B, 0x8E, 0x7F, 0x15, 0x70, 0x91]),
"FOLDERID_Documents": bytes([0xD0, 0x9A, 0xD3, 0xFD, 0x8F, 0x23, 0xAF, 0x46, 0xAD, 0xB4, 0x6C, 0x85, 0x48, 0x03, 0x69, 0xC7]),
"FOLDERID_Pictures": bytes([0x30, 0x81, 0xE2, 0x33, 0x1E, 0x4E, 0x76, 0x46, 0x83, 0x5A, 0x98, 0x39, 0x5C, 0x3B, 0xC3, 0xBB]),
"FOLDERID_Videos": bytes([0x1D, 0x9B, 0x98, 0x18, 0xB5, 0x99, 0x5B, 0x45, 0x84, 0x1C, 0xAB, 0x7C, 0x74, 0xE4, 0xDD, 0xFC]),
"FOLDERID_Music": bytes([0x71, 0xD5, 0xD8, 0x4B, 0x19, 0x6D, 0xD3, 0x48, 0xBE, 0x97, 0x42, 0x22, 0x20, 0x08, 0x0E, 0x43]),
"FOLDERID_Downloads": bytes([0x90, 0xE2, 0x4D, 0x37, 0x3F, 0x12, 0x65, 0x45, 0x91, 0x64, 0x39, 0xC4, 0x92, 0x5E, 0x46, 0x7B]),
"FOLDERID_Desktop": bytes([0x3A, 0xCC, 0xBF, 0xB4, 0x2C, 0xDB, 0x4C, 0x42, 0xB0, 0x29, 0x7F, 0xE9, 0x9A, 0x87, 0xC6, 0x41]),
}
if folderid_name not in folderid_guids:
msg = f"Unknown FOLDERID name: {folderid_name}"
raise ValueError(msg)
# Create GUID buffer
guid_buffer = ctypes.create_string_buffer(folderid_guids[folderid_name])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m a little bit skeptical of those solutions. According to the relevant header files, SHGetKnownFolderPath()
’s rfid
parameter is supposed to be a pointer to a struct
. Those two possible solutions don’t set rfid
to a pointer to a struct
. Instead, those two examples set rfid
to a pointer to a char
array. Purposefully using the wrong datatype seems like the wrong thing to do here, even if the code still works fine. It especially seems like the wrong option when you consider the fact that this project explicitly opts into static type checking via mypy.
Do you think that it would be better to set rfid
to a pointer to a char
array here, even though that’s the wrong datatype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a struct. It's one manually created, but it is a struct. You could do exactly the same thing with the struct module, though I assume that would be more expensive than building it manually like this. If you did that, it would end up as a sequence of bytes as well. That's all structs are.
I wouldn't worry about static type checkers. Externally, it doesn't matter - users don't care about what the internals do when type checking. For internals, type checkers can't really do much with CTypes, as it is dynamic.
A few things that do matter, though: I noticed "in native byte order" in the ctypes.Structure
docs, but the solution above does little endian (the struct module supports either). Also, there are padding rules, which I think are probably fine here, due to the way the data is organized, but something to keep in mind when making a struct manually (the struct module supports this too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have actual measurements of the costs of these methods?
The official documentation for `SHGetFolderPathW()` says: > Note As of Windows Vista, this function is merely a wrapper for > SHGetKnownFolderPath. The CSIDL value is translated to its associated > KNOWNFOLDERID and then SHGetKnownFolderPath is called. New > applications should use the known folder system rather than the older > CSIDL system, which is supported only for backward compatibility. Source: <https://learn.microsoft.com/en-us/windows/win32/api/shlobj_core/nf-shlobj_core-shgetfolderpathw> This change makes it so that platformdirs uses `SHGetKnownFolderPath()` instead of `SHGetFolderPathW()`. This change also removes references to the old CSIDL system and replaces them with references wo the FOLDERID system. Closes tox-dev#348.
40dba9c
to
2d21870
Compare
I just pushed a new version of this pull request that incorporates some of the suggestions.
What would be an appropriate way to fix the fact the the coverage percentage on macOS is too low? Should I try to make it so that some of the Windows-specific tests get run on macOS? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really appreciate you taking the time to do this!
try: # noqa: SIM105 | ||
import ctypes | ||
except ImportError: | ||
pass | ||
try: # noqa: SIM105 | ||
import winreg | ||
except ImportError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon taking a closer look, let's do what the rest of the file does and lazily import when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the file does not do lazy importing at the moment because this pull request removes all lazy importing from windows.py
. Are you saying that I should drop the “Simplify logic for choosing get_win_folder
’s value” commit?
expected_digit_groups = 5 | ||
if len(digit_groups) != expected_digit_groups: | ||
msg = f"The guid_string, {guid_string!r}, does not contain {expected_digit_groups} groups of digits." | ||
raise ValueError(msg) | ||
for digit_group, expected_length in zip(digit_groups, (8, 4, 4, 4, 12)): | ||
if len(digit_group) != expected_length: | ||
msg = ( | ||
f"The digit group, {digit_group!r}, in the guid_string, {guid_string!r}, was the wrong length. " | ||
f"It should have been {expected_length} digits long." | ||
) | ||
raise ValueError(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I appreciate the thoroughness of the edge case handling, these checks are superfluous because we control the inputs. Let's remove these for better performance.
def __repr__(self) -> str: | ||
guid_string = f"{self.Data1:08X}-{self.Data2:04X}-{self.Data3:04X}-" | ||
for i in range(len(self.Data4)): | ||
guid_string += f"{self.Data4[i]:02X}" | ||
if i == 1: | ||
guid_string += "-" | ||
return f"{type(self).__qualname__}({guid_string!r})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would only be used during debugging so I don't see the use (we should remove this class as I'm about to mention).
|
||
if "ctypes" in globals() and hasattr(ctypes, "windll"): | ||
|
||
class GUID(ctypes.Structure): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think we need this class as I mentioned elsewhere the checks aren't necessary nor do we require a __repr__
that would be different than the string UUID. We can just put the necessary logic where it is used. I believe it's just used once but if not then you can create a helper function.
See the commit messages for details.
Closes #348.