Fixes Unicode paths on Windows#1363
Fixes Unicode paths on Windows#1363hodoulp merged 16 commits intoAcademySoftwareFoundation:masterfrom
Conversation
|
|
4f81806 to
d36f072
Compare
hodoulp
left a comment
There was a problem hiding this comment.
Thanks @itsmattkc for this pull request to tackle the challenging file path character encoding difference between Windows and Linux platforms.
Even If I made several comments/suggestions, I think that you are on the right track.
Any changes in the library must now provide a unit test (a new one or updating an existing one) to demonstrate that the change works on all the supported platforms. Even if it could end-up to be a no-op for Linux platform, a unit test must validate it also.
Note: if the method DefaultComputeHash can be moved in platform then the methods Utf16ToUtf8() and Utf8ToUtf16 could then be completely private i.e. no more define in an header file.
433ac99 to
1658bbf
Compare
|
Thanks @itsmattkc for all the efforts to fix the file path on the Windows platform. I do not understand the rational for the latest commit. Could you explain it? |
|
Hey, yes I've made a few more changes the PR overall and am still in the process of finalizing it. I implemented all of your suggestions but then thought it might be good to support Unicode environment variables too (such as the As for the latest commit, I assume you're referring to enforcing UTF-16LE over UTF-16BE? That's just because Windows specifically uses UTF-16LE as its Unicode encoding. When I implemented a test of the UTF-8/UTF-16 conversion functions (using pre-existing literals as "control variables"), I found it would fail in some environments because some platforms (notably the CentOS 7 CI set up here) defaulted to UTF-16BE which would lead to the strings failing to compare. Since these functions are specifically intended for Windows anyway, it seemed appropriate to just enforce UTF-16LE in the conversion. |
0da7c40 to
0d480fb
Compare
|
Okay, sorry for the delay, this PR is now ready to be re-reviewed. I've squashed my commits for neatness and updated the OP to better reflect the changes made. |
|
|
hodoulp
left a comment
There was a problem hiding this comment.
Great work.
I only made comments to improve the readability of the code.
f91e774 to
cfbc71f
Compare
Unlike most other platforms, Windows' Unicode is standardized around
UTF-16, an encoding not compatible with "char *" arrays common in
C/C++. As such, to support Unicode correctly when using Win32 APIs,
strings must be converted to and from UTF-16 and the Unicode
versions of the APIs must be used over the ANSI versions.
This commit introduces the following:
- Utility funcions for converting between UTF-8 and UTF-16LE on all
platforms:
- Platform::Utf8ToUtf16
- Platform::Utf16ToUtf8
- Adds test for these conversion functions to ensure the conversion
to and from UTF-8 and UTF-16LE is correct.
- Utility wrappers for "std::ifstream" that automatically convert
to and from UTF-16 so that filenames requiring Unicode encoding
function correctly:
- Platform::CreateInputFileStream
- Platform::OpenInputFileStream
- Moves the file default compute hash function to the Platform
class (Platform::CreateFileContentHash) and switches to the
Win32 UTF-16 variant on Windows.
- Adds the "UNICODE" macro before including "Windows.h" which
ensures all functions called are the Unicode variants instead of
the default ANSI variants.
- Implicitly changes functions such as GetEnvironmentVariable and
SetEnvironmentVariable to their Unicode variants.
- Changes the following environment variable related functions to
their Win32 Unicode variants on Windows:
- environ -> _wenviron
- _putenv_s -> _wputenv_s
- Updates tests using SetEnvironmentVariable to use wide string
literals since that function has been switched to the Unicode
variant.
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
cfbc71f to
8c72711
Compare
|
Thanks for the review, I believe I have resolved all of your suggestions. Let me know if anything else should be changed. |
|
Windows projects usually define UNICODE in the build process, usually a Visual Studio project. By defining it in a header like we have here, we may force UNICODE on a Windows project that doesn't want it. As a library, I think we should respond to UNICODE, not set it. If we have something that explicitly needs the Unicode version, just call it directly, i.e. use SetWindowTextW() instead of SetWindowText(). UNICODE might need to be added to the CMake stuff somewhere. For Windows it might be on by default, but there should be a way to switch it off and everything should still work. |
|
|
|
I still think defining UNICODE in our own header is a mistake. It should be defined by the build system. It would be like defining |
|
EDIT: Never mind, I understand what you're saying. So it would make sense to define it in CMake rather than a header? |
|
I googled on that |
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
|
@itsmattkc There is a missing |
|
Concerning the Windows |
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
|
Looks like I made a mistake in the CMake files, it should compile now |
hodoulp
left a comment
There was a problem hiding this comment.
Last questions I think by approving.
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
doug-walker
left a comment
There was a problem hiding this comment.
Thanks for these improvements! I made one minor suggestion, but approving now.
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
|
@itsmattkc please, update your pull request so I can merge it. |
* Fixes Unicode paths on Windows
Unlike most other platforms, Windows' Unicode is standardized around
UTF-16, an encoding not compatible with "char *" arrays common in
C/C++. As such, to support Unicode correctly when using Win32 APIs,
strings must be converted to and from UTF-16 and the Unicode
versions of the APIs must be used over the ANSI versions.
This commit introduces the following:
- Utility funcions for converting between UTF-8 and UTF-16LE on all
platforms:
- Platform::Utf8ToUtf16
- Platform::Utf16ToUtf8
- Adds test for these conversion functions to ensure the conversion
to and from UTF-8 and UTF-16LE is correct.
- Utility wrappers for "std::ifstream" that automatically convert
to and from UTF-16 so that filenames requiring Unicode encoding
function correctly:
- Platform::CreateInputFileStream
- Platform::OpenInputFileStream
- Moves the file default compute hash function to the Platform
class (Platform::CreateFileContentHash) and switches to the
Win32 UTF-16 variant on Windows.
- Adds the "UNICODE" macro before including "Windows.h" which
ensures all functions called are the Unicode variants instead of
the default ANSI variants.
- Implicitly changes functions such as GetEnvironmentVariable and
SetEnvironmentVariable to their Unicode variants.
- Changes the following environment variable related functions to
their Win32 Unicode variants on Windows:
- environ -> _wenviron
- _putenv_s -> _wputenv_s
- Updates tests using SetEnvironmentVariable to use wide string
literals since that function has been switched to the Unicode
variant.
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Moved UNICODE and _UNICODE definitions to CMake
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* handle empty strings and use const references
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* only use wenviron when unicode is enabled
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* add ANSI Win32 version of Getenv
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Add CMake option for compiling with Win32 Unicode support
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Throw exception if UTF functions are called on non-Windows platforms
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Throw OCIO Exception
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Don't run UTF-8/16 conversion test on non-Windows
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* Fix CMake error
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* minor CMake adjustment
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Clarified CMake option for Win32 unicode
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* minor improvements
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* fix macro in cpu test cmake
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* Fixes Unicode paths on Windows
Unlike most other platforms, Windows' Unicode is standardized around
UTF-16, an encoding not compatible with "char *" arrays common in
C/C++. As such, to support Unicode correctly when using Win32 APIs,
strings must be converted to and from UTF-16 and the Unicode
versions of the APIs must be used over the ANSI versions.
This commit introduces the following:
- Utility funcions for converting between UTF-8 and UTF-16LE on all
platforms:
- Platform::Utf8ToUtf16
- Platform::Utf16ToUtf8
- Adds test for these conversion functions to ensure the conversion
to and from UTF-8 and UTF-16LE is correct.
- Utility wrappers for "std::ifstream" that automatically convert
to and from UTF-16 so that filenames requiring Unicode encoding
function correctly:
- Platform::CreateInputFileStream
- Platform::OpenInputFileStream
- Moves the file default compute hash function to the Platform
class (Platform::CreateFileContentHash) and switches to the
Win32 UTF-16 variant on Windows.
- Adds the "UNICODE" macro before including "Windows.h" which
ensures all functions called are the Unicode variants instead of
the default ANSI variants.
- Implicitly changes functions such as GetEnvironmentVariable and
SetEnvironmentVariable to their Unicode variants.
- Changes the following environment variable related functions to
their Win32 Unicode variants on Windows:
- environ -> _wenviron
- _putenv_s -> _wputenv_s
- Updates tests using SetEnvironmentVariable to use wide string
literals since that function has been switched to the Unicode
variant.
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Moved UNICODE and _UNICODE definitions to CMake
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* handle empty strings and use const references
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* only use wenviron when unicode is enabled
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* add ANSI Win32 version of Getenv
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Add CMake option for compiling with Win32 Unicode support
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Throw exception if UTF functions are called on non-Windows platforms
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Throw OCIO Exception
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Don't run UTF-8/16 conversion test on non-Windows
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* Fix CMake error
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* minor CMake adjustment
Signed-off-by: itsmattkc <itsmattkc@gmail.com>
* Clarified CMake option for Win32 unicode
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* minor improvements
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
* fix macro in cpu test cmake
Signed-off-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Co-authored-by: itsmattkc <34096995+itsmattkc@users.noreply.github.com>
Fixes #406.
Unlike most other platforms, Windows' Unicode is standardized around UTF-16, an encoding not compatible with
char *arrays common in C/C++. As such, to support Unicode correctly when using Win32 APIs, strings must be converted to and from UTF-16 and the Unicode versions of the APIs must be used over the ANSI versions.This commit introduces the following:
Utility functions for converting between UTF-8 and UTF-16LE on all platforms:
Platform::Utf8ToUtf16Platform::Utf16ToUtf8std::codecvt_utf8_utf16on macOS and Linux butMultiByteToWideCharandWideCharToMultiByteon Windows because MSVC gave deprecation warnings recommending the Win32 functions with the former.Test for these conversion functions to ensure the conversion to and from UTF-8 and UTF-16LE is correct.
Utility wrappers for "std::ifstream" that automatically convert to and from UTF-16 on Windows so that filenames requiring Unicode encoding function correctly:
Platform::CreateInputFileStreamPlatform::OpenInputFileStreamMoves the file default compute hash function to the Platform class (
Platform::CreateFileContentHash) and switches to the Win32 UTF-16 variant on Windows.Adds the
UNICODEmacro before including "Windows.h" on Windows which ensures all functions called are the Unicode variants instead of the default ANSI variants.GetEnvironmentVariableandSetEnvironmentVariableto their Unicode variants.Changes the following environment variable related functions to their Win32 Unicode variants on Windows:
environ->_wenviron_putenv_s->_wputenv_sUpdates tests using
SetEnvironmentVariableto use wide string literals since that function has been switched to the Unicode variant.In my testing, this appears to function correctly, loading configs, LUTs, and looks with non-ASCII paths, however I have not tested exhaustively. There may be more file-related functions I haven't found/wrapped.
Happy to take any suggestions or make any changes to the implementation upon request.