Skip to content

Commit

Permalink
feat: unicode API rework (#923)
Browse files Browse the repository at this point in the history
Fixes #845 as discussed.

Comparing the two approaches of getting `argv`:
1. The "old" way, through `CLI::argv()`:
    ✔️ Works automatically and almost everywhere
    ❌ Small abstraction overhead on macOS
    ❌ Does not work in weird edge-cases such as missing `/proc`
2. This PR, through `app.ensure_utf8`:
✔️ True zero-overhead abstraction: you don't pay for what you don't use
✔️ Less moving parts than the "old" approach, probably can't be broken
❌ Requires extra code to be written by the user (which is sad because
ideally everyone should use this by default)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  • Loading branch information
bindreams and pre-commit-ci[bot] committed Sep 20, 2023
1 parent 826415f commit 0d4e191
Show file tree
Hide file tree
Showing 10 changed files with 218 additions and 56 deletions.
71 changes: 39 additions & 32 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -195,25 +195,27 @@ To set up, add options, and run, your main function will look something like
this:

```cpp
int main() {
int main(int argc, char** argv) {
CLI::App app{"App description"};
argv = app.ensure_utf8(argv);

std::string filename = "default";
app.add_option("-f,--file", filename, "A help string");

CLI11_PARSE(app);
CLI11_PARSE(app, argc, argv);
return 0;
}
```
The 🚧 `CLI11_PARSE(app)` is only available in main currently and not in a
release.
For more information about 🚧`ensure_utf8` the section on
[Unicode support](#unicode-support) below. The 🚧`ensure_utf8` function is only
available in main currently and not in a release.
<details><summary>Note: If you don't like macros, this is what that macro expands to: (click to expand)</summary><p>
```cpp
try {
app.parse();
app.parse(argc, argv);
} catch (const CLI::ParseError &e) {
return app.exit(e);
}
Expand All @@ -226,26 +228,6 @@ inside the catch block; for example, help flags intentionally short-circuit all
other processing for speed and to ensure required options and the like do not
interfere.

</p></details>

<details><summary>Note: Why are argc and argv not used? (click to expand)</summary><p>

`argc` and `argv` may contain incorrect information on Windows when unicode text
is passed in. Check out a section on [unicode support](#unicode-support)🚧
below.

If this is not a concern, you can explicitly pass `argc` and `argv` from main or
from an external preprocessor of CLI arguments to `parse`:

```cpp
int main(int argc, char** argv) {
// ...

CLI11_PARSE(app, argc, argv);
return 0;
}
```
</p></details>
</br>

Expand Down Expand Up @@ -1425,13 +1407,33 @@ CLI11 supports Unicode and wide strings as defined in the
When using the command line on Windows with unicode arguments, your `main`
function may already receive broken Unicode. Parsing `argv` at that point will
not give you a correct string. To fix this, you have three options:
not give you a correct string. To fix this, you have three good options and two
bad ones:
1. Replace `argv` with `app.ensure_utf8(argv)` before any arguments are parsed.
`ensure_utf8` will do nothing on systems where `argv` is already in UTF-8
(Such as Linux or macOS) and return `argv` unmodified. On Windows, it will
discard `argv` and replace it with a correctly decoded array or arguments
from win32 API.
```cpp
int main(int argc, char** argv) {
CLI::App app;
argv = app.ensure_utf8(argv); // new argv memory is held by app
// ...
CLI11_PARSE(app, argc, argv);
}
```

1. If you pass unmodified command-line arguments to CLI11, call `app.parse()`
2. If you pass unmodified command-line arguments to CLI11, call `app.parse()`
instead of `app.parse(argc, argv)` (or `CLI11_PARSE(app)` instead of
`CLI11_PARSE(app, argc, argv)`). The library will find correct arguments
`CLI11_PARSE(app, argc, argv)`). The library will find correct arguments by
itself.

Note: this approach may not work on weird OS configurations, such as when the
`/proc` dir is missing on Linux systems (see also
[#845](https://github.com/CLIUtils/CLI11/issues/845)).

```cpp
int main() {
CLI::App app;
Expand All @@ -1440,8 +1442,8 @@ not give you a correct string. To fix this, you have three options:
}
```
2. Get correct arguments with which the program was originally executed using
provided functions: `CLI::argc()` and `CLI::argv()`. These two methods are
3. Get correct arguments with which the program was originally executed using
provided functions: `CLI::argc()` and `CLI::argv()`. These three methods are
the only cross-platform ways of handling unicode correctly.
```cpp
Expand All @@ -1452,7 +1454,9 @@ not give you a correct string. To fix this, you have three options:
}
```

3. Use the Windows-only non-standard `wmain` function, which accepts
<details><summary>Bad options (click to expand)</summary><p>

4. Use the Windows-only non-standard `wmain` function, which accepts
`wchar_t *argv[]` instead of `char* argv[]`. Parsing this will allow CLI to
convert wide strings to UTF-8 without losing information.

Expand All @@ -1464,11 +1468,14 @@ not give you a correct string. To fix this, you have three options:
}
```
4. Retrieve arguments yourself by using Windows APIs like
5. Retrieve arguments yourself by using Windows APIs like
[`CommandLineToArgvW`](https://learn.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw)
and pass them to CLI. This is what the library is doing under the hood in
`CLI::argv()`.
</p></details>
</br>
The library provides functions to convert between UTF-8 and wide strings:
```cpp
Expand Down
11 changes: 11 additions & 0 deletions include/CLI/App.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -290,6 +290,14 @@ class App {

///@}

#ifdef _WIN32
/// When normalizing argv to UTF-8 on Windows, this is the storage for normalized args.
std::vector<std::string> normalized_argv_{};

/// When normalizing argv to UTF-8 on Windows, this is the `char**` value returned to the user.
std::vector<char *> normalized_argv_view_{};
#endif

/// Special private constructor for subcommand
App(std::string app_description, std::string app_name, App *parent);

Expand All @@ -309,6 +317,9 @@ class App {
/// virtual destructor
virtual ~App() = default;

/// Convert the contents of argv to UTF-8. Only does something on Windows, does nothing elsewhere.
CLI11_NODISCARD char **ensure_utf8(char **argv);

/// Set a callback for execution when all parsing and processing has completed
///
/// Due to a bug in c++11,
Expand Down
11 changes: 11 additions & 0 deletions include/CLI/Argv.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,21 @@

#pragma once

// [CLI11:public_includes:set]
#include <string>
#include <vector>
// [CLI11:public_includes:end]

#include <CLI/Macros.hpp>

namespace CLI {
// [CLI11:argv_hpp:verbatim]
namespace detail {
#ifdef _WIN32
/// Decode and return UTF-8 argv from GetCommandLineW.
CLI11_INLINE std::vector<std::string> compute_win32_argv();
#endif
} // namespace detail

/// argc as passed in to this executable.
CLI11_INLINE int argc();
Expand Down
22 changes: 22 additions & 0 deletions include/CLI/impl/App_inl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,28 @@ CLI11_INLINE App::App(std::string app_description, std::string app_name, App *pa
}
}

CLI11_NODISCARD CLI11_INLINE char **App::ensure_utf8(char **argv) {
#ifdef _WIN32
(void)argv;

normalized_argv_ = detail::compute_win32_argv();

if(!normalized_argv_view_.empty()) {
normalized_argv_view_.clear();
}

normalized_argv_view_.reserve(normalized_argv_.size());
for(auto &arg : normalized_argv_) {
// using const_cast is well-defined, string is known to not be const.
normalized_argv_view_.push_back(const_cast<char *>(arg.data()));
}

return normalized_argv_view_.data();
#else
return argv;
#endif
}

CLI11_INLINE App *App::name(std::string app_name) {

if(parent_ != nullptr) {
Expand Down
46 changes: 24 additions & 22 deletions include/CLI/impl/Argv_inl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -89,35 +89,37 @@ static const std::vector<const char *> static_args = [] {
}();
#endif

#ifdef _WIN32
CLI11_INLINE std::vector<std::string> compute_win32_argv() {
std::vector<std::string> result;
int argc = 0;

auto deleter = [](wchar_t **ptr) { LocalFree(ptr); };
// NOLINTBEGIN(*-avoid-c-arrays)
auto wargv = std::unique_ptr<wchar_t *[], decltype(deleter)>(CommandLineToArgvW(GetCommandLineW(), &argc), deleter);
// NOLINTEND(*-avoid-c-arrays)

if(wargv == nullptr) {
throw std::runtime_error("CommandLineToArgvW failed with code " + std::to_string(GetLastError()));
}

result.reserve(static_cast<size_t>(argc));
for(size_t i = 0; i < static_cast<size_t>(argc); ++i) {
result.push_back(narrow(wargv[i]));
}

return result;
}
#endif

/// Command-line arguments, as passed in to this executable, converted to utf-8 on Windows.
CLI11_INLINE const std::vector<const char *> &args() {
// This function uses initialization via lambdas extensively to take advantage of the thread safety of static
// variable initialization [stmt.dcl.3]

#ifdef _WIN32
static const std::vector<const char *> static_args = [] {
static const std::vector<std::string> static_args_as_strings = [] {
// On Windows, take arguments from GetCommandLineW and convert them to utf-8.
std::vector<std::string> args_as_strings;
int argc = 0;

auto deleter = [](wchar_t **ptr) { LocalFree(ptr); };
// NOLINTBEGIN(*-avoid-c-arrays)
auto wargv =
std::unique_ptr<wchar_t *[], decltype(deleter)>(CommandLineToArgvW(GetCommandLineW(), &argc), deleter);
// NOLINTEND(*-avoid-c-arrays)

if(wargv == nullptr) {
throw std::runtime_error("CommandLineToArgvW failed with code " + std::to_string(GetLastError()));
}

args_as_strings.reserve(static_cast<size_t>(argc));
for(size_t i = 0; i < static_cast<size_t>(argc); ++i) {
args_as_strings.push_back(narrow(wargv[i]));
}

return args_as_strings;
}();
static const std::vector<std::string> static_args_as_strings = compute_win32_argv();

std::vector<const char *> static_args_result;
static_args_result.reserve(static_args_as_strings.size());
Expand Down
36 changes: 36 additions & 0 deletions tests/AppTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2615,3 +2615,39 @@ TEST_CASE("System Args", "[app]") {
FAIL("Executable '" << commandline << "' failed with an unknown return code");
}
}

// #845
TEST_CASE("Ensure UTF-8", "[app]") {
const char *commandline = CLI11_ENSURE_UTF8_EXE " 1234 false \"hello world\"";
int retval = std::system(commandline);

if(retval == -1) {
FAIL("Executable '" << commandline << "' reported that argv pointer changed where it should not have been");
}

if(retval > 0) {
FAIL("Executable '" << commandline << "' reported different argv at index " << (retval - 1));
}

if(retval != 0) {
FAIL("Executable '" << commandline << "' failed with an unknown return code");
}
}

// #845
TEST_CASE("Ensure UTF-8 called twice", "[app]") {
const char *commandline = CLI11_ENSURE_UTF8_TWICE_EXE " 1234 false \"hello world\"";
int retval = std::system(commandline);

if(retval == -1) {
FAIL("Executable '" << commandline << "' reported that argv pointer changed where it should not have been");
}

if(retval > 0) {
FAIL("Executable '" << commandline << "' reported different argv at index " << (retval - 1));
}

if(retval != 0) {
FAIL("Executable '" << commandline << "' failed with an unknown return code");
}
}
2 changes: 1 addition & 1 deletion tests/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,7 @@ endforeach()
add_custom_target(cli11_test_data DEPENDS ${DATA_FILES})

# Build dependent applications which are launched from test code
set(CLI11_DEPENDENT_APPLICATIONS system_args)
set(CLI11_DEPENDENT_APPLICATIONS system_args ensure_utf8 ensure_utf8_twice)

foreach(APP IN LISTS CLI11_DEPENDENT_APPLICATIONS)
add_executable(${APP} applications/${APP}.cpp)
Expand Down
35 changes: 35 additions & 0 deletions tests/applications/ensure_utf8.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
// Copyright (c) 2017-2023, University of Cincinnati, developed by Henry Schreiner
// under NSF AWARD 1414736 and by the respective contributors.
// All rights reserved.
//
// SPDX-License-Identifier: BSD-3-Clause

#include <CLI/CLI.hpp>
#include <cstring>
#include <iostream>

int main(int argc, char **argv) {
CLI::App app{"App description"};
char **original_argv = argv;
argv = app.ensure_utf8(argv);

#ifdef _WIN32
for(int i = 0; i < argc; i++) {
if(std::strcmp(argv[i], original_argv[i]) != 0) {
std::cerr << argv[i] << "\n";
std::cerr << original_argv[i] << "\n";
return i + 1;
}
argv[i][0] = 'x'; // access it to check that it is accessible
}

#else
(void)argc;

if(original_argv != argv) {
return -1;
}
#endif

return 0;
}
36 changes: 36 additions & 0 deletions tests/applications/ensure_utf8_twice.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
// Copyright (c) 2017-2023, University of Cincinnati, developed by Henry Schreiner
// under NSF AWARD 1414736 and by the respective contributors.
// All rights reserved.
//
// SPDX-License-Identifier: BSD-3-Clause

#include <CLI/CLI.hpp>
#include <cstring>
#include <iostream>

int main(int argc, char **argv) {
CLI::App app{"App description"};
char **original_argv = argv;
argv = app.ensure_utf8(argv);
argv = app.ensure_utf8(argv); // completely useless but works ok

#ifdef _WIN32
for(int i = 0; i < argc; i++) {
if(std::strcmp(argv[i], original_argv[i]) != 0) {
std::cerr << argv[i] << "\n";
std::cerr << original_argv[i] << "\n";
return i + 1;
}
argv[i][0] = 'x'; // access it to check that it is accessible
}

#else
(void)argc;

if(original_argv != argv) {
return -1;
}
#endif

return 0;
}
Loading

0 comments on commit 0d4e191

Please sign in to comment.