ArgIteratorWindows uses non-Unicode GetCommandLineA and a custom parser #2222

Closed
ljmccarthy opened this issue Apr 9, 2019 · 6 comments
Labels: contributor friendly, os-windows, standard library

@ljmccarthy

std.os.ArgIteratorWindows currently uses GetCommandLineA, which doesn't support Unicode, and implements its own command-line parser.

I'm sure it's a good parser, but Windows already provides a function for this, CommandLineToArgvW, which can be used with the result of GetCommandLineW. I would suggest that using it instead would ensure that Zig doesn't parse command lines differently from regular C/C++ programs on Windows, and would reduce code size.

I have implemented a proof-of-concept using these Win32 functions, returning an array of UTF-8 strings:

https://gist.github.com/ljmccarthy/4f372f060d2d2e11b594ad9188fd4720
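
For readers who want to see the shape of the two Win32 calls without opening the gist, here is a minimal sketch in Python via ctypes. It is not the gist's Zig code; the function name `windows_argv` is made up for this illustration, and the fallback to `sys.argv` on non-Windows platforms is an assumption added only so the snippet runs everywhere.

```python
import ctypes
import sys


def windows_argv():
    """Return the process arguments as split by CommandLineToArgvW.

    Assumption: off Windows there is no such API, so this sketch
    simply returns a copy of sys.argv instead.
    """
    if sys.platform != "win32":
        return list(sys.argv)

    kernel32 = ctypes.windll.kernel32
    shell32 = ctypes.windll.shell32

    # Declare the signatures so pointers are not truncated on 64-bit builds.
    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = ctypes.c_wchar_p
    shell32.CommandLineToArgvW.argtypes = [
        ctypes.c_wchar_p,
        ctypes.POINTER(ctypes.c_int),
    ]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(
        kernel32.GetCommandLineW(), ctypes.byref(argc)
    )
    if not argv:
        raise ctypes.WinError()
    try:
        # Each element is a UTF-16 string that ctypes decodes to str.
        return [argv[i] for i in range(argc.value)]
    finally:
        # The array is a single allocation that the caller must free.
        kernel32.LocalFree(ctypes.cast(argv, ctypes.c_void_p))
```

The `LocalFree` call is the reason any wrapper around CommandLineToArgvW needs some kind of cleanup step. Note that under Python the raw command line also includes the interpreter path, so the result on Windows will generally differ from `sys.argv`.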

I'm also not a big fan of the ArgIterator interface, which is rather painful to use, but I will create a separate issue for that :-)

@ljmccarthy
Author

Here's ArgIteratorWindows rewritten using CommandLineToArgvW:

https://gist.github.com/ljmccarthy/ef3e4e828a940812ae9d23f4d3713053

Note that I've had to add a deinit() function.

@ljmccarthy
Author

ljmccarthy commented Apr 9, 2019

Maybe this isn't the best approach. UCRT also has its own parser, and it's not clear whether it works exactly like CommandLineToArgvW. This seems to be a bit of a mess on Windows, with no real standard, so perhaps the existing parser is fine. If we are linking with UCRT, we could use its parsed argv, exported through the __argc and __argv/__wargv symbols, but we would still need a fallback for when we don't want to link against UCRT.

@andrewrk andrewrk added this to the 0.5.0 milestone Apr 9, 2019
@andrewrk andrewrk added the standard library This issue involves writing Zig code for the standard library. label Apr 9, 2019
@andrewrk
Member

andrewrk commented Apr 9, 2019

See also #534

@shawnl
Contributor

shawnl commented Apr 9, 2019

I'm also not a big fan of the ArgIterator interface, which is rather painful to use, but I will create a separate issue for that :-)

I agree. It also consumes bytes in every binary. I did some work on it: https://github.com/shawnl/zig/tree/startup

My real plan, though, is to have a startup function separate from main that does argument and environment parsing. It would then return to the startup code, which would remove the environment and then call main. If you have a Linux box, type env in a terminal: every process you spawn in that terminal carries that much wasted space (plus some more; it is somewhat complicated). By clearing this we also enforce separation of the code that deals with user-supplied data.

@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019
@andrewrk andrewrk added os-windows contributor friendly This issue is limited in scope and/or knowledge of Zig internals. labels Jan 7, 2020
@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Jan 7, 2020
@cartr
Contributor

cartr commented Jun 22, 2020

UCRT also has its own parser, and it's not clear whether it works exactly like CommandLineToArgvW.

The rules for CommandLineToArgvW are documented here, and the rules for the C main function's argv are documented here, and they look pretty similar. I investigated further by writing tests in Zig to compare ArgIteratorWindows and CommandLineToArgvW, as well as a C++ program to compare CommandLineToArgvW and C's argv.

It looks like CommandLineToArgvW has one main difference from argv. The docs say that "This function accepts command lines that contain a program name; the program name can be enclosed in quotation marks or not." But if you invoke the program with quotation marks in the middle of the program name, the argument list is parsed incorrectly:

  • "testprogram.exe" a b c correctly results in four arguments: testprogram.exe, a, b, c
  • "testprogra"m.exe a b c results in five arguments being returned: testprogra, m.exe, a, b, c
  • te"st program".exe a b c (note space in executable filename) results in two arguments: te"st and program.exe a b c
  • test"program.exe" a b c results in test"program.exe", a, b, c (the quotes aren't removed from the first argument)

Quote marks behave completely as expected if used anywhere else. It's just the program name (~=argv[0]) that's parsed oddly like this.
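
One reading that is consistent with the four results above: the program name is split off by a separate, simpler rule before the normal quoting rules apply. If the command line starts with a quotation mark, the name runs to the next quotation mark; otherwise it runs to the first whitespace and any quotation marks in it are kept literally. The sketch below models that reading in Python. It is an approximation written for this thread, not the actual Windows implementation; the function name is hypothetical, and details such as doubled quotation marks inside a quoted argument and the empty-command-line case are deliberately left out.

```python
def split_like_commandlinetoargvw(cmdline):
    """Approximate model of the argv[0] special case described above."""
    args = []
    i = 0
    n = len(cmdline)

    # Program name: its own rule, with no backslash escaping.
    if cmdline.startswith('"'):
        end = cmdline.find('"', 1)
        if end == -1:
            return [cmdline[1:]]
        args.append(cmdline[1:end])
        i = end + 1  # parsing resumes right after the closing quote
    else:
        while i < n and cmdline[i] not in " \t":
            i += 1
        args.append(cmdline[:i])  # quotes stay in the name verbatim

    # Remaining arguments: quotes toggle a quoted region, and
    # backslashes are special only when followed by a quote.
    while True:
        while i < n and cmdline[i] in " \t":
            i += 1
        if i >= n:
            break
        cur = []
        in_quotes = False
        while i < n:
            c = cmdline[i]
            if c == "\\":
                j = i
                while j < n and cmdline[j] == "\\":
                    j += 1
                count = j - i
                if j < n and cmdline[j] == '"':
                    cur.append("\\" * (count // 2))
                    if count % 2 == 1:
                        cur.append('"')  # odd count: escaped literal quote
                        j += 1
                else:
                    cur.append("\\" * count)
                i = j
            elif c == '"':
                in_quotes = not in_quotes
                i += 1
            elif c in " \t" and not in_quotes:
                break
            else:
                cur.append(c)
                i += 1
        args.append("".join(cur))
    return args
```

Under this model, `te"st program".exe a b c` splits into `te"st` and `program.exe a b c` because the quotation mark inside the program name is never seen as an opening quote, so the one after `program` opens a quoted region that runs to the end of the line.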

I'm not sure if this means Zig should avoid CommandLineToArgvW (since it doesn't work in the program-invoked-with-quotation-marks-in-the-middle-of-the-name case) or if it means Zig should start using CommandLineToArgvW (since the tests suggest it works correctly otherwise).

@squeek502
Collaborator

This was addressed by #18309 and #19655

@squeek502 squeek502 modified the milestones: 0.16.0, 0.12.0 Aug 6, 2024