Zig runtime should set console output mode to utf-8 by default on Windows #7600

Open
chr-1x opened this issue Dec 30, 2020 · 7 comments
Labels
os-windows proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.

@chr-1x
chr-1x commented Dec 30, 2020

const std = @import("std");

pub fn main() void {
    std.log.debug("Hello, μ!  (^=◕ᴥ◕=^)", .{});
    var f = std.io.getStdOut();
    _ = f.write("Hello, μ!  (^=◕ᴥ◕=^)\n") catch @panic("write to stdout failed");
    f = std.io.getStdErr();
    _ = f.write("Hello, μ!  (^=◕ᴥ◕=^)\n") catch @panic("write to stderr failed");
}

> repro_windows_utf8.exe
debug: Hello, μ!  (^=◕ᴥ◕=^)
Hello, μ!  (^=◕ᴥ◕=^)
Hello, μ!  (^=◕ᴥ◕=^)

Poor kitty, its face has been replaced by encoding errors!

The zig documentation has this to say about string literals:

String literals are single-item constant Pointers to null-terminated UTF-8 encoded byte arrays.
... String literals are const pointers to null-terminated arrays of u8, and by convention parameters that are "strings" are expected to be UTF-8 encoded slices of u8.

If this is true of all standard library functions, std.log and writers that refer to console output streams are expecting utf-8 encoded strings, and callers likely expect these functions to output these strings to the console as they appear in their source files or data sources. Users of zig command-line applications (and zig developers) may not have configured Windows to use utf-8 by default, but this can be overridden for a given console session by SetConsoleOutputCP.

I propose that the zig runtime should, at program startup, call SetConsoleOutputCP with an argument of 65001 to amend terminal output to correctly show utf-8 strings in all cases. If zig developers want legacy behavior or alternative code pages, they can call this function again with a different argument in their program's entry point.

(Aside: in this case, the zig compiler does not have the correct behavior by default either, and will print source code with the incorrect encoding in error messages):

.\repro_windows_utf8.zig:6:44: error: expression value is ignored
    f.write("Hello, μ!  (^=◕ᴥ◕=^)") catch @panic("write to stderr failed");
@chr-1x
Author

chr-1x commented Dec 30, 2020

Note that if the zig runtime does not do this, every zig developer who wants to ensure that they correctly output UTF-8 encoded text on all Windows machines will have to insert os-specific logic in their startup code, which makes writing portable zig code a bit more of a headache and a bit less of a happens-by-default thing.
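
A minimal sketch of the OS-specific startup logic described above, assuming a Zig release (roughly 0.11 to 0.13) where `std.io.getStdOut` and the `std.os.windows.kernel32.SetConsoleOutputCP` binding are available. The helper name `useUtf8Console` is hypothetical and is not part of the standard library.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Windows code page identifier for UTF-8 (CP_UTF8 in the Win32 headers).
const utf8_code_page = 65001;

/// Hypothetical helper: switch the attached console to UTF-8 output.
/// Does nothing on other operating systems.
fn useUtf8Console() void {
    if (builtin.os.tag == .windows) {
        // SetConsoleOutputCP returns zero on failure (for example when no
        // console is attached); this sketch ignores the result.
        _ = std.os.windows.kernel32.SetConsoleOutputCP(utf8_code_page);
    }
}

pub fn main() !void {
    useUtf8Console();
    try std.io.getStdOut().writeAll("Hello, μ!  (^=◕ᴥ◕=^)\n");
}
```

Every program that wants correct output would need to repeat a guard like this, which is the portability cost the proposal aims to remove.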

@daurnimator
Contributor

I propose that the zig runtime should, at program startup, call SetConsoleOutputCP with an argument of 65001 to amend terminal output to correctly show utf-8 strings in all cases

Zig can be used to create libraries (both static and dynamic), where we cannot rely on any code being called at startup.

On Windows we may need to use WriteConsoleW from std.log when the output file handle is a console (check with GetConsoleMode).
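
One way this could look, as a sketch only: check the handle with GetConsoleMode and, if it is a console, convert the text to UTF-16 and pass it to WriteConsoleW. The helper `writeText`, the fixed 256-unit buffer, and the error names are assumptions made for illustration. The two Win32 functions are declared locally so the sketch does not depend on which kernel32 bindings a given Zig release ships, and the `@intCast` form assumes Zig 0.11 or later.

```zig
const std = @import("std");
const builtin = @import("builtin");
const windows = std.os.windows;

extern "kernel32" fn GetConsoleMode(
    hConsoleHandle: windows.HANDLE,
    lpMode: *windows.DWORD,
) callconv(windows.WINAPI) windows.BOOL;

extern "kernel32" fn WriteConsoleW(
    hConsoleOutput: windows.HANDLE,
    lpBuffer: [*]const u16,
    nNumberOfCharsToWrite: windows.DWORD,
    lpNumberOfCharsWritten: ?*windows.DWORD,
    lpReserved: ?*anyopaque,
) callconv(windows.WINAPI) windows.BOOL;

/// Hypothetical helper: write UTF-8 `text` to `file`, going through
/// WriteConsoleW when the handle is a Windows console.
fn writeText(file: std.fs.File, text: []const u8) !void {
    if (builtin.os.tag == .windows) {
        var mode: windows.DWORD = 0;
        // GetConsoleMode succeeds only for console handles.
        if (GetConsoleMode(file.handle, &mode) != 0) {
            // UTF-8 text never needs more UTF-16 code units than bytes.
            var buf: [256]u16 = undefined;
            if (text.len > buf.len) return error.TextTooLong;
            const len = try std.unicode.utf8ToUtf16Le(&buf, text);
            var written: windows.DWORD = 0;
            if (WriteConsoleW(file.handle, &buf, @intCast(len), &written, null) == 0) {
                return error.ConsoleWriteFailed;
            }
            return;
        }
    }
    // Not a console (file or pipe), or not Windows: pass the bytes through.
    try file.writeAll(text);
}

pub fn main() !void {
    try writeText(std.io.getStdOut(), "Hello, μ!  (^=◕ᴥ◕=^)\n");
}
```

Because the check happens per write, this approach does not rely on any code running at startup and leaves the console code page untouched.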

@ehaas
Contributor

ehaas commented Dec 30, 2020

It might be worth looking at the click Python project for an idea of what it takes to make the Windows console API reasonably ergonomic when dealing with unicode.

https://click.palletsprojects.com/en/7.x/wincmd/

https://github.com/pallets/click/blob/master/src/click/_winconsole.py

@LemonBoy
Contributor

On Windows we may need to use WriteConsoleW from std.log when the output file handle is a console (check with GetConsoleMode).

It seems that SetConsoleOutputCP and WriteConsoleA should do the trick; the W version requires an intermediate conversion to UTF-16.

@daurnimator
Contributor

It seems that SetConsoleOutputCP

When would you call SetConsoleOutputCP, though? We can't guarantee it on startup, and in general it's not going to be thread-safe.

@tecanec
Contributor

tecanec commented Dec 30, 2020

If this can somehow be done from the standard library, I think that should be favored over making it part of the compiler. That would be better in case the programmer doesn't want the output mode to be set, for whatever reason they might have. It also seems more natural for this to be part of the same system that the programmer already uses for console output. Even if it meant calling a function like debug.setConsoleOutputMode(.utf8), I still think this is better than building it into the compiler.
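
A rough sketch of what such an opt-in function might look like. No `setConsoleOutputMode` exists in the standard library as far as this thread establishes; the enum, the helper names, and the choice to return the previous code page so it can be restored are all assumptions. The Win32 functions are declared locally, and `std.io.getStdOut` assumes a Zig release that still provides it.

```zig
const std = @import("std");
const builtin = @import("builtin");
const windows = std.os.windows;

extern "kernel32" fn GetConsoleOutputCP() callconv(windows.WINAPI) windows.UINT;
extern "kernel32" fn SetConsoleOutputCP(wCodePageID: windows.UINT) callconv(windows.WINAPI) windows.BOOL;

/// Hypothetical option set; the thread only names `.utf8`.
const ConsoleOutputMode = enum { utf8 };

/// Maps a mode to its Windows code page identifier.
fn codePage(mode: ConsoleOutputMode) windows.UINT {
    return switch (mode) {
        .utf8 => 65001,
    };
}

/// Switches the console output code page and returns the previous one,
/// or null when nothing was changed (not Windows, no console, or failure).
fn setConsoleOutputMode(mode: ConsoleOutputMode) ?windows.UINT {
    if (builtin.os.tag == .windows) {
        const previous = GetConsoleOutputCP();
        if (previous != 0 and SetConsoleOutputCP(codePage(mode)) != 0) return previous;
    }
    return null;
}

/// Puts back the code page returned by setConsoleOutputMode, if any.
fn restoreConsoleOutput(previous: ?windows.UINT) void {
    if (builtin.os.tag == .windows) {
        if (previous) |code_page| _ = SetConsoleOutputCP(code_page);
    }
}

pub fn main() !void {
    const previous = setConsoleOutputMode(.utf8);
    defer restoreConsoleOutput(previous);
    try std.io.getStdOut().writeAll("Hello, μ!  (^=◕ᴥ◕=^)\n");
}
```

The code page is state of the console rather than of one thread, so a helper like this still carries the thread-safety concern raised earlier in the thread; it only makes the change explicit and reversible.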

@AssortedFantasy
AssortedFantasy commented Jan 3, 2021

An upstream fix is just to change Windows to UTF-8 in Control Panel > Region > Administrative > Language for non-Unicode programs > Change system locale > Beta: Use Unicode UTF-8 for worldwide language support.

Some apps will break but it mostly works to make everything all peachy.
