dune-project parser does not work if dune-project has a UTF-8 byte-order mark #9396
Comments
If this is accurate, then the BOM is only the start; I don't think Dune is able to decode UTF-16 text at the moment.
I imagine this is an issue for regular dune files as well. In any case, it seems reasonable for us to handle it.
Okay, there are two separate problems, both exposed on Windows. The first is that UTF-16 encoding is not yet supported. The second is that a BOM is not supported, even in UTF-8. For example:
> with-dkml file dune-project
dune-project: Unicode text, UTF-8 (with BOM) text, with no line terminators
> with-dkml od -xc dune-project
0000000    bbef    28bf    616c    676e    6420    6e75    2065    2e33
        357 273 277   (   l   a   n   g       d   u   n   e       3   .
0000020    3231    0029
          1   2   )
0000023
> dune build
File "dune-project", line 1, characters 0-19:
1 | (lang dune 3.12)
    ^^^^^^^^^^^^^^^^^^^
Error: Invalid first line, expected: (lang <lang> <version>)
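The failure above comes down to the three bytes EF BB BF (octal 357 273 277 in the dump) sitting in front of `(lang`. A minimal sketch of tolerating them, assuming a hypothetical helper `strip_utf8_bom` applied to the file contents before parsing. This is an illustration only, not an existing Dune function and not necessarily how Dune would fix it:

```ocaml
(* The UTF-8 encoding of U+FEFF: the bytes EF BB BF seen in the od dump. *)
let utf8_bom = "\xEF\xBB\xBF"

(* Hypothetical helper, not part of Dune: drop a leading UTF-8 BOM if one
   is present, and return the input unchanged otherwise. *)
let strip_utf8_bom (s : string) : string =
  let n = String.length utf8_bom in
  if String.length s >= n && String.sub s 0 n = utf8_bom
  then String.sub s n (String.length s - n)
  else s

let () =
  (* Same 19 bytes as the dune-project file in the report. *)
  let contents = "\xEF\xBB\xBF(lang dune 3.12)" in
  print_endline (strip_utf8_bom contents)
```

With the BOM removed, the first line is the 16-byte `(lang dune 3.12)` rather than the 19 bytes reported in the error location.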
We will need a UTF decoder. The best one is in the standard library, but only in OCaml 4.14 and later.
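For reference, a rough sketch of what validating input with the standard library decoder could look like. It relies on `String.get_utf_8_uchar`, `Uchar.utf_decode_is_valid` and `Uchar.utf_decode_length`, which exist from OCaml 4.14 onwards; the function name `first_invalid_utf8` is made up for this example:

```ocaml
(* Requires OCaml >= 4.14 (String.get_utf_8_uchar, Uchar.utf_decode). *)

(* Hypothetical helper: return the byte offset of the first invalid UTF-8
   sequence in [s], or None if the whole string is valid UTF-8. *)
let first_invalid_utf8 (s : string) : int option =
  let len = String.length s in
  let rec go i =
    if i >= len then None
    else
      let d = String.get_utf_8_uchar s i in
      if Uchar.utf_decode_is_valid d
      then go (i + Uchar.utf_decode_length d)
      else Some i
  in
  go 0
```

A UTF-8 BOM passes this check, since it is a valid encoding of U+FEFF, whereas a file starting with a UTF-16 BOM (FF FE or FE FF) is rejected at offset 0. Validation alone therefore does not address the BOM problem.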
Does OCaml even compile with UTF-16 .ml files? Most other compilers I know, Also from a cross-platform perspective, if you want the dune files to work on other platforms, anything other than UTF-8 seems like a footgun. I've read that PowerShell allows you to configure the default piping output encoding, switching it from UTF-16 to UTF-8. Apparently UTF-8 is also the default on newer versions. I'm tempted to say that echo is not the correct tool in this case, and that you should be passing PowerShell arguments that fix the encoding. If this is for scripts, then this is probably fine, but for users, I wouldn't recommend writing Dune files this way; simply using an editor would be an improvement. In the meantime, this is a good opportunity to improve the error message and say that we only accept UTF-8. Updating the docs is also a good idea.
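As a sketch of the error-message idea, one could look at the first few bytes and name the likely encoding. Everything here is hypothetical: the `bom` type, the helper names and the message wording are invented for illustration and are not taken from Dune:

```ocaml
(* Hypothetical classification of a file by its leading byte-order mark. *)
type bom = No_bom | Utf8_bom | Utf16_le_bom | Utf16_be_bom

(* Local prefix test, written out so the sketch does not depend on
   String.starts_with (only available in newer OCaml releases). *)
let starts_with ~prefix s =
  let n = String.length prefix in
  String.length s >= n && String.sub s 0 n = prefix

(* Note: a UTF-32 LE BOM (FF FE 00 00) would also be reported as UTF-16 LE
   here; that is good enough for a hint in an error message. *)
let detect_bom (s : string) : bom =
  if starts_with ~prefix:"\xEF\xBB\xBF" s then Utf8_bom
  else if starts_with ~prefix:"\xFF\xFE" s then Utf16_le_bom
  else if starts_with ~prefix:"\xFE\xFF" s then Utf16_be_bom
  else No_bom

(* Hypothetical wording for an extra hint attached to the parse error. *)
let encoding_hint (s : string) : string option =
  match detect_bom s with
  | No_bom -> None
  | Utf8_bom ->
    Some "the file starts with a UTF-8 byte-order mark; save it as UTF-8 without BOM"
  | Utf16_le_bom | Utf16_be_bom ->
    Some "the file appears to be UTF-16 encoded; only UTF-8 is accepted"
```

The hint would be shown alongside the existing "Invalid first line" error, so a user who created the file with echo in PowerShell is pointed at the encoding rather than at the syntax.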
I never even considered that. After testing UTF-16 ... no, it does not. But neither does it work with BOM-encoded UTF-8 ... which is valid UTF-8. Anyway, Dune should not need to support encodings that OCaml does not support. Sadly, I can't find a specification for what encoding OCaml supports! I vaguely remember some thread that OCaml doesn't support UTF-8 source files ... I think it was only ISO 8859-1 or some Latin variant. Flow chart:
Aside: C has no spec for source code encoding, so |
OCaml has no spec for source code encoding: source file contents are interpreted as raw bytes. There is however an ongoing discussion to require UTF-8: see ocaml/ocaml#1802 and the links mentioned there.
Okay ... I'll let you all on the Dune team decide whether UTF-8 (including optional BOM) is what Dune supports.
We'll support whatever OCaml does. I agree with your statement here:
Expected Behavior
In PowerShell on Windows:
I would expect the project to build.
Actual Behavior
The reason? Windows conventionally has a byte-order mark in its Unicode files. The built-in PowerShell (version 5.x or earlier) inserts BOMs; newer Command Prompts do not.
Specifications
dune (output of dune --version): 3.12.1