Support UTF-8 with BOM #233
Comments
Yes, I got this error:
It happened to me when I tried to modify a UTF-8 TOML file with Notepad on Windows Server. Notepad saved the file with a BOM by default, and as a result the parser wasn't working anymore.
UTF-8 shouldn't have a BOM; it looks like you're trying to read a UTF-16 file and the TOML specification supports only UTF-8. Since #276 the error on that should be clearer.
@arp242 Unfortunately, it's somewhat common for UTF-8 encoded files on Windows to have a BOM. Byte order is of course an irrelevant concept for UTF-8. As far as I can tell, it's mostly only useful as a signal that the file is UTF-8 encoded, even though its use is nowhere near universal. (The way I've handled the UTF-8 BOM in other projects is mostly to just look for it, allow it, but otherwise ignore it.)
Oh right, what a curious thing to do. I'll change it to ignore it then; the other UTF-16 check should still work to produce reasonable errors.
Apparently some UTF-8 files can start with a BOM, so read over that instead of assuming it's UTF-16. Also move the check for NULL out of the lexer, so it can remain "UTF-8 clean"; just examine the first few bytes instead. Ref: #233 (comment)
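For readers who want to see the idea concretely, here is a minimal sketch of the approach described in that commit message: skip a leading UTF-8 BOM, and look only at the first few bytes for signs of UTF-16. This is not the code from #277; the names `stripBOM` and `errUTF16` and the four-byte window are assumptions made up for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// errUTF16 is a hypothetical error value; the library's real message may differ.
var errUTF16 = errors.New("input looks like UTF-16; TOML files must be UTF-8")

// stripBOM (hypothetical helper) returns data without a leading UTF-8 BOM.
// It rejects input that looks like UTF-16, judged by a UTF-16 BOM or a NUL
// byte within the first four bytes.
func stripBOM(data []byte) ([]byte, error) {
	// A UTF-8 BOM is allowed but otherwise ignored.
	if bytes.HasPrefix(data, utf8BOM) {
		return data[len(utf8BOM):], nil
	}

	// UTF-16 little-endian and big-endian BOMs.
	if bytes.HasPrefix(data, []byte{0xFF, 0xFE}) || bytes.HasPrefix(data, []byte{0xFE, 0xFF}) {
		return nil, errUTF16
	}

	// UTF-16 without a BOM: ASCII text will have a NUL in every other byte,
	// so checking a short prefix is enough and keeps the lexer out of it.
	head := data
	if len(head) > 4 {
		head = head[:4]
	}
	if bytes.IndexByte(head, 0) >= 0 {
		return nil, errUTF16
	}

	return data, nil
}

func main() {
	out, err := stripBOM([]byte("\xEF\xBB\xBFkey = 1\n"))
	fmt.Printf("%q %v\n", out, err)
}
```

A caller would run the file contents through such a helper before handing them to the lexer, so the lexer itself only ever sees plain UTF-8.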
Fixed it now in #277
Now the lib doesn't support TOML files in UTF-8 encoding with a BOM