Support UTF-8 with BOM #233

aegoroff · 2018-10-16T17:00:37Z

The library currently doesn't support TOML files encoded as UTF-8 with a BOM.

aaaasmile · 2018-11-09T09:38:46Z

Yes, I got this error:

Near line 0 (last key parsed ''): bare keys cannot contain '\ufeff'

It happened to me when I modified a UTF-8 TOML file with Notepad on Windows Server. Notepad saved the file with a BOM by default, and as a result the parser stopped working.

arp242 · 2021-06-08T06:15:18Z

UTF-8 shouldn't have a BOM; it looks like you're trying to read a UTF-16 file and the TOML specification supports only UTF-8. Since #276 the error on that should be clearer.

BurntSushi · 2021-06-08T10:48:57Z

@arp242 Unfortunately, it's somewhat common for UTF-8 encoded files on Windows to have a BOM. Byte order is of course an irrelevant concept for UTF-8. As far as I can tell, it's mostly only useful as a signal that the file is UTF-8 encoded, even though its use is nowhere near universal.

(The way I've handled the UTF-8 BOM in other projects is mostly to just look for it, allow it, but otherwise ignore it.)
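The "look for it, allow it, ignore it" approach could be sketched roughly as below. This is only an illustration, not code from this library or from any other project mentioned here; `stripBOM` and `utf8BOM` are hypothetical names, and the only fixed fact is that the UTF-8 BOM is the three bytes EF BB BF (U+FEFF encoded as UTF-8).

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is U+FEFF encoded as UTF-8.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM is a hypothetical helper: it returns data without a leading
// UTF-8 BOM, and returns data unchanged when no BOM is present.
func stripBOM(data []byte) []byte {
	return bytes.TrimPrefix(data, utf8BOM)
}

func main() {
	// Simulate a file saved by an editor that prepends a BOM.
	withBOM := append([]byte{0xEF, 0xBB, 0xBF}, []byte("key = \"value\"\n")...)

	// The parser would then be handed the BOM-free bytes.
	fmt.Printf("%q\n", stripBOM(withBOM))
}
```

Stripping the bytes before lexing means the lexer never sees U+FEFF, so the "bare keys cannot contain '\ufeff'" error reported above would not be triggered by a leading BOM.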

arp242 · 2021-06-08T14:13:45Z

Oh right, what a curious thing to do.

I'll change it to ignore it then; the other UTF-16 check should still work to produce reasonable errors.

Apparently some UTF-8 files can start with a BOM, so read over that instead of assuming it's UTF-16. Also move the check for NULL out of the lexer, so it can remain "UTF-8 clean"; just examine the first few bytes instead. Ref: #233 (comment)
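A minimal sketch of what such a pre-lexer check might look like, assuming the input is available as a byte slice. This is not the code from #277: `checkEncoding`, `errUTF16`, and the choice of inspecting the first 6 bytes are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// errUTF16 is a hypothetical error value for input that looks like UTF-16.
var errUTF16 = errors.New("input looks like UTF-16; TOML files must be UTF-8")

// checkEncoding is a hypothetical pre-lexer step. It rejects input that
// starts with a UTF-16 BOM, skips a leading UTF-8 BOM, and rejects input
// with a NUL among its first few bytes (typical for BOM-less UTF-16 text).
func checkEncoding(data []byte) ([]byte, error) {
	// UTF-16 little-endian (FF FE) or big-endian (FE FF) BOM.
	if bytes.HasPrefix(data, []byte{0xFF, 0xFE}) || bytes.HasPrefix(data, []byte{0xFE, 0xFF}) {
		return nil, errUTF16
	}

	// Read over a UTF-8 BOM (EF BB BF) instead of treating it as an error.
	data = bytes.TrimPrefix(data, []byte{0xEF, 0xBB, 0xBF})

	// Examine only the first few bytes; the limit of 6 is an arbitrary
	// assumption for this sketch.
	head := data
	if len(head) > 6 {
		head = head[:6]
	}
	if bytes.IndexByte(head, 0) >= 0 {
		return nil, errUTF16
	}
	return data, nil
}

func main() {
	out, err := checkEncoding([]byte("\xEF\xBB\xBFkey = 1\n"))
	fmt.Printf("%q %v\n", out, err)

	_, err = checkEncoding([]byte{'k', 0, 'e', 0, 'y', 0})
	fmt.Println(err)
}
```

Doing this once on the raw bytes keeps the lexer free of encoding checks, which matches the intent described above.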

arp242 · 2021-06-09T11:04:16Z

Fixed it now in #277

arp242 closed this as completed Jun 8, 2021

arp242 mentioned this issue Jun 9, 2021

Read over BOM #277

Merged
