Closed
Description
It wasn't obvious to me that the string returned by APIs like `jerry_string_to_char_buffer()` isn't actually UTF-8 encoded, but CESU-8 encoded. (FWIW, I understand and agree with the rationale for using CESU-8 rather than UTF-8, as per the discussion in #616.)
It also seems like in the implementation itself, functions are sometimes labeled with `..._utf8_...`, but in reality use CESU-8 (for example `ecma_string_copy_to_utf8_buffer()`).
I would like to suggest 3 things:
- Rename all symbols (function names, types, etc.) that "falsely advertise" that they deal with UTF-8. For people who are new to the project and don't know about the recent internal transition from UTF-8 to CESU-8, this is very confusing.
- Clarify in the docstring of each API that hands back or copies out a string to the client of JerryScript that the string is CESU-8 encoded, and for APIs that take strings as input, specify that the input needs to be CESU-8 encoded.
- Over time, add APIs/wrappers for converting between CESU-8 and UTF-8.
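
To make the last point concrete, here is a rough sketch of what a CESU-8 to UTF-8 conversion could look like. The name `cesu8_to_utf8` and its signature are hypothetical and not part of the JerryScript API. The sketch assumes the input is well-formed CESU-8: the only difference from UTF-8 is that a code point above U+FFFF is stored as a UTF-16 surrogate pair, each half encoded as 3 bytes (6 bytes total), whereas UTF-8 uses a single 4-byte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT an existing JerryScript function.
 * Copies `src` (assumed to be well-formed CESU-8) into `dst` as UTF-8.
 * Returns the number of bytes written, or SIZE_MAX if `dst` is too small. */
size_t
cesu8_to_utf8 (const uint8_t *src, size_t src_len, uint8_t *dst, size_t dst_cap)
{
  assert (src != NULL && dst != NULL);

  size_t in = 0;
  size_t out = 0;

  while (in < src_len)
  {
    /* An encoded surrogate pair looks like: ED A0..AF xx  ED B0..BF xx */
    if (src_len - in >= 6
        && src[in] == 0xED && (src[in + 1] & 0xF0) == 0xA0
        && (src[in + 2] & 0xC0) == 0x80
        && src[in + 3] == 0xED && (src[in + 4] & 0xF0) == 0xB0
        && (src[in + 5] & 0xC0) == 0x80)
    {
      /* Decode both 3-byte sequences back into UTF-16 surrogates. */
      uint32_t hi = 0xD000u | ((uint32_t) (src[in + 1] & 0x3F) << 6) | (uint32_t) (src[in + 2] & 0x3F);
      uint32_t lo = 0xD000u | ((uint32_t) (src[in + 4] & 0x3F) << 6) | (uint32_t) (src[in + 5] & 0x3F);

      /* Combine the surrogates into one supplementary code point. */
      uint32_t cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);

      if (dst_cap - out < 4)
      {
        return SIZE_MAX;
      }

      /* Emit the code point as a single 4-byte UTF-8 sequence. */
      dst[out++] = (uint8_t) (0xF0 | (cp >> 18));
      dst[out++] = (uint8_t) (0x80 | ((cp >> 12) & 0x3F));
      dst[out++] = (uint8_t) (0x80 | ((cp >> 6) & 0x3F));
      dst[out++] = (uint8_t) (0x80 | (cp & 0x3F));
      in += 6;
    }
    else
    {
      /* Everything else is byte-identical in CESU-8 and UTF-8. */
      if (out >= dst_cap)
      {
        return SIZE_MAX;
      }
      dst[out++] = src[in++];
    }
  }

  return out;
}
```

For example, U+1F600 is `ED A0 BD ED B8 80` in CESU-8 and `F0 9F 98 80` in UTF-8. Since the output is never longer than the input, a destination buffer of `src_len` bytes is always sufficient. The reverse direction would split each 4-byte UTF-8 sequence back into two 3-byte encoded surrogates.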